NTU Theses and Dissertations Repository (DSpace)
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99418

Full metadata record (DC field: value (language))
dc.contributor.advisor: 傅立成 (zh_TW)
dc.contributor.advisor: Li-Chen Fu (en)
dc.contributor.author: 陳裕翔 (zh_TW)
dc.contributor.author: Yu-Hsiang Chen (en)
dc.date.accessioned: 2025-09-10T16:13:42Z
dc.date.available: 2025-09-11
dc.date.copyright: 2025-09-10
dc.date.issued: 2025
dc.date.submitted: 2025-08-04
dc.identifier.citation:
[1] Agora. Agora real-time voice and video engagement, 2025.
[2] J. Brooke. SUS: A quick and dirty usability scale. Usability Evaluation in Industry, 189(194):4–7, 1996.
[3] J. Chen, Z. Lv, S. Wu, K. Qinghong Lin, C. Song, D. Gao, J.-W. Liu, Z. Gao, D. Mao, and M. Z. Shou. VideoLLM-online: Online video large language model for streaming video. CVPR 2024.
[4] J. Chen, S. Xiao, P. Zhang, K. Luo, D. Lian, and Z. Liu. BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation, 2024.
[5] W. Cheng, E. Kim, and J. H. Ko. HandDAGT: A denoising adaptive graph transformer for 3D hand pose estimation, 2024.
[6] M. Douze, A. Guzhva, C. Deng, J. Johnson, G. Szilvasy, P.-E. Mazaré, M. Lomeli, L. Hosseini, and H. Jégou. The Faiss library, 2024.
[7] H. Durrant-Whyte and T. Bailey. Simultaneous localization and mapping: Part I. IEEE Robotics & Automation Magazine, 13(2):99–110, 2006.
[8] S. Fernández, M. Montagud, G. Cernigliaro, and D. Rincón. Multi-party holomeetings: Toward a new era of low-cost volumetric holographic meetings in virtual reality, 2022.
[9] Epic Games. Unreal Engine: The most powerful real-time 3D creation tool, 2025.
[10] Google. Google Meet: Online web and video conferencing calls, 2025.
[11] S. W. Greenwald, W. Corning, G. McDowell, P. Maes, and J. Belcher. ElectroVR: An electrostatic playground for collaborative, simulation-based exploratory learning in immersive virtual reality, 2019.
[12] S. G. Hart and L. E. Staveland. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Volume 52, pages 139–183. Elsevier, 1988.
[13] Jorjin Technologies Inc. Jorjin Technologies, 2019.
[14] R. Johansen. GroupWare: Computer Support for Business Teams. The Free Press, 1988.
[15] G. Lee, Y. Yang, J. Healey, and D. Manocha. Since U Been Gone: Augmenting context-aware transcriptions for re-engaging in immersive VR meetings, 2025.
[16] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, S. Riedel, and D. Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks, 2020.
[17] F. Li, R. Zhang, H. Zhang, Y. Zhang, B. Li, W. Li, Z. Ma, and C. Li. LLaVA-NeXT-Interleave: Tackling multi-image, video, and 3D in large multimodal models, 2024.
[18] Meta. Orion AI glasses: The future of AR glasses technology, 2024.
[19] Microsoft. Microsoft Teams: Video conferencing, meetings, calling, 2025.
[20] R. Mur-Artal and J. D. Tardós. ORB-SLAM2: An open-source SLAM system for monocular, stereo and RGB-D cameras, 2016. IEEE Transactions on Robotics; doi:10.1109/TRO.2017.2705103.
[21] OpenAI, J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. Leoni Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, R. Avila, I. Babuschkin, S. Balaji, V. Balcom, P. Baltescu, H. Bao, M. Bavarian, J. Belgum, I. Bello, J. Berdine, G. Bernadett-Shapiro, C. Berner, L. Bogdonoff, O. Boiko, M. Boyd, A.-L. Brakman, G. Brockman, T. Brooks, M. Brundage, K. Button, T. Cai, R. Campbell, A. Cann, B. Carey, C. Carlson, R. Carmichael, B. Chan, C. Chang, F. Chantzis, D. Chen, S. Chen, R. Chen, J. Chen, M. Chen, B. Chess, C. Cho, C. Chu, H. W. Chung, D. Cummings, J. Currier, Y. Dai, C. Decareaux, T. Degry, N. Deutsch, D. Deville, A. Dhar, D. Dohan, S. Dowling, S. Dunning, A. Ecoffet, A. Eleti, T. Eloundou, D. Farhi, L. Fedus, N. Felix, S. Posada Fishman, J. Forte, I. Fulford, L. Gao, E. Georges, C. Gibson, V. Goel, T. Gogineni, G. Goh, R. Gontijo-Lopes, J. Gordon, M. Grafstein, S. Gray, R. Greene, J. Gross, S. S. Gu, Y. Guo, C. Hallacy, J. Han, J. Harris, Y. He, M. Heaton, J. Heidecke, C. Hesse, A. Hickey, W. Hickey, P. Hoeschele, B. Houghton, K. Hsu, S. Hu, X. Hu, J. Huizinga, S. Jain, S. Jain, et al. GPT-4 technical report, 2023.
[22] G. Pavlakos, V. Choutas, N. Ghorbani, T. Bolkart, A. A. A. Osman, D. Tzionas, and M. J. Black. Expressive body capture: 3D hands, face, and body from a single image, 2019.
[23] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision (CLIP), 2021.
[24] A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever. Robust speech recognition via large-scale weak supervision, 2022.
[25] J. Romero, D. Tzionas, and M. J. Black. Embodied hands: Modeling and capturing hands and bodies together. SIGGRAPH Asia 2017; ACM Transactions on Graphics, 36(6), Article 245, 2022.
[26] Unity Technologies. Unity real-time development platform, 2025.
[27] M. F. Ursu, M. Groen, M. Falelakis, M. Frantzis, V. Zsombori, and R. Kaiser. Orchestration: TV-like mixing grammars applied to video-communication for social groups, 2013.
[28] W. Weiss, R. Kaiser, and M. Falelakis. Orchestration for group videoconferencing: An interactive demonstrator, 2014.
[29] A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lv, C. Zheng, D. Liu, F. Zhou, F. Huang, F. Hu, H. Ge, H. Wei, H. Lin, J. Tang, J. Yang, J. Tu, J. Zhang, J. Yang, J. Yang, J. Zhou, J. Zhou, J. Lin, K. Dang, K. Bao, K. Yang, L. Yu, L. Deng, M. Li, M. Xue, M. Li, P. Zhang, P. Wang, Q. Zhu, R. Men, R. Gao, S. Liu, S. Luo, T. Li, T. Tang, W. Yin, X. Ren, X. Wang, X. Zhang, X. Ren, Y. Fan, Y. Su, Y. Zhang, Y. Zhang, Y. Wan, Y. Liu, Z. Wang, Z. Cui, Z. Zhang, Z. Zhou, and Z. Qiu. Qwen3 technical report, 2025.
[30] S. Yang, J. Yim, J. Kim, and H. V. Shin. CatchLive: Real-time summarization of live streams with stream content and interaction data, 2022.
[31] F. Zhang, V. Bazarevsky, A. Vakunov, A. Tkachenka, G. Sung, C.-L. Chang, and M. Grundmann. MediaPipe Hands: On-device real-time hand tracking. CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Seattle, WA, USA, 2020.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99418
dc.description.abstract: Since the pandemic, remote meetings have become the norm for some kinds of work, and mainstream meeting systems such as Google Meet and Microsoft Teams are widely used. With advances in AR/VR technology, virtual meetings can be further enhanced to offer users a highly immersive and interactive experience.
In traditional meetings, someone usually organizes and takes minutes so the content can be clarified afterward; today's video conferencing likewise offers tools such as transcripts that record what speakers say, but current VR/AR meetings lack tools that record the meeting and assist users. We therefore propose ARM-RSPA, an assistant system that records meeting content and the state of the virtual environment in real time, summarizes on user request, and replays key interactions. Through AR replay, ARM-RSPA gives users a more convenient and comprehensible way to review meetings.
In our experiments, ARM-RSPA demonstrates its use in meetings and successfully helps users understand the content the speaker explains. We also discuss how ARM-RSPA differs from other remote-meeting systems and possible future directions for AR meeting tools.
(zh_TW)
dc.description.abstract: Since the pandemic, remote meetings have become the norm in some work environments, with mainstream meeting systems such as Google Meet and Microsoft Teams widely used. With advances in AR/VR technology, virtual meetings can be further enhanced to provide users with highly immersive and interactive experiences.
In traditional meetings, someone typically organizes and records the meeting so its content can be clarified afterward. While video conferencing now offers tools such as transcripts that record speakers' words, VR/AR meetings lack comparable tools for recording and assisting users. To address this gap, we develop ARM-RSPA, a real-time assistant system that records meeting content and virtual-environment states, generates summaries on user request, and replays key interactions. Through AR replay, ARM-RSPA provides users with a more convenient and easily understandable way to review meeting content.
To validate our work, we conducted several experiments in which ARM-RSPA demonstrates its application in meetings and successfully helps users understand the content explained by speakers. We also discuss how ARM-RSPA differs from other remote-meeting solutions to highlight the promise of the proposed system, and outline potential future developments of AR meeting tools.
(en)
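The abstract describes a record-retrieve-summarize-replay loop: meeting events and virtual-environment states are logged in real time, then a window of events is retrieved, summarized, or replayed on user request. As a purely illustrative sketch of that loop (the names `ActionFrame`, `PlaybackLog`, `retrieve`, and `summarize` are hypothetical, not taken from the thesis, and the real system uses an LLM-based multi-agent pipeline rather than string concatenation):

```python
from dataclasses import dataclass


@dataclass
class ActionFrame:
    # One recorded meeting event: timestamp in seconds, the speaker,
    # the transcribed utterance, and an optional AR-scene action payload.
    t: float
    speaker: str
    text: str
    action: str = ""


class PlaybackLog:
    """Minimal in-memory log: record frames, retrieve a time window, summarize it."""

    def __init__(self) -> None:
        self.frames: list[ActionFrame] = []

    def record(self, frame: ActionFrame) -> None:
        self.frames.append(frame)

    def retrieve(self, start: float, end: float) -> list[ActionFrame]:
        # Frames whose timestamps fall inside the requested replay window.
        return [f for f in self.frames if start <= f.t <= end]

    def summarize(self, start: float, end: float) -> str:
        # Stand-in for the LLM summarizer: join the transcripts in the window.
        return " ".join(f"{f.speaker}: {f.text}" for f in self.retrieve(start, end))
```

In the actual system, per the table of contents, retrieval would read the recorded action-frame file and summarization would go through the multi-agent assistant framework.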
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-09-10T16:13:42Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2025-09-10T16:13:42Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Acknowledgements i
摘要 ii
Abstract iii
Contents v
List of Figures viii
List of Tables x
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation 2
1.3 Objective 3
1.4 Related Work 4
1.5 Thesis Organization 5
Chapter 2 Preliminary Work 7
2.1 Remote Collaboration 7
2.2 Simultaneous Localization and Mapping 8
2.3 Hand Pose Estimation 9
2.4 Human Avatar 10
2.5 Automatic Speech Recognition 12
2.5.1 Whisper 12
2.6 Large Language Model 12
2.6.1 Qwen3 13
2.7 Multimodal Large Language Model 14
2.7.1 CLIP 14
2.7.2 LLaVA-NeXT 15
2.8 Video Summarization 16
2.9 Retrieval-Augmented Generation 17
Chapter 3 Methodology 19
3.1 Application Development 21
3.1.1 User Interface Control 22
3.1.2 Object Manipulation 23
3.1.3 Social Interactions 24
3.1.4 Meeting Tools 28
3.1.5 User Event 30
3.1.5.1 Action Frame File 30
3.1.6 Meeting Assistant 31
3.2 Assistant Framework 35
3.2.1 Multi-agent Architecture 36
3.2.2 Preprocessing 36
3.2.3 Summary Generation 37
3.2.4 Playback Log Retriever 40
3.3 Server Communication and Synchronization 41
3.3.1 Communication Protocol 42
3.3.2 Orchestration 43
3.3.3 Media Servers 44
3.4 Use Case Scenario 45
3.4.1 Scenario Settings: Remote Car Sales Presentation 45
3.4.2 Result 46
4.1 User Study 48
4.1.1 Design 48
4.1.2 Experimental Setup 50
4.1.3 Participant 52
4.1.4 Procedure 54
4.1.5 Result and Discussion 55
4.2 Runtime Performance Evaluation 58
4.2.1 Experimental Setup 59
4.2.2 Result and Discussion 60
Chapter 5 Conclusion 62
References 64
Appendix A — User Study 69
A.1 Questionnaire 69
dc.language.iso: en
dc.subject: 虛擬實境/擴增實境 (Virtual Reality/Augmented Reality) (zh_TW)
dc.subject: 大型語言模型助理 (LLM Assistant) (zh_TW)
dc.subject: 即時摘要 (Real-Time Summarization) (zh_TW)
dc.subject: LLM Assistant (en)
dc.subject: VR/AR (en)
dc.subject: Live Stream Summarization (en)
dc.title: AR遠端會議中的即時摘要與重播助理 (Real-Time Summarization and Playback Assistant in AR Remote Meetings) (zh_TW)
dc.title: ARM-RSPA: Augmented Reality Meeting with Real-Time Summarization and Playback Assistant (en)
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 歐陽明;陳祝嵩;鄭龍磻;莊永裕;徐偉恩 (zh_TW)
dc.contributor.oralexamcommittee: Ming Ouhyoung; Chu-Song Chen; Lung-Pan Cheng; Yung-Yu Chuang; Wei En Hsu (en)
dc.subject.keyword: 虛擬實境/擴增實境, 即時摘要, 大型語言模型助理 (zh_TW)
dc.subject.keyword: VR/AR, Live Stream Summarization, LLM Assistant (en)
dc.relation.page: 69
dc.identifier.doi: 10.6342/NTU202502618
dc.rights.note: 同意授權(全球公開) (Authorized; open access worldwide)
dc.date.accepted: 2025-08-07
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊工程學系 (Department of Computer Science and Information Engineering)
dc.date.embargo-lift: 2028-09-01
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
File: ntu-113-2.pdf (64.01 MB, Adobe PDF)
Available to the public online after 2028-09-01


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.
