Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99318

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | 洪一平 | zh_TW
dc.contributor.advisor | Yi-Ping Hung | en
dc.contributor.author | 李瑋軒 | zh_TW
dc.contributor.author | Wei-Hsuan Li | en
dc.date.accessioned | 2025-08-22T16:09:39Z | -
dc.date.available | 2025-08-23 | -
dc.date.copyright | 2025-08-22 | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-08-06 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99318 | -
dc.description.abstract | 本文提出一種創新的文物展示方式,設計了一個能讓文物「說話」的虛擬實境系統。不同於傳統由虛擬導覽員解說的方式,本系統賦予文物聲音與個性,讓文物具備了對話的能力,而不只是靜態展示品。我們首先利用 3D 高斯潑灑技術重建真實且高品質的文物模型,接著結合本地部署的大型語言模型讓文物能夠根據輸入的語音產生回應。我們亦導入檢索增強生成技術,讓模型能引用正確資料以提升回答的正確性。為了找出最適合展現文物個性以及產生生動回應的大型語言模型,我們對不同的繁體中文大型語言模型進行了比較,也分析了檢索增強生成對於提升回答正確性的效果。最後,我們於頭戴式裝置上測試 3D 高斯潑灑文物模型的視覺表現,探討高斯點數量、文物大小、與觀看距離對渲染順暢度的影響。 | zh_TW
dc.description.abstract | We present a new approach to exhibiting cultural artifacts: a virtual reality system that enables the artifacts themselves to speak. Unlike traditional virtual museum guides, this system gives artifacts their own voices and personalities, allowing them to engage in conversation rather than remain silent objects on display. We first reconstruct high-quality artifact models with 3D Gaussian Splatting. We then integrate a locally deployed large language model to generate responses from speech input, and incorporate Retrieval-Augmented Generation (RAG) to improve the correctness of responses by letting the model reference relevant context. We compare different Traditional Chinese large language models to identify the one best suited to generating vivid, characterful responses, and we analyze the effectiveness of RAG in enhancing response quality. Finally, we evaluate the visual performance of the artifact models on a VR headset, examining how splat count, artifact size, and viewing distance affect rendering performance. | en
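The retrieval-augmented generation step summarized in the abstract can be sketched in miniature: embed the artifact's reference documents, retrieve the passages most similar to the user's question, and prepend them to the prompt so the model grounds its answer. The sketch below is purely illustrative and is not the thesis's implementation — the real system uses a locally deployed LLM and a proper embedding model with a vector index, whereas here the toy embeddings, documents, and function names are all hypothetical.

```python
# Illustrative RAG sketch (hypothetical names and toy embeddings; the thesis's
# actual system uses a real embedding model, a vector index, and a local LLM).
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, doc_vecs, docs, k=2):
    # Rank the artifact's reference passages by similarity to the query embedding.
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return [docs[i] for i in ranked[:k]]

def build_prompt(question, context_docs):
    # Prepend the retrieved passages so the model can ground its answer.
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer as the artifact:"

# Toy 3-dimensional "embeddings" stand in for a real embedding model.
docs = ["The vase dates to the Qing dynasty.",
        "The bronze mirror is from the Han era."]
doc_vecs = [[1.0, 0.2, 0.0], [0.1, 0.9, 0.3]]
query_vec = [0.9, 0.1, 0.0]  # embedding of the question below

top = retrieve(query_vec, doc_vecs, docs, k=1)
prompt = build_prompt("When were you made?", top)
```

The design point this illustrates is the one evaluated in Chapter 5: the model only answers from retrieved passages rather than from its parametric memory, which is what lets RAG reduce hallucinated artifact facts.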
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-22T16:09:39Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-08-22T16:09:39Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Acknowledgements i
摘要 ii
Abstract iii
Contents iv
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
Chapter 2 Related Work 3
2.1 3D Gaussian Splatting 3
2.2 Traditional Chinese Large Language Models 4
2.3 Retrieval-Augmented Generation (RAG) 5
2.4 Conversational Agents 6
Chapter 3 3D Reconstruction of Cultural Artifacts 8
3.1 Dataset 8
3.2 Preprocessing 9
3.3 Training 12
3.4 Results 12
Chapter 4 System Design 16
4.1 System Overview 16
4.2 Backend Architecture 17
4.3 Retrieval-Augmented Generation 17
4.4 VR Application 19
Chapter 5 System Evaluation 22
5.1 Large Language Model Comparison 22
5.2 Retrieval-Augmented Generation Comparison 26
5.3 Rendering Performance Evaluation 29
Chapter 6 Conclusion and Future Work 37
References 39 | -
dc.language.iso | en | -
dc.subject | 3D 高斯潑灑 | zh_TW
dc.subject | 虛擬實境 | zh_TW
dc.subject | 對話式 AI | zh_TW
dc.subject | 大型語言模型 | zh_TW
dc.subject | 語音互動 | zh_TW
dc.subject | Conversational AI | en
dc.subject | 3D Gaussian Splatting | en
dc.subject | Voice Interaction | en
dc.subject | Large Language Model | en
dc.subject | Virtual Reality | en
dc.title | 結合 3D 高斯潑灑與大型語言模型於文物之對話展示 | zh_TW
dc.title | Integrating 3D Gaussian Splatting and Large Language Models for Conversational Exhibition of Cultural Artifacts | en
dc.type | Thesis | -
dc.date.schoolyear | 113-2 | -
dc.description.degree | 碩士 | -
dc.contributor.oralexamcommittee | 歐陽明;王鈺強;林經堯;陳昱吉 | zh_TW
dc.contributor.oralexamcommittee | Ming Ouhyoung;Yu-Chiang Wang;Jin-Yao Lin;Yu-Chi Chen | en
dc.subject.keyword | 3D 高斯潑灑,虛擬實境,對話式 AI,大型語言模型,語音互動 | zh_TW
dc.subject.keyword | 3D Gaussian Splatting,Virtual Reality,Conversational AI,Large Language Model,Voice Interaction | en
dc.relation.page | 43 | -
dc.identifier.doi | 10.6342/NTU202502499 | -
dc.rights.note | 同意授權(全球公開) | -
dc.date.accepted | 2025-08-12 | -
dc.contributor.author-college | 電機資訊學院 | -
dc.contributor.author-dept | 資訊工程學系 | -
dc.date.embargo-lift | 2025-08-23 | -
Appears in Collections: 資訊工程學系

Files in This Item:
File | Size | Format
ntu-113-2.pdf | 18.65 MB | Adobe PDF


Unless otherwise indicated in their rights statements, all items in this repository are protected by copyright, with all rights reserved.
