Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99318

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | 洪一平 | zh_TW
dc.contributor.advisor | Yi-Ping Hung | en
dc.contributor.author | 李瑋軒 | zh_TW
dc.contributor.author | Wei-Hsuan Li | en
dc.date.accessioned | 2025-08-22T16:09:39Z | -
dc.date.available | 2025-08-23 | -
dc.date.copyright | 2025-08-22 | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-08-06 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99318 | -
dc.description.abstract | 本文提出一種創新的文物展示方式,設計了一個能讓文物「說話」的虛擬實境系統。不同於傳統由虛擬導覽員解說的方式,本系統賦予文物聲音與個性,讓文物具備了對話的能力,而不只是靜態展示品。我們首先利用 3D 高斯潑灑技術重建真實且高品質的文物模型,接著結合本地部署的大型語言模型讓文物能夠根據輸入的語音產生回應。我們亦導入檢索增強生成技術,讓模型能引用正確資料以提升回答的正確性。為了找出最適合展現文物個性以及產生生動回應的大型語言模型,我們對不同的繁體中文大型語言模型進行了比較,也分析了檢索增強生成對於提升回答正確性的效果。最後,我們於頭戴式裝置上測試 3D 高斯潑灑文物模型的視覺表現,探討高斯點數量、文物大小、與觀看距離對渲染順暢度的影響。 | zh_TW
dc.description.abstract | We present a new approach to exhibiting cultural artifacts: a virtual reality system that enables the artifacts themselves to speak. Unlike traditional virtual museum guides, this system gives artifacts their own voices and personalities, allowing them to engage in conversation rather than remain silent objects on display. We first reconstruct high-quality artifact models with 3D Gaussian Splatting. We then integrate a locally deployed large language model to generate responses from speech input, and incorporate Retrieval-Augmented Generation (RAG) to improve the correctness of responses by letting the model reference relevant context. We compare different Traditional Chinese large language models to identify the one best suited to generating vivid, characterful responses, and we analyze the effectiveness of RAG in enhancing response quality. Finally, we evaluate the visual performance of the artifact models on a VR headset, examining how splat count, artifact size, and viewing distance affect rendering performance. | en
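The retrieval-augmented generation step summarized in the abstract can be sketched in miniature: embed the artifact's reference documents, retrieve the passages most similar to the user's question, and prepend them to the prompt so the model grounds its answer. The sketch below is purely illustrative and is not the thesis's implementation — the real system uses a locally deployed LLM and a proper embedding model with a vector index, whereas here the toy embeddings, documents, and function names are all hypothetical.

```python
# Illustrative RAG sketch (hypothetical names and toy embeddings; the thesis's
# actual system uses a real embedding model, a vector index, and a local LLM).
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, doc_vecs, docs, k=2):
    # Rank the artifact's reference passages by similarity to the query embedding.
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return [docs[i] for i in ranked[:k]]

def build_prompt(question, context_docs):
    # Prepend the retrieved passages so the model can ground its answer.
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer as the artifact:"

# Toy 3-dimensional "embeddings" stand in for a real embedding model.
docs = ["The vase dates to the Qing dynasty.",
        "The bronze mirror is from the Han era."]
doc_vecs = [[1.0, 0.2, 0.0], [0.1, 0.9, 0.3]]
query_vec = [0.9, 0.1, 0.0]  # embedding of the question below

top = retrieve(query_vec, doc_vecs, docs, k=1)
prompt = build_prompt("When were you made?", top)
```

The design point this illustrates is the one evaluated in Chapter 5: the model only answers from retrieved passages rather than from its parametric memory, which is what lets RAG reduce hallucinated artifact facts.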
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-22T16:09:39Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-08-22T16:09:39Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Acknowledgements i
摘要 ii
Abstract iii
Contents iv
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
Chapter 2 Related Work 3
2.1 3D Gaussian Splatting 3
2.2 Traditional Chinese Large Language Models 4
2.3 Retrieval-Augmented Generation (RAG) 5
2.4 Conversational Agents 6
Chapter 3 3D Reconstruction of Cultural Artifacts 8
3.1 Dataset 8
3.2 Preprocessing 9
3.3 Training 12
3.4 Results 12
Chapter 4 System Design 16
4.1 System Overview 16
4.2 Backend Architecture 17
4.3 Retrieval-Augmented Generation 17
4.4 VR Application 19
Chapter 5 System Evaluation 22
5.1 Large Language Model Comparison 22
5.2 Retrieval-Augmented Generation Comparison 26
5.3 Rendering Performance Evaluation 29
Chapter 6 Conclusion and Future Work 37
References 39 | -
dc.language.iso | en | -
dc.subject | 3D 高斯潑灑 | zh_TW
dc.subject | 虛擬實境 | zh_TW
dc.subject | 對話式 AI | zh_TW
dc.subject | 大型語言模型 | zh_TW
dc.subject | 語音互動 | zh_TW
dc.subject | Conversational AI | en
dc.subject | 3D Gaussian Splatting | en
dc.subject | Voice Interaction | en
dc.subject | Large Language Model | en
dc.subject | Virtual Reality | en
dc.title | 結合 3D 高斯潑灑與大型語言模型於文物之對話展示 | zh_TW
dc.title | Integrating 3D Gaussian Splatting and Large Language Models for Conversational Exhibition of Cultural Artifacts | en
dc.type | Thesis | -
dc.date.schoolyear | 113-2 | -
dc.description.degree | 碩士 | -
dc.contributor.oralexamcommittee | 歐陽明;王鈺強;林經堯;陳昱吉 | zh_TW
dc.contributor.oralexamcommittee | Ming Ouhyoung;Yu-Chiang Wang;Jin-Yao Lin;Yu-Chi Chen | en
dc.subject.keyword | 3D 高斯潑灑,虛擬實境,對話式 AI,大型語言模型,語音互動 | zh_TW
dc.subject.keyword | 3D Gaussian Splatting,Virtual Reality,Conversational AI,Large Language Model,Voice Interaction | en
dc.relation.page | 43 | -
dc.identifier.doi | 10.6342/NTU202502499 | -
dc.rights.note | 同意授權(全球公開) | -
dc.date.accepted | 2025-08-12 | -
dc.contributor.author-college | 電機資訊學院 | -
dc.contributor.author-dept | 資訊工程學系 | -
dc.date.embargo-lift | 2025-08-23 | -
Appears in Collections: 資訊工程學系

Files in This Item:
File | Size | Format
ntu-113-2.pdf | 18.65 MB | Adobe PDF


Unless otherwise indicated in their rights statements, all items in this repository are protected by copyright, with all rights reserved.
