Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100918

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 蔡孟勳 | zh_TW |
| dc.contributor.advisor | Meng-Shiun Tsai | en |
| dc.contributor.author | 薛龍 | zh_TW |
| dc.contributor.author | Lung Hsueh | en |
| dc.date.accessioned | 2025-11-26T16:05:00Z | - |
| dc.date.available | 2025-11-27 | - |
| dc.date.copyright | 2025-11-26 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-11-20 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100918 | - |
| dc.description.abstract | 本論文提出一個即時性、具語音控制能力之多模態大型語言模型(Multimodal Large Language Model, MLLM)框架,該框架結合大型語言模型(Large Language Models, LLMs)、視覺語言模型(Vision Language Models, VLMs)、語音轉文字與文字轉語音之語音模組,以及檢索增強生成(Retrieval-Augmented Generation, RAG),建構為一個統一的代理人式(agent-based)機械手臂控制系統。系統核心設計引入中央主對話代理(Central Main Chat-Agent)與路由代理(Router Agent),其中路由代理負責將任務分派至背景模組,確保前端語音互動過程不中斷,並實現明確的前端–後端分離。此設計保證了框架的通用性,使前端流程可廣泛應用於多種機械手臂任務;同時透過任務平行化與可自訂程式碼生成,並搭配專用的機械手臂連線函式庫,實現可擴充性。
在一般取放任務中,本框架展示了如何透過 LLM 推理直接控制機械手臂,並利用多模態模組的串接鏈結(concatenated chaining)來實現感知、推理與執行的連續流程。在機械按摩任務中,系統進一步展現了語音模組如何與中央主對話代理及路由代理協同運作,實現前端與後端分離的互動架構,確保流暢且即時的語音控制。本案例同時驗證了 MLLM 在處理更複雜與可自訂的人本服務型應用上的潛力。 傳統工業機械手臂雖在重複性、預編程任務中表現優異,但缺乏面對開放式互動或個人化服務的適應性。透過將多模態人工智慧直接嵌入控制迴路,本框架突破此限制,使系統能在結構化與服務導向環境中實現即時感知、推理與動作。整體系統以對話代理的形式運作,支援自然語言對話、語音互動及視覺理解。基於 LLM 的聊天模組負責協調任務,並呼叫專門模組執行檢索、推理與軌跡生成;RAG 流程確保回應的準確性與語境相關性,而記憶機制則維持跨多輪互動的一致性。在感知面向上,本框架結合 YOLO 物件偵測與 Transformer 式 VLM,實現即時物件辨識、分割與情境解讀。由自訂的 Python 整合器負責連接高階 AI 推理與低階機械手臂控制,確保流暢銜接。 實驗驗證結果顯示,本框架能有效將自然語言指令轉換為可執行的機械手臂軌跡,能即時因應視覺輸入進行調整,並維持與使用者的連貫互動。本研究的貢獻在於展示如何將 LLM 推理、多模態對齊與代理式協作整合進入機械手臂控制系統,實現從抽象推理到物理行動的橋接,並為可擴充、互動式與以使用者為中心的智慧機器人平台鋪路。 | zh_TW |
| dc.description.abstract | This thesis proposes a Real-Time, Voice-Control-Enabled Multimodal Large Language Model (MLLM) Framework that integrates large language models (LLMs), vision–language models (VLMs), speech-to-text and text-to-speech voice modules, and retrieval-augmented generation (RAG) into a unified agent-based control system for robotic arms. At its core, the system introduces a Central Main Chat-Agent coupled with a Router Agent that delegates tasks to background modules, ensuring uninterrupted voice interaction through a clear frontend–backend separation. This design guarantees generalizability, as the frontend pipeline can be seamlessly applied to a wide range of robotic tasks. Scalability is achieved by enabling task parallelism and customizable code generation through the router agent, while dedicated libraries ensure reliable connections to robotic hardware.
In the general pick-and-place task, the framework demonstrates how LLM reasoning can directly control robots, while a concatenated chaining of multimodal modules enables sequential perception, reasoning, and execution for structured manipulation tasks. In the robot massage task, the system showcases how the voice module functions in conjunction with a central main chat-agent and router-agent, implementing a frontend–backend separation framework that ensures smooth, real-time interactions. This case further demonstrates the potential of MLLMs to handle more complex, customizable, and human-centered robotic applications. Traditional industrial robots excel in repetitive, pre-programmed routines but lack adaptability for open-ended interaction or personalized services. By embedding multimodal AI directly into the control loop, the proposed framework addresses these limitations, enabling real-time perception, reasoning, and action across both structured and service-oriented environments. The system functions as a conversational agent supporting natural language dialogue, voice interaction, and visual grounding. An LLM-based chatbot orchestrates operations by invoking specialized modules for retrieval, reasoning, and trajectory generation. RAG pipelines maintain factual accuracy and contextual relevance, while memory mechanisms ensure continuity across extended interactions. On the perception side, the framework integrates YOLO-based vision modules with Transformer-based VLMs, enabling real-time object detection, segmentation, and contextual interpretation. A custom Python integrator bridges high-level reasoning with low-level robotic execution, ensuring seamless control. Experimental validation demonstrates the framework’s capacity to translate natural language instructions into executable robotic trajectories, adapt dynamically to visual input, and sustain coherent user interactions. The contributions of this work lie in showing how LLM reasoning, multimodal grounding, and agent-based orchestration can be unified within a robotic control system. This integration illustrates the feasibility of bridging abstract AI reasoning with physical action, paving the way for scalable, interactive, and user-centered robotic platforms. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-11-26T16:04:59Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-11-26T16:05:00Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgement i
Master's Thesis Acceptance Certificate iii
摘要 iv
Abstract vi
Contents ix
List of Figures xiii
Chapter 1. Introduction 1
Chapter 2. Background and Related Work 5
2.1 Transformer, LLM, VLM, VLA 6
2.2 Retrieval Augmented Generation 9
2.2.1 Naive RAG 9
2.2.2 Naive RAG Techniques 11
2.2.3 Advanced RAG Techniques 12
2.2.4 Agentic RAG 14
2.3 LLM Memory 15
2.3.1 LLM Agents and Frameworks 20
2.4 Related Work 24
Chapter 3. Equipment and System Setup 31
3.1 Robot Arm 32
3.1.1 HIWIN RA605 32
3.1.2 TM-5-900 35
3.2 Computer Vision 39
3.2.1 YOLO-11n 39
3.2.2 YOLO-11m-seg 41
3.3 Hand-Eye Calibration 44
Chapter 4. MLLM Application for General Pick and Place Task 46
4.1 Introduction 46
4.2 MLLM Workflow 48
4.2.1 Main MLLM Workflow 48
4.2.2 VLM Interpreter 50
4.2.3 MLLM Planner 53
4.2.4 MLLM Coder 55
4.3 System Add-On 58
4.3.1 Advanced RAG 58
4.3.2 User Interface Integration and Design 62
4.3.3 UI Integration 63
4.4 Task Performance and Evaluation 64
4.4.1 Supported Task Types 64
4.4.2 Conclusion 69
Chapter 5. MLLM Applications for Customized Massage Tasks 70
5.1 System Module Merging 71
5.2 Voice Module 73
5.2.1 STT-Voice Module 74
5.2.2 TTS-Voice Module 79
5.3 Main MCP Chatbot 85
5.3.1 Introduction 85
5.3.2 Core Design Objectives and System Feature 86
5.3.3 Enhanced Massage Agent System Workflow 89
5.4 Retrieval Chain 95
5.4.1 Introduction 95
5.4.2 Design Objectives and Main Features 95
5.4.3 Retrieval Chain System Workflow 98
5.4.4 LLM Prompting Architecture and Implementation 103
5.5 Trajectory Generation 110
5.5.1 Introduction 110
5.5.2 Vision Preprocessing 112
5.5.3 Code Generation Library and LLM Design 112
Chapter 6. Conclusion 124
6.1 Future Work 125
Chapter 7. Reference 129 | - |
| dc.language.iso | en | - |
| dc.subject | 多模態大型語言模型(MLLM) | - |
| dc.subject | 檢索增強生成(RAG) | - |
| dc.subject | 代理式控制系統 | - |
| dc.subject | 語音互動 | - |
| dc.subject | 機械手臂控制 | - |
| dc.subject | 人機互動(HRI) | - |
| dc.subject | Multimodal Large Language Models (MLLM) | - |
| dc.subject | Retrieval-Augmented Generation (RAG) | - |
| dc.subject | Agent-Based Control Systems | - |
| dc.subject | Voice Interaction | - |
| dc.subject | Robot Arm Control | - |
| dc.subject | Human–Robot Interaction (HRI) | - |
| dc.title | 即時性多模態大型語言模型代理人架構於機械手臂對話機器人一般性取放任務及自訂義按摩任務之應用 | zh_TW |
| dc.title | Real-Time MLLM Agent Frameworks for Robot Arm Chatbots in Pick-and-Place and Customized Massage Tasks | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 114-1 | - |
| dc.description.degree | 碩士 (Master's) | - |
| dc.contributor.oralexamcommittee | 林峻永;袁偉翔 | zh_TW |
| dc.contributor.oralexamcommittee | Chun-Yeon Lin;Wei-Hsiang Yuan | en |
| dc.subject.keyword | 多模態大型語言模型(MLLM),檢索增強生成(RAG),代理式控制系統,語音互動,機械手臂控制,人機互動(HRI) | zh_TW |
| dc.subject.keyword | Multimodal Large Language Models (MLLM), Retrieval-Augmented Generation (RAG), Agent-Based Control Systems, Voice Interaction, Robot Arm Control, Human–Robot Interaction (HRI) | en |
| dc.relation.page | 137 | - |
| dc.identifier.doi | 10.6342/NTU202504691 | - |
| dc.rights.note | 同意授權(全球公開) [authorization granted, worldwide open access] | - |
| dc.date.accepted | 2025-11-20 | - |
| dc.contributor.author-college | 工學院 (College of Engineering) | - |
| dc.contributor.author-dept | 機械工程學系 (Department of Mechanical Engineering) | - |
| dc.date.embargo-lift | 2025-11-27 | - |
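The dc.description.abstract field above describes a Central Main Chat-Agent paired with a Router Agent that hands work off to background modules so the voice frontend is never interrupted. Below is a minimal sketch of that frontend–backend separation, assuming hypothetical module and function names (router_agent, retrieval_chain, trajectory_generation) that are not taken from the thesis; it illustrates the dispatch pattern only, not the actual implementation.

```python
import asyncio

async def retrieval_chain(query: str) -> str:
    """Stand-in for the RAG retrieval chain (backend module)."""
    await asyncio.sleep(2)
    return f"retrieved context for: {query}"

async def trajectory_generation(task: str) -> str:
    """Stand-in for planning and robot code generation (backend module)."""
    await asyncio.sleep(3)
    return f"trajectory ready for: {task}"

# Backend modules the router can delegate to; names are illustrative only.
BACKGROUND_MODULES = {
    "retrieve": retrieval_chain,
    "plan": trajectory_generation,
}

async def router_agent(intent: str, payload: str) -> asyncio.Task:
    """Dispatch a backend module as a task so the voice frontend never blocks."""
    return asyncio.create_task(BACKGROUND_MODULES[intent](payload))

async def main_chat_agent() -> None:
    """Frontend loop: acknowledge each spoken request at once, report results later."""
    pending: list[asyncio.Task] = []
    for user_turn in ["retrieve: recommended shoulder massage pressure",
                      "plan: pick up the red cube"]:
        intent, _, payload = user_turn.partition(": ")
        pending.append(await router_agent(intent, payload))
        print(f"[voice] acknowledged '{payload}' (backend task running)")  # TTS reply
    for result in await asyncio.gather(*pending):
        print(f"[voice] backend finished: {result}")

asyncio.run(main_chat_agent())
```

In this pattern the frontend acknowledges each spoken request immediately and only later reports the result of the corresponding background task, which matches the real-time interaction goal stated in the abstract.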
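A second sketch illustrates the perception-to-execution chain summarized in the abstract: YOLO-based detection feeding robot execution through a Python integrator. The homography values, the target label "cup", the image file, and send_pick_command() are illustrative placeholders rather than the thesis implementation; only the Ultralytics calls follow that library's documented usage.

```python
import numpy as np
from ultralytics import YOLO

# Planar homography from image pixels to the robot's table frame (mm),
# assumed to come from an offline hand-eye calibration; values are illustrative.
H_PIXEL_TO_TABLE = np.array([
    [0.50, 0.00, -320.0],
    [0.00, 0.50, -240.0],
    [0.00, 0.00,    1.0],
])

def pixel_to_table(u: float, v: float) -> tuple[float, float]:
    """Map an image pixel to (x, y) on the table plane using the homography."""
    x, y, w = H_PIXEL_TO_TABLE @ np.array([u, v, 1.0])
    return x / w, y / w

def send_pick_command(x_mm: float, y_mm: float) -> None:
    """Placeholder for the robot-arm connection library; prints instead of moving."""
    print(f"pick at table coordinates ({x_mm:.1f}, {y_mm:.1f}) mm")

model = YOLO("yolo11n.pt")                       # pretrained detector
results = model("workspace.jpg")                 # single RGB frame of the workspace
for box, cls_id in zip(results[0].boxes.xyxy, results[0].boxes.cls):
    label = results[0].names[int(cls_id)]
    if label == "cup":                           # target named by the LLM planner
        u = float((box[0] + box[2]) / 2)         # bounding-box center in pixels
        v = float((box[1] + box[3]) / 2)
        send_pick_command(*pixel_to_table(u, v))
```

Mapping through a table-plane homography is only one plausible simplification; a depth camera and the hand-eye calibration described in the table of contents would instead yield a full 3-D transform.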
| Appears in Collections: | 機械工程學系 (Department of Mechanical Engineering) | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-114-1.pdf | 7.66 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
