Cognitive Home Social Robot with Generative Action Planning Based on Commonsense Knowledge Base and Large Language Models

陳慈安; Cih-An Chen

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92979

Title:	Cognitive Home Social Robot with Generative Action Planning Based on Commonsense Knowledge Base and Large Language Models (original title: 基於常識知識與大型語言模型之具有生成式行動規劃之認知家用社交型機器人)
Authors:	陳慈安 Cih-An Chen
Advisor:	傅立成 Li-Chen Fu
Keyword:	Generative Action Planning, Replan Algorithm, Cognitive Home Social Robot, Commonsense Knowledge Base, Large Language Models
Publication Year :	2024
Degree:	Master's
Abstract:	With the rapid development of artificial intelligence and machine learning, robots are steadily becoming part of daily life. In the home, a cognitive social robot can serve both as a conversational companion and as a personal assistant, providing services and interactions tailored to the user's current needs. The emergence of large language models (LLMs) has further strengthened robots' reasoning and decision-making. Beyond generating more contextually relevant responses, LLMs have made generative action planning an active research area: by parsing a given instruction, a cognitive robot can autonomously generate a suitable sequence of actions and adjust its strategy through interaction with the environment until it reaches the goal. Applied in a home setting, this gives the robot stronger cognitive reasoning abilities, so that it can provide efficient and appropriate support based on user needs, which both improves residents' quality of life and fosters friendlier, more natural human-machine interaction. This research proposes a framework that integrates commonsense knowledge with LLMs to develop a cognitive home social robot capable of generative action planning. Based on images captured from its field of view and on the user's spoken input, the robot automatically selects the most appropriate role. The roles include receptionist, companion, and home service provider, responsible respectively for introducing the environment, holding interactive conversations, and providing specific home services. To understand and respond to user needs effectively, we use vision-language models (VLMs) for scene recognition and object detection, along with deep learning models for face recognition and age estimation, to gather key information about the environment and the user. In the home service role, when a user utterance carries only an implicit meaning, we first infer the explicit meaning using commonsense knowledge extracted from ATOMIC 2020, a commonsense knowledge base, and then use LLMs to have the robot generate self-instructions. Taking the rich user and environmental information into account, the cognitive robot then reasons out a series of high-level plans that fulfill the self-instruction. For each high-level plan under execution, we propose a replanning algorithm that lets the robot reflect and replan online based on environmental observations, which significantly improves the efficiency and success rate of task execution. Finally, we use a module that combines VLMs with a cognitive map as the low-level planner, giving the robot system advanced spatial cognition capabilities so that it can localize and navigate efficiently in a home environment in accordance with the high-level plans.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92979
DOI:	10.6342/NTU202400866
Fulltext Rights:	Authorized (open access on campus only)
Appears in Collections:	Department of Electrical Engineering

Files in This Item:

File	Size	Format
ntu-112-2.pdf (Restricted Access)	27.67 MB	Adobe PDF
