Object-Goal Navigation of Home Care Robot based on Human Activity Inference and Cognitive Memory

陳建婷; Chien-Ting Chen

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90523

Title:	Object-Goal Navigation of Home Care Robot based on Human Activity Inference and Cognitive Memory (基於人類行為及認知記憶之物件目標導航居家照護型機器人)
Author:	Chien-Ting Chen (陳建婷)
Advisor:	Li-Chen Fu (傅立成)
Keywords:	Object-Goal Navigation, 3D Semantic Map, Human Activity Inference, Cognitive Memory, Commonsense Knowledge Graph, Human-Robot Interaction
Year of publication:	2023
Degree:	Master's
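Abstract:	(Translated from the Chinese abstract.) With advances in medical technology, the world faces an aging population and a shortage of young workers, and long-term care for the elderly has created strong demand for home service robots. Because human memory and cognitive ability decline with age, it is especially important for a robot to have the reasoning and decision-making ability needed for object search. To provide better home care, the robot must also be able to recognize objects, environments, and human activities, and must have the cognitive ability to encode, store, and retrieve memories. Since language is the most convenient means of interaction in human society, the robot needs language understanding that lets it comprehend human speech and respond promptly and appropriately, so that it can navigate well during human-robot interaction. Making the robot lightweight and convenient enough for home use also means the system architecture must be designed around the limited sensors and computing resources available on the robot, which is a major challenge. In this research, we propose an object search reasoning system based on human activity and cognitive memory. The robot is equipped with an RGBD camera and a 2D laser sensor so that, on first entering a new environment, it can explore autonomously while taking different scenes into account and form an initial spatial understanding of the environment. In addition, we give the robot prior knowledge by computing the distribution of locations where objects appear in a dataset, so that it has a notion of where a target object is likely to be found. Besides the 2D planar map built with the 2D laser, we use the RGBD camera together with a semantic segmentation model and a scene recognition model to generate semantically labeled point clouds and construct a 3D semantic map containing objects and scenes; this map can later be reused and updated to carry out further searches for dynamic objects. We also use the vision-language model CLIP to infer human activity: text descriptions of human activities are compared for similarity against features of the camera image, and the text with the highest similarity is taken as the activity shown in the image. In physical experiments this model infers human activity in real time. We then select the important human activities and store the activity, the person performing it, its location, and its time as a memory. Using a commonsense knowledge graph built from ConceptNet, we obtain the target activity related to the target object, retrieve from memory the target person most relevant to that activity, direct the robot to find that person, and derive the target object's location from the language exchanged while interacting with them.
With the advancement of medical technology and the increase in life expectancy, the world is facing an aging population and a shortage of young workers. Long-term care of the elderly has created a large demand for home care robots. As people age, their memory and cognitive abilities gradually decline, and it becomes particularly difficult for them to remember where different objects are in the house. It is therefore important for the robot to be able to detect objects, scenes, and human activities, and to reason and make decisions so that it can search for objects on a person's behalf. The robot also needs the cognitive ability to encode, store, and retrieve memories in order to move appropriately during human-robot interaction. To achieve good mobile navigation in human-robot interaction, the robot needs language understanding so that it can understand what people say and respond quickly and appropriately. Making the robot lightweight and convenient for home use adds the challenge of designing the system with only a limited sensor set and limited computing resources.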
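The activity-inference step described in the abstract (compare text descriptions of activities against image features and keep the best match) can be illustrated with a minimal sketch. This is not the thesis's implementation: the embeddings below are placeholder vectors, whereas in a real system they would come from CLIP's image and text encoders, and the function names `cosine` and `infer_activity` are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def infer_activity(image_embedding: Sequence[float],
                   text_embeddings: Dict[str, Sequence[float]]) -> str:
    """Return the activity description whose embedding is most similar to the image.

    Assumption: both kinds of embedding live in the same vector space,
    as is the case for a vision-language model's image and text encoders.
    """
    return max(text_embeddings,
               key=lambda label: cosine(image_embedding, text_embeddings[label]))


# Placeholder vectors standing in for encoder outputs.
candidate_texts = {
    "a person reading a book": [1.0, 0.0, 0.0],
    "a person drinking water": [0.0, 1.0, 0.0],
}
frame_embedding = [0.1, 0.9, 0.2]
best_match = infer_activity(frame_embedding, candidate_texts)
```

Taking the single highest-scoring description matches the abstract's wording; a deployed system might additionally apply a minimum-similarity threshold, which is not shown here.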
In this research, we develop a home care mobile robot system that can reason about the location of a target object and navigate to find it, based on human activity inference and cognitive memory. The robot is intended to help people find objects they have misplaced at home through forgetfulness; the lost objects can be either personal objects or public ones. The system consists of three parts. The first is the object-goal navigation module, which includes the exploration and spatial-surroundings detection sub-modules used to construct the semantic voxel map. The second is the cognitive memory module, which performs human and activity recognition and stores the encoded information in the robot's episodic memory, ready to be retrieved when a memory concerning the lost object is needed. The last is the interaction module, which infers the position of the target object by interacting with the relevant people. In our design, the robot carries RGBD cameras and 2D laser sensors and can explore the environment autonomously while simultaneously constructing a 3D semantic map of the scenes and objects, which can be updated and used to infer the locations of target objects in the future. To make the inference more human-like, we use existing large-scale data to compute the location distribution of objects ahead of time, collect cues about people's identities and activities through face and human activity recognition, build a memory that stores these cues together with time and place, and finally use a commonsense knowledge graph to reason logically about where the target (lost) object might be located while interacting with humans.
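The memory-retrieval idea in this paragraph (store who did what, where, and when, then use commonsense links from an object to an activity to decide whom to ask) can be sketched as follows. This is an illustrative sketch only: the `MemoryRecord` fields, the small `RELATED_ACTIVITIES` dictionary standing in for a commonsense knowledge graph, and the "most recent match wins" rule are all assumptions, not details taken from the thesis.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MemoryRecord:
    """One episodic memory entry: who did what, where, and when."""
    person: str
    activity: str
    place: str
    time: datetime


# Hypothetical stand-in for a commonsense knowledge graph:
# maps a target object to activities commonly associated with it.
RELATED_ACTIVITIES: Dict[str, List[str]] = {
    "cup": ["drinking"],
    "book": ["reading"],
}


def retrieve_person_to_ask(target_object: str,
                           memory: List[MemoryRecord],
                           related: Dict[str, List[str]] = RELATED_ACTIVITIES,
                           ) -> Optional[MemoryRecord]:
    """Return the most recent memory whose activity relates to the target object.

    Assumption: recency is used as the relevance criterion; returns None
    when no stored activity is related to the object.
    """
    activities = set(related.get(target_object, []))
    candidates = [m for m in memory if m.activity in activities]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.time)


memory = [
    MemoryRecord("Alice", "reading", "living room", datetime(2023, 5, 1, 9, 0)),
    MemoryRecord("Bob", "drinking", "kitchen", datetime(2023, 5, 1, 10, 0)),
    MemoryRecord("Alice", "drinking", "bedroom", datetime(2023, 5, 1, 11, 0)),
]
hit = retrieve_person_to_ask("cup", memory)
```

In this toy example the robot would approach Alice in the bedroom and ask about the cup; the place stored in the record gives the navigation goal for finding the person.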
We validate the proposed method in a simulated environment with respect to SR (success rate), SPL (success weighted by path length), and DTS (distance to success). The results show that our method searches for the target object efficiently and robustly, with shorter path length and higher success rate than three comparison methods from the literature. We further demonstrate the system in a real-world home environment, where it searches either for a personal object belonging to a single user or for a public object shared by several users.
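For readers unfamiliar with the three metrics, the sketch below computes them using the definitions commonly used in the object-goal navigation literature; the abstract does not state the exact definitions or success radius used in the thesis, so the `Episode` fields and the 1.0 m default radius are assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    success: bool            # did the robot stop within the success radius of the target?
    shortest_path_m: float   # shortest start-to-goal path length
    agent_path_m: float      # length of the path the robot actually travelled
    final_dist_m: float      # distance to the target when the episode ended


def success_rate(episodes: Sequence[Episode]) -> float:
    """SR: fraction of episodes that ended in success."""
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """SPL: success weighted by (shortest path / max(agent path, shortest path))."""
    total = 0.0
    for e in episodes:
        if e.success:
            denom = max(e.agent_path_m, e.shortest_path_m)
            # A zero-length episode that succeeds is treated as perfectly efficient.
            total += e.shortest_path_m / denom if denom > 0 else 1.0
    return total / len(episodes)


def distance_to_success(episodes: Sequence[Episode],
                        success_radius_m: float = 1.0) -> float:
    """DTS: mean remaining distance to the success boundary at episode end."""
    return sum(max(e.final_dist_m - success_radius_m, 0.0)
               for e in episodes) / len(episodes)


episodes = [
    Episode(success=True, shortest_path_m=5.0, agent_path_m=10.0, final_dist_m=0.5),
    Episode(success=False, shortest_path_m=4.0, agent_path_m=8.0, final_dist_m=3.0),
]
sr_value = success_rate(episodes)
spl_value = spl(episodes)
dts_value = distance_to_success(episodes)
```

Higher SR and SPL are better, while lower DTS is better, which is why the abstract reports "shorter path length and higher success rate" as improvements.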
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90523
DOI:	10.6342/NTU202300825
Full-text authorization:	Not authorized
Appears in department:	Department of Electrical Engineering

Files in this item:

File	Size	Format
ntu-111-2.pdf (currently not authorized for public access)	6.59 MB	Adobe PDF


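Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.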
