Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8152
Title: | 結合知識資訊與圖像問答之人機互動技術於智慧機器人應用 Human-Robot Interaction Using Knowledge-Based Visual Question Answering System for Intelligent Service Robotics |
Author: | Yu-Cheng Wen 温郁承 |
Advisor: | 羅仁權 (Ren-Chyaun Luo) |
Keywords: | Visual Question Answering, Service Robot, Semantic Map, Deep Learning |
Year of publication: | 2020 |
Degree: | Master's |
Abstract: | Visual Question Answering (VQA) is a multimodal task that combines computer vision and natural language processing. Given an image and a natural language question about it, the system must produce an accurate natural language answer. To do so, it must extract features from the input image and turn that information into knowledge it can use to answer the question. A more advanced setting has an embodied agent solve the task: the agent must explore the environment and find visual clues relevant to the given question. Most research on this topic has been carried out in simulated environments, since many simulators provide photo-realistic scenes and interactive objects. However, more information becomes available when the technique is deployed on a service robot, and one such source is the habits of its users. For example, a service robot usually serves in the same area for its entire working life, so it can learn the users' experience and where each object is customarily placed, for instance that apples are usually in the fridge and that the fridge is in the corner of the kitchen. These hints can be used to improve the procedure of exploring the rooms to find the answer.
The purpose of this thesis is to ground visual question answering on a service robot. We propose a memorial semantic map for embodied agent systems that overlays the semantic maps produced by exploration during the previous sequence of tasks. Using this sequential memory, the agent becomes familiar with the user's habits and completes its exploration more concisely and efficiently. We also add an early stop mechanism. The sequential memory marks several critical regions of the map where the target object usually appears; once the agent's exploration rate in these regions is high enough, exploration can be terminated to avoid redundant travel. Furthermore, we implement the system on a mobile manipulator robot in a real environment, where the robot uses its navigation, manipulation, and perception abilities to carry out visual question answering tasks. Our experiments show that the question answering system with the proposed memorial semantic map is more accurate than the system without it. With the early stop mechanism, the system reduces the number of exploration steps by 17.4% in the best case. The real-world implementation shows that our system can help service robots complete visual question answering tasks. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8152 |
DOI: | 10.6342/NTU202003842 |
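Full-text authorization: | Authorized (open access worldwide) |
Appears in collections: | Department of Electrical Engineering |
Files in this item:
File | Size | Format | |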
---|---|---|---|
U0001-1708202019255900.pdf | 4.9 MB | Adobe PDF | View/Open |
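All items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.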