Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8152
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 羅仁權(Ren-Chyaun Luo) | |
dc.contributor.author | Yu-Cheng Wen | en |
dc.contributor.author | 温郁承 | zh_TW |
dc.date.accessioned | 2021-05-20T00:49:21Z | - |
dc.date.available | 2020-08-24 | |
dc.date.available | 2021-05-20T00:49:21Z | - |
dc.date.copyright | 2020-08-24 | |
dc.date.issued | 2020 | |
dc.date.submitted | 2020-08-18 | |
dc.identifier.citation | S. Antol, et al. 'VQA: Visual question answering.' Proceedings of the IEEE International Conference on Computer Vision. 2015. A. Das, et al. 'Embodied question answering.' Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2018. A. Das, et al. 'Neural modular control for embodied question answering.' arXiv preprint arXiv:1810.11181 (2018). D. Gordon, et al. 'IQA: Visual question answering in interactive environments.' Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. V. Mnih, et al. 'Asynchronous methods for deep reinforcement learning.' International Conference on Machine Learning. 2016. J. Redmon, et al. 'You only look once: Unified, real-time object detection.' Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. J. Redmon and A. Farhadi. 'YOLOv3: An incremental improvement.' arXiv preprint arXiv:1804.02767 (2018). A. Chang, et al. 'Matterport3D: Learning from RGB-D data in indoor environments.' arXiv preprint arXiv:1709.06158 (2017). S. Brodeur, et al. 'HoME: A household multimodal environment.' arXiv preprint arXiv:1711.11017 (2017). M. Savva, et al. 'MINOS: Multimodal indoor simulator for navigation in complex environments.' arXiv preprint arXiv:1712.03931 (2017). E. Kolve, et al. 'AI2-THOR: An interactive 3D environment for visual AI.' arXiv preprint arXiv:1712.05474 (2017). Y. Wu, et al. 'Building generalizable agents with a realistic and rich 3D environment.' arXiv preprint arXiv:1801.02209 (2018). C. Yan, et al. 'CHALET: Cornell house agent learning environment.' arXiv preprint arXiv:1801.07357 (2018). F. Xia, et al. 'Gibson Env: Real-world perception for embodied agents.' Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. M. Savva, et al. 'Habitat: A platform for embodied AI research.' Proceedings of the IEEE International Conference on Computer Vision. 2019. D. Zhang. Parallel Robotic Machine Tools. Springer Science & Business Media, 2009. Y. Goyal, et al. 'Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering.' Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. G. Grisetti, C. Stachniss, and W. Burgard. 'Improved techniques for grid mapping with Rao-Blackwellized particle filters.' IEEE Transactions on Robotics 23.1 (2007): 34-46. D. Fox, et al. 'Monte Carlo localization: Efficient position estimation for mobile robots.' AAAI/IAAI (1999): 343-349. C. J. C. H. Watkins. 'Learning from delayed rewards.' (1989). J. Peng and R. J. Williams. 'Incremental multi-step Q-learning.' Machine Learning Proceedings 1994. Morgan Kaufmann, 1994. 226-232. R. J. Williams. 'Simple statistical gradient-following algorithms for connectionist reinforcement learning.' Machine Learning 8.3-4 (1992): 229-256. R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018. T. Degris, P. M. Pilarski, and R. S. Sutton. 'Model-free reinforcement learning with continuous action in practice.' 2012 American Control Conference (ACC). IEEE, 2012. T. Tieleman and G. Hinton. 'Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude.' COURSERA: Neural Networks for Machine Learning 4.2 (2012): 26-31. R. J. Williams and J. Peng. 'Function optimization using connectionist reinforcement learning algorithms.' Connection Science 3.3 (1991): 241-268. M. Everingham, et al. 'The PASCAL visual object classes challenge 2007 (VOC2007) results.' (2007). C. Szegedy, et al. 'Going deeper with convolutions.' Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. M. Lin, Q. Chen, and S. Yan. 'Network in network.' arXiv preprint arXiv:1312.4400 (2013). O. Russakovsky, et al. 'ImageNet large scale visual recognition challenge.' International Journal of Computer Vision 115.3 (2015): 211-252. S. Ren, et al. 'Object detection networks on convolutional feature maps.' IEEE Transactions on Pattern Analysis and Machine Intelligence 39.7 (2016): 1476-1481. G. Hinton, et al. 'Improving neural networks by preventing co-adaptation of feature detectors.' arXiv preprint arXiv:1207.0580 (2012). ROS Introduction, http://wiki.ros.org/ROS/Introduction | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8152 | - |
dc.description.abstract | 圖像問答(Visual Question Answering)是一個包含電腦視覺和自然語言處理的多模組任務,其輸入是一張圖和與其相關的自然語言問句,系統必須輸出正確的自然語言答案。人工智慧系統必須從輸入影像中萃取特徵並轉換這些資訊為合理的知識來回答問句。 更進一步的研究是由一個具體代理人(embodied agent)來回答問題,也就是 代理人要在環境中探索並找到與答案相關的圖像線索。由於很多模擬環境可以提供相片擬真的場景以及可互動的物件,目前大多數的研究都在模擬中進行測試。然而,當這項技術實際應用在服務型機器人上時有更多可以使用的資訊,其中之一便是使用者的生活習慣。舉例來說,一台服務型機器人通常一生都在相同的場域內服務,機器人可以認知使用者經驗和物品習慣擺放位置,例如蘋果通常放在冰箱裡,而冰箱在廚房的角落等。這些資訊可以幫機器人優化探索環境尋找答案的流程。 這篇論文的目的是將圖像問答落實於服務機器人。我們提出了一個針對具體代理人系統的記憶性語意式地圖,該地圖會記錄在過去任務序列中探索所產生的語意式地圖。利用這項序列式的記憶,代理人可以對使用者習慣更加熟悉並完成環境探索更精簡且有效率。另外同時加入了提前停止機制,根據序列式記憶可以在地圖上標記出數個關鍵區域,當代理人在這些關鍵區域已有一定程度的探索率,可以提前結束探索避免多餘的路程。更進一步我們將此系統實際在移動式手臂機器人上,機器人可以使用導航、操作以及感知能力來完成在環境中進行圖像問答的任務。 我們的實驗顯示使用了記憶型語意式地圖的圖像問答系統比未使用的模型更加準確,同時在最好的情況可以減少17.4%探索時需要的步數。這樣的結果顯示我們的方法確實能幫助機器人完成圖像問答任務。 | zh_TW |
dc.description.abstract | Visual Question Answering (VQA) is a multi-modal task that combines computer vision and natural language processing techniques. Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. In such a task, the artificial intelligence must extract information from the input image and transform it into reasonable knowledge to answer the question. An advanced issue is to solve this task with an embodied agent, that is, an agent that must explore the environment and find visual clues to answer the given question. Most research on this topic has been conducted in simulated environments, as they provide photo-realistic views and interactive versions of real-world scenes. However, more information becomes available when this technique is deployed on service robots. One such source is the habits of users. For example, a service robot usually serves in the same area for its whole working life, so it can learn users' experience and the customary placement of each object, such as that apples are usually in the fridge and the fridge is in the corner of the kitchen. These hints can be used to improve the exploration procedure for finding the answer in the rooms. The purpose of this thesis is to ground the question answering system on a service robot. We propose a memorial semantic map for an embodied agent system that overlays all the semantic memory from the previous task sequence. Using this sequential memory, the agent can become familiar with the user's habits and complete exploration more concisely and efficiently. An early stop mechanism is also established: the sequential memory indicates several critical regions within the map where the target object usually appears, and once the agent achieves a sufficiently high exploration rate in these regions, the exploration procedure can be terminated to avoid redundant movement. Furthermore, we implement the system on a mobile manipulator robot in a real environment. The navigation, manipulation, and perception abilities of the robot make it an appropriate platform for question answering tasks. Our experiments show that our question answering system with the proposed memorial semantic map is more accurate than the system without it. With the early stop mechanism, the system reduces exploration steps by 17.4% in the best case. The real-world implementation shows that our system can help service robots complete visual question answering tasks. | en |
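The abstract above describes two mechanisms in prose: a memorial semantic map that accumulates object observations over successive tasks, and an early stop rule that ends exploration once the critical regions suggested by that memory have been sufficiently covered. The following is a minimal sketch of that idea on a 2-D grid; all class, function, and parameter names (including the 0.8 coverage threshold) are illustrative assumptions, not taken from the thesis itself.

```python
import numpy as np


class MemorialSemanticMap:
    """Accumulates object detections over successive tasks on a 2-D grid.

    counts[c, y, x] records how often object class c was observed at
    cell (y, x) across all past task episodes.
    """

    def __init__(self, height: int, width: int, num_classes: int):
        self.counts = np.zeros((num_classes, height, width), dtype=int)

    def record(self, obj_class: int, y: int, x: int) -> None:
        """Overlay one detection from the current task onto the memory."""
        self.counts[obj_class, y, x] += 1

    def critical_region(self, obj_class: int, min_count: int = 1) -> np.ndarray:
        """Boolean mask of cells where the target class appeared often enough."""
        return self.counts[obj_class] >= min_count


def should_stop_early(explored: np.ndarray, critical: np.ndarray,
                      threshold: float = 0.8) -> bool:
    """Early stop: enough of the critical region has already been explored."""
    total = critical.sum()
    if total == 0:
        return False  # no prior knowledge about this object: keep exploring
    coverage = (explored & critical).sum() / total
    return bool(coverage >= threshold)
```

A short usage sketch: after two past tasks placed the target object at cells (1, 1) and (2, 2), exploring only one of those cells gives 50% coverage of the critical region (below the threshold, so exploration continues), while exploring both triggers the early stop.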
dc.description.provenance | Made available in DSpace on 2021-05-20T00:49:21Z (GMT). No. of bitstreams: 1 U0001-1708202019255900.pdf: 5015749 bytes, checksum: c3677d9cf5563157591e1202d568d2fa (MD5) Previous issue date: 2020 | en |
dc.description.tableofcontents | Acknowledgements i Chinese Abstract ii ABSTRACT iii CONTENTS v LIST OF FIGURES viii LIST OF TABLES x Chapter 1 Introduction 1 1.1 Objectives 1 1.2 Background 2 1.3 Problem Statement 4 1.4 Solutions 5 1.5 Thesis Organization 7 Chapter 2 Question Answering System 10 2.1 Embodied Question Answering [2] 10 2.2 Interactive Question Answering [4] 10 2.2.1 Introduction 10 2.2.2 Hierarchical Interactive Memory Network (HIMN) 11 2.2.3 Interactive Question Answering Dataset 15 2.3 Asynchronous Advantage Actor-Critic (A3C) [5] 17 2.3.1 Introduction 17 2.3.2 Asynchronous Reinforcement Learning Framework 19 2.4 You Only Look Once (YOLO) [6] 21 2.4.1 Introduction 21 2.4.2 Network Design 23 2.4.3 Training 24 2.4.4 Inference 26 Chapter 3 Memorial Semantic Map 29 3.1 Sequential Task Memory 29 3.2 Visualization of Spatial Memory 31 3.3 Early Stop Mechanism 34 Chapter 4 Robot Specification 37 4.1 Introduction 37 4.2 Manipulator Configuration 37 4.2.1 Forward Kinematics Analysis 37 4.2.2 Inverse Kinematics Analysis 45 4.3 Software Configuration 54 4.3.1 Robot Operating System 54 4.3.2 Action Primitives 59 Chapter 5 Experiment 62 5.1 Introduction 62 5.2 Simulated Environment Implementation 62 5.2.1 Introduction 62 5.2.2 Dataset Rearrangement 65 5.2.3 Comparison Using Reordering IQAUD Testing Dataset 66 5.2.4 Ablation Analysis 67 5.2.5 Comparison Using Static IQAUD Testing Dataset 69 5.3 Real-world Implementation 70 5.3.1 Introduction 70 5.3.2 Scene Configuration 70 5.3.3 Hybrid Object Detection Model 74 5.3.4 Mapping, Localization and Navigation 78 5.3.5 Full Demonstration 81 5.4 Discussion 89 Chapter 6 Contributions, Conclusions and Future Works 92 6.1 Contributions 92 6.2 Conclusions 92 6.3 Future Works 93 REFERENCE 96 VITA 99 | |
dc.language.iso | en | |
dc.title | 結合知識資訊與圖像問答之人機互動技術於智慧機器人應用 | zh_TW |
dc.title | Human-Robot Interaction Using Knowledge-Based Visual Question Answering System for Intelligent Service Robotics | en |
dc.type | Thesis | |
dc.date.schoolyear | 108-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 張帆人 (Fan-Ren Chang), 王富正 (Fu-Cheng Wang) | |
dc.subject.keyword | 圖像問答,服務型機器人,語意式地圖,深度學習, | zh_TW |
dc.subject.keyword | Visual Question Answering, Service Robot, Semantic Map, Deep Learning | en |
dc.relation.page | 100 | |
dc.identifier.doi | 10.6342/NTU202003842 | |
dc.rights.note | Authorized for release (open access worldwide) | |
dc.date.accepted | 2020-08-19 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Electrical Engineering | zh_TW |
Appears in Collections: | Department of Electrical Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-1708202019255900.pdf | 4.9 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.