Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/686

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 鄭士康 | |
| dc.contributor.author | Shao-Yu Chang | en |
| dc.contributor.author | 張邵瑀 | zh_TW |
| dc.date.accessioned | 2021-05-11T04:57:58Z | - |
| dc.date.available | 2019-08-13 | |
| dc.date.available | 2021-05-11T04:57:58Z | - |
| dc.date.copyright | 2019-08-13 | |
| dc.date.issued | 2019 | |
| dc.date.submitted | 2019-08-08 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/handle/123456789/686 | - |
| dc.description.abstract | 在本論文中,我們提出了一種新穎的視覺和運動協調系統機器人作為長期護理長者的物理代理人操作鍵盤。要年長者去學習現代多樣化的應用程式是如何使用是一項非常困難的挑戰,如果機器人可以為他們操作這些智慧裝置,勢必可以大幅減少照護者的負擔。我們所提出的系統使用卷積神經網絡物件偵測來感知目標按鈕位置,並通過深度神經網絡來控制其動作。我們設計了一個虛擬代理人NAOgym,他負責管理機器人感知和運動模型之間的訊息交換。我們使用了基於CNN的視覺模型來偵測顯示在平板電腦上的目標按鍵與出現在視線中的觸控筆,並且計算它們的相對位置和距離作為觀察到的高階語義信息。而基於DNN的運動模型,將會根據結合了相對位置與物理代理人傳感器的狀態訊息,通過運動模型的策略來產生下一個動作。另外,我們把注意機制應用在動作控制模型上,並將其受專注的程度當作關節的運動速度,來加速強化學習演算法的訓練。在虛擬手臂環境中,我們設計了像NAO一樣的手臂來評估訓練過程和效能,特徵的選擇對效能的影響以及演算法對無須預先校准的假設。通過虛擬手臂進行實驗以評估所提出的系統。實驗結果驗證了我們提出的概念。 | zh_TW |
| dc.description.abstract | In this work, we propose a novel vision and motion coordination system for a robot serving as a physical agent that types on a keyboard for elders in long-term care. It is challenging for older people to learn how to use modern, diverse applications; if robots can operate these smart devices for them, the burden on caregivers can be greatly reduced. The proposed system uses convolutional neural network object detection to sense the position of the target button and controls the robot's motion through a deep neural network. We designed a cyber-agent, NAOgym, which manages the exchange of information between the robot's perception and motion models. A CNN-based vision model detects the target buttons displayed on the tablet computer and the stylus pen appearing in view, and computes their relative position and distance as observed high-level semantic information. A DNN-based actor model then generates the next action through its policy, based on state information that combines this relative position with the physical agent's sensor readings. In addition, we apply an attention mechanism to the motion control model and use the degree of attention as the joint speed, which accelerates training of the reinforcement learning algorithm. In a virtual environment, we design an arm like NAO's to evaluate the training process and performance, the effect of feature selection on performance, and the algorithm's calibration-free assumption. Experiments conducted in the virtual arm environment evaluate the proposed system, and the results validate the proposed concept. | en |
| dc.description.provenance | Made available in DSpace on 2021-05-11T04:57:58Z (GMT). No. of bitstreams: 1 ntu-108-R06921081-1.pdf: 4002871 bytes, checksum: d6cecd6fa6809349340153c3a65e2b12 (MD5) Previous issue date: 2019 | en |
| dc.description.tableofcontents | Oral Defense Committee Certification i; Acknowledgements ii; Chinese Abstract iii; ABSTRACT iv; CONTENTS v; LIST OF FIGURES viii; LIST OF TABLES ix; Chapter 1 Introduction 1; 1.1 Objective and Motivation 1; 1.2 Problem Statement 3; 1.3 Literature Survey and Related Work 3; 1.4 Contributions 7; 1.5 Chapter Outline 7; Chapter 2 Background 8; 2.1 Physical Agent: NAO 8; 2.1.1 Introduction 8; 2.1.2 Features 8; 2.1.3 Broker 9; 2.1.4 Programming Language 10; 2.2 Computer Vision 10; 2.2.1 Optical Character Recognition 10; 2.2.2 YOLO 11; 2.3 Robotic Motion Control 13; 2.4 Virtual Environment 16; Chapter 3 System Design 18; 3.1 System Scheme 18; 3.2 NAO core 19; 3.3 Vision Model 20; 3.3.1 Detection 21; 3.3.2 Training Vision model on Our Task 22; Figure 3.4 Detecting pen tail and target button 22; 3.4 Action Generator 23; 3.4.1 Training Core 23; 3.5 Virtual Arm Environment 23; Figure 3.6 Virtual arm environment 24; 3.6 Attention DDPG 24; Chapter 4 Experiment Design and Implementation 26; 4.1 Experiment Platform 26; 4.1.1 NAO 26; 4.1.2 Physical Setup 27; 4.1.3 Remote Computation System Setup 29; 4.1.4 Virtual Keyboard on the Tablet 30; 4.1.5 NAO gym package 31; 4.2 Vision 32; 4.2.1 Transfer Learning 32; 4.2.2 Data Augmentation 33; 4.3 Motion 37; 4.3.1 Constraints on Joint Angles 37; 4.3.2 Reward Engineering 38; 4.3.3 State Observation 39; 4.3.4 Attention Mechanism 40; Chapter 5 Experiment Results and Discussion 42; 5.1 Performance of Attention-OCR and YOLO 42; 5.2 Attention-Based Actor 44; 5.2.1 Performance in Virtual Arm v0 46; 5.2.2 Performance with Absolute Position 48; 5.2.3 Performance with Randomized Arm 48; 5.3 Type Single Character 50; Chapter 6 Conclusion 51; References 52; Appendix 55 | |
| dc.language.iso | en | |
| dc.subject | NAO | zh_TW |
| dc.subject | 機器學習 | zh_TW |
| dc.subject | 強化學習 | zh_TW |
| dc.subject | 機器人動作控制 | zh_TW |
| dc.subject | NAO | en |
| dc.subject | reinforcement learning | en |
| dc.subject | continuous robotic control | en |
| dc.subject | convolution neural network | en |
| dc.title | 基於深度學習及遷移式學習之機器人操作平板電腦虛擬鍵盤的視覺與動作協調系統 | zh_TW |
| dc.title | A Vision and Motion Coordination System Based on Deep Learning and Transfer Learning for a Robot to Type Virtual Keyboards on a Tablet Computer | en |
| dc.date.schoolyear | 107-2 | |
| dc.description.degree | Master | |
| dc.contributor.oralexamcommittee | 李弘毅,袁世一 | |
| dc.subject.keyword | 機器學習,強化學習,機器人動作控制,NAO, | zh_TW |
| dc.subject.keyword | reinforcement learning,continuous robotic control,convolution neural network,NAO, | en |
| dc.relation.page | 56 | |
| dc.identifier.doi | 10.6342/NTU201902835 | |
| dc.rights.note | Authorized (open access worldwide) | |
| dc.date.accepted | 2019-08-08 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 電機工程學研究所 | zh_TW |
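The abstract above describes building the actor's observation from the vision model's output: the relative position and distance between the detected target button and the pen tip, concatenated with the robot's joint sensor readings. A minimal sketch of that construction follows; it is not code from the thesis, and the function names (`box_center`, `build_state`) and the `(x_min, y_min, x_max, y_max)` box convention are hypothetical choices for illustration.

```python
import math

def box_center(box):
    """Center (x, y) of an axis-aligned box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)

def build_state(button_box, pen_box, joint_angles):
    """Concatenate the pen-to-button offset, their Euclidean distance,
    and the proprioceptive joint readings into one observation vector."""
    bx, by = box_center(button_box)
    px, py = box_center(pen_box)
    dx, dy = bx - px, by - py      # relative position: button center minus pen tip center
    dist = math.hypot(dx, dy)      # straight-line distance between the two
    return [dx, dy, dist] + list(joint_angles)

# Example: button centered at (110, 110), pen tip at (50, 80), three joint angles.
state = build_state((100, 100, 120, 120), (40, 70, 60, 90), [0.1, -0.3, 0.8])
# state begins with [60.0, 30.0, ...] followed by the distance and joint angles
```

A DDPG-style actor network would then map such a state vector to the next joint action, as the abstract outlines.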
Appears in Collections: 電機工程學系
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-108-1.pdf | 3.91 MB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
