Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54095

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | Pei-Chun Lin (林沛群), peichunlin@ntu.edu.tw | |
| dc.contributor.author | Chi-Heng Wang | en |
| dc.contributor.author | 王啟恆 | zh_TW |
| dc.date.accessioned | 2021-06-16T02:39:48Z | - |
| dc.date.available | 2025-08-04 | |
| dc.date.copyright | 2020-08-07 | |
| dc.date.issued | 2020 | |
| dc.date.submitted | 2020-08-04 | |
| dc.identifier.citation | [1] Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox, 'PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes,' arXiv preprint arXiv:1711.00199, 2017. [2] C. Wang et al., 'DenseFusion: 6D object pose estimation by iterative dense fusion,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 3343-3352. [3] M. Schwarz et al., 'Fast object learning and dual-arm coordination for cluttered stowing, picking, and packing,' in IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 3347-3354. [4] 林昱辰, '可變與自順應外型多感測器夾爪之開發與其在多外型與多尺寸物件自動化取放任務之應用,' Master's thesis, Graduate Institute of Mechanical Engineering, College of Engineering, National Taiwan University, Taipei, 2014. [5] Barrett Technology. 'BarrettHand.' (accessed 07/14, 2020). Available: https://advanced.barrett.com/barretthand [6] SCHUNK. 'SDH servo-electric 3-Finger.' (accessed 07/14, 2020). Available: https://schunk.com/de_en/gripping-systems/series/sdh/ [7] ROBOTIQ. '3-Finger Adaptive Robot Gripper.' (accessed 07/14, 2020). Available: https://robotiq.com/products/3-finger-adaptive-robot-gripper?301=%2Fen%2Fproducts%2Findustrial-robot-hand [8] J. Lahoud and B. Ghanem, '2D-driven 3D object detection in RGB-D images,' in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4622-4630. [9] J. Tremblay, T. To, B. Sundaralingam, Y. Xiang, D. Fox, and S. Birchfield, 'Deep object pose estimation for semantic robotic grasping of household objects,' arXiv preprint arXiv:1809.10790, 2018. [10] D. Kappler, J. Bohg, and S. Schaal, 'Leveraging big data for grasp planning,' in IEEE International Conference on Robotics and Automation (ICRA), 2015, pp. 4304-4311. [11] E. Johns, S. Leutenegger, and A. J. Davison, 'Deep learning a grasp function for grasping under gripper pose uncertainty,' in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, pp. 4461-4468. [12] P. Schmidt, N. Vahrenkamp, M. Wächter, and T. Asfour, 'Grasping of unknown objects using deep convolutional neural networks based on depth images,' in IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 6831-6838. [13] M. Gualtieri, A. Ten Pas, K. Saenko, and R. Platt, 'High precision grasp pose detection in dense clutter,' in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, pp. 598-605. [14] A. Zeng, S. Song, S. Welker, J. Lee, A. Rodriguez, and T. Funkhouser, 'Learning synergies between pushing and grasping with self-supervised deep reinforcement learning,' in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 4238-4245. [15] D. Kalashnikov et al., 'QT-Opt: Scalable deep reinforcement learning for vision-based robotic manipulation,' arXiv preprint arXiv:1806.10293, 2018. [16] 蔡謹容, '以基礎幾何模型搭配具主被動自由度和掌內壓力與近接感測之夾爪達到快速低計算成本之多樣化物體夾取,' Master's thesis, Graduate Institute of Mechanical Engineering, College of Engineering, National Taiwan University, Taipei, 2017. [17] J.-R. Tsai and P.-C. Lin, 'A low-computation object grasping method by using primitive shapes and in-hand proximity sensing,' in IEEE International Conference on Advanced Intelligent Mechatronics (AIM), 2017, pp. 497-502. [18] Wikipedia. '超文本傳輸協定 (Hypertext Transfer Protocol).' (accessed 07/14, 2020). Available: https://zh.wikipedia.org/wiki/%E8%B6%85%E6%96%87%E6%9C%AC%E5%82%B3%E8%BC%B8%E5%8D%94%E5%AE%9A [19] 游崴舜, '可側傾雙輪機器人之運動控制與其內部機器人泛用機電系統架構,' Master's thesis, Department of Mechanical Engineering, National Taiwan University, Taipei, 2012. [20] 巨承科技. '巨承科技首頁 (company homepage).' (accessed 07/14, 2020). Available: http://www.g-chen.com/about_us/index.php [21] Adafruit. 'Force Sensitive Resistor (FSR).' (accessed 07/14, 2020). Available: https://learn.adafruit.com/force-sensitive-resistor-fsr/using-an-fsr?view=all [22] Intel. 'Intel® RealSense™ Depth Camera D435.' (accessed 07/14, 2020). Available: https://www.intelrealsense.com/depth-camera-d435/ [23] OpenCV. 'Hand-eye calibration.' (accessed 07/14, 2020). Available: https://answers.opencv.org/question/204910/calculate-robot-coordinates-from-measured-chessbaord-corners-hand-eye-calibration/?fbclid=IwAR3R4_SeLbez15tkuHzziQt2QLlu9om_MOPv0JS4BL8Ye-vk83HvX3kE2mY [24] Loyola University. 'Calibration and Registration Techniques for Robotics.' (accessed 07/14, 2020). Available: http://math.loyola.edu/~mili/Calibration/index.html [25] R.-h. Liang and J.-f. Mao, 'Hand-eye calibration with a new linear decomposition algorithm,' Journal of Zhejiang University-SCIENCE A, vol. 9, no. 10, pp. 1363-1368, 2008. [26] R. T. Fielding and R. N. Taylor, Architectural styles and the design of network-based software architectures. University of California, Irvine, 2000. [27] C.-H. Wang. 'RESTful API實作 (RESTful API implementation).' (accessed 07/14, 2020). Available: https://github.com/Chihengwang/flaskRestfulApi [28] 林沛群 (Pei-Chun Lin). '機器人學一 (Robotics (1)).' (accessed 07/14, 2020). Available: https://www.coursera.org/learn/robotics1 [29] 陳柏勳, '基於位置與力複合誤差控制之雙機器手臂協同持物操作與學習演算法之應用,' Master's thesis, Department of Mechanical Engineering, National Taiwan University, Taipei, 2018. [30] K. He, G. Gkioxari, P. Dollár, and R. Girshick, 'Mask R-CNN,' in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961-2969. [31] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, 'PointNet: Deep learning on point sets for 3D classification and segmentation,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 652-660. [32] Facebook AI Research. 'Advancing AI to bring the world closer together.' (accessed 07/14, 2020). Available: https://ai.facebook.com/research/ [33] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, and J. Garcia-Rodriguez, 'A review on deep learning techniques applied to semantic segmentation,' arXiv preprint arXiv:1704.06857, 2017. [34] S. Ren, K. He, R. Girshick, and J. Sun, 'Faster R-CNN: Towards real-time object detection with region proposal networks,' in Advances in Neural Information Processing Systems, 2015, pp. 91-99. [35] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, 'You only look once: Unified, real-time object detection,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779-788. [36] W. Liu et al., 'SSD: Single shot multibox detector,' in European Conference on Computer Vision, 2016: Springer, pp. 21-37. [37] A. Umam, '4 Mask RCNN Arc. (Part 3) - How RoI Pooling, RoI Warping, RoI Align Work,' 2017. [38] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman, 'LabelMe: a database and web-based tool for image annotation,' International Journal of Computer Vision, vol. 77, no. 1-3, pp. 157-173, 2008. [39] C.-H. Wang. 'Auto labeling.' (accessed 07/14, 2020). Available: https://github.com/Chihengwang/autolabel [40] 將門創投. '點雲上的深度學習及其在三維場景理解中的應用 (Deep learning on point clouds and its applications in 3D scene understanding).' (accessed 07/14, 2020). Available: https://www.youtube.com/watch?v=Ew24Rac8eYE [41] C. Manoj, 'An Efficient System for Forward Collision Avoidance Using Low Cost Camera Embedded Processor in Autonomous Vehicles.' [42] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, 'How transferable are features in deep neural networks?,' in Advances in Neural Information Processing Systems, 2014, pp. 3320-3328. [43] S. Malgonde. 'Transfer learning using Tensorflow.' (accessed 07/14, 2020). Available: https://medium.com/@subodh.malgonde/transfer-learning-using-tensorflow-52a4f6bcde3e [44] Q.-Y. Zhou, J. Park, and V. Koltun, 'Open3D: A modern library for 3D data processing,' arXiv preprint arXiv:1801.09847, 2018. [45] C.-H. Wang. 'Point Cloud tool.' (accessed). Available: https://github.com/chihengwang/pointcloudtool [46] Q.-Y. Zhou, J. Park, and V. Koltun. 'A Modern Library for 3D Data Processing.' (accessed 07/14, 2020). Available: http://www.open3d.org/ [47] Open Perception. 'Removing outliers using a Conditional or RadiusOutlier removal.' (accessed 07/14, 2020). Available: http://pointclouds.org/documentation/tutorials/remove_outliers.php [48] Open Perception. 'Removing outliers using a StatisticalOutlierRemoval filter.' (accessed 07/14, 2020). Available: http://pointclouds.org/documentation/tutorials/statistical_outlier.php [49] G. Peyré. 'Mesh Processing Course: Geodesic Sampling.' (accessed 07/14, 2020). Available: https://www.slideshare.net/gpeyre/mesh-processing-course-geodesic-sampling [50] F. Pedregosa et al., 'Scikit-learn: Machine learning in Python,' Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011. [51] F. Pedregosa et al. 'sklearn.decomposition.PCA.' (accessed 07/14, 2020). Available: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html [52] Q. Lei, G. Chen, and M. Wisse, 'Fast grasping of unknown objects using principal component analysis,' AIP Advances, vol. 7, no. 9, p. 095126, 2017. [53] V. Mas. 'ViTables.' (accessed 07/14, 2020). Available: https://vitables.org/ [54] Z. Wu et al., '3D ShapeNets: A deep representation for volumetric shapes,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1912-1920. [55] C.-H. Wang. 'stacked box three trial.' (accessed 8/1, 2020). Available: https://www.youtube.com/watch?v=YVWvAMHFJwI [56] C.-H. Wang. 'stacked banana three trial.' (accessed 8/2, 2020). Available: https://www.youtube.com/watch?v=oBSjwODO-sg [57] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, 'PointNet++: Deep hierarchical feature learning on point sets in a metric space,' in Advances in Neural Information Processing Systems, 2017, pp. 5099-5108. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54095 | - |
| dc.description.abstract | The goal of this research is to combine an RGBD camera, a multi-sensor gripper (force sensors, potentiometers, and Hall sensors), and deep learning techniques to achieve stable grasping of a partially occluded target object in stacked environments such as fruit baskets and toy boxes. Vision cameras are widely used in industrial and automated production lines, usually together with robot arms and grippers. Among the common choices of pure RGB and RGBD cameras, an RGBD camera additionally provides depth information, which makes it better suited for robotics and reduces the computation otherwise needed to recover an object's actual position. Between the eye-to-hand and eye-in-hand configurations, this study adopts the eye-in-hand system, which observes objects from multiple viewpoints rather than a fixed one, letting the robot arm, much like a human, flexibly collect data from different viewing angles; it also allows the target object to be found from different viewpoints when occlusion occurs. For visual localization and pose recognition: in the localization step, Mask R-CNN is used with the RGBD data to find the mask of the target object, and the depth information is then used to extract its partial point cloud. Because Mask R-CNN can still find the target's mask under partial occlusion, the eye-in-hand system can effectively recover the remaining shape information of the object. In the grasping-pose algorithm, PointNet, with its excellent performance on point cloud classification, is adapted through transfer learning into a pose recognition framework, Q-PointNet. Besides producing a quaternion as the grasping pose, the framework also outputs the gripper mode (two fingers or three fingers) so that the gripper performs better. Q-PointNet can be thought of as a shape discriminator: given different partial point clouds, it outputs an appropriate grasping pose, which is especially useful when the geometric (point cloud) information is insufficient, i.e., it can still produce a good grasping pose when the target is occluded by other objects. Combining the sensing capability of the potentiometers on the gripper, a width calculation algorithm tied to the grasping pose produced by Q-PointNet is proposed. The potentiometers allow very accurate angle measurement of the fingers and are used to control the position each finger should reach; the width calculation algorithm computes the width to which the gripper should open in advance, so that the robot arm and gripper do not collide with other objects during the grasping process. To train the proposed framework, two kinds of datasets are required: one for training Mask R-CNN and one for training Q-PointNet. For the Mask R-CNN dataset, this study proposes a practical solution, auto-labeling, to quickly generate training data. For the Q-PointNet dataset, an annotation interface was implemented to prepare the data (partial point clouds, grasping poses, and modes). The interface uses PCA as the reference coordinate frame for adjusting the grasping pose, and Euler angle computations allow the annotated pose to be adjusted in real time; the ModelNet40 dataset is also used to generate simulated partial point clouds and reduce data collection time. In addition, the commonly used point cloud processing methods are consolidated into a package for convenient point cloud preprocessing. Finally, in real robot experiments, the visual localization was verified to have a positional error within ±4 mm, and grasping experiments were conducted in different scenarios, namely a single object, a leaning object, scattered objects, and stacked objects, to verify that the grasping pose and strategy produced by Q-PointNet apply to all of them; together with the force sensors, the gripper achieves more stable grasping. | zh_TW |
| dc.description.abstract | The goal of this research is to combine an RGBD camera, a multi-sensor gripper (force sensors and potentiometers), and deep learning techniques to achieve a stable grasping task. An RGBD camera is chosen as the visual sensor rather than an RGB camera because it provides depth information in addition to RGB images. The eye-in-hand configuration adopted in this study observes objects from multiple viewpoints rather than a fixed one, allowing the camera to collect data from different viewing angles as flexibly as a human; it can also detect the target object from different viewpoints when occlusion occurs. For the grasping task, localizing the object and determining the grasping pose are the key problems to be solved. This thesis proposes a deep-learnt grasping algorithm, Q-PointNet, which determines an adequate strategy for grasping a partially exposed object in a stacked pile. The grasping strategy includes the gripper's posture and the finger mode (two fingers or three fingers). Because the predicted outputs are a quaternion and a mode, the thesis also explains how a hybrid loss function is used to reach this goal with a limited training dataset. Combined with the potentiometers on the gripper, a width calculation algorithm related to the grasping posture generated by Q-PointNet is proposed: the potentiometers measure each finger's rotation angle very accurately and determine the position each finger should reach, so with the object width estimated and the gripper opened to that width in advance, the robot arm and gripper do not collide with other objects during the grasping process. For dataset preparation, the pipeline consists of two stages, so two kinds of training data are needed. First, Mask R-CNN training images are usually labeled with the LabelMe tool, which is laborious; this thesis reports a faster auto-labeling method instead. Second, Q-PointNet requires a training dataset of grasping poses, modes, and partial point clouds; to visualize and accelerate labeling, a pose annotation system is presented to generate this dataset. Finally, the grasping experiments and their operation flows are presented, and the results show that the approach is suitable for various scenarios; moreover, the force sensors provide more stable grasping while the robot arm carries out the task. (Hedged illustrative sketches of the partial point cloud extraction, the hybrid loss, and the PCA reference frame are appended below the metadata table.) | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T02:39:48Z (GMT). No. of bitstreams: 1 U0001-0408202000225100.pdf: 11586915 bytes, checksum: 17b56b6ad8afdef236873c3728fa4f4b (MD5) Previous issue date: 2020 | en |
| dc.description.tableofcontents | Table of Contents: Acknowledgements I; Chinese Abstract II; Abstract IV; Table of Contents VI; List of Figures X; List of Tables XIX; Chapter 1 Introduction 1; 1.1 Preface 1; 1.2 Motivation 1; 1.3 Literature Review 2; 1.4 Contributions 6; 1.5 Thesis Organization 9; Chapter 2 Experimental Platform 10; 2.1 Preface 10; 2.2 Platform Review and Electromechanical Architecture 10; 2.2.1 Gripper Introduction 10; 2.2.2 Experimental System Architecture 15; 2.2.3 Force Sensor Update 22; 2.3 Image Acquisition System 35; 2.3.1 RealSense D435 Introduction 36; 2.3.2 Depth Camera Calibration and SDK Modules 37; 2.3.3 Differences between Eye-in-Hand and Eye-to-Hand 38; 2.3.4 Hand-Eye Calibration Procedure 39; 2.3.5 Experimental Data and Comparison 42; 2.4 Communication System for Visual Localization 45; 2.4.1 How HTTP Works 45; 2.4.2 RESTful API Architecture 47; 2.4.3 Flask Introduction 48; 2.4.4 JSON Data Transfer 49; 2.4.5 Implementation Architecture Design 50; 2.4.6 Automated Image Capture System 53; 2.5 Forward and Inverse Kinematics Derivation 54; 2.5.1 DH Table Definition 55; 2.5.2 Inverse Kinematics Derivation 57; 2.6 Chapter Summary 62; Chapter 3 Visual Localization and Grasping Pose Algorithm 63; 3.1 Vision Framework Introduction 63; 3.1.1 Review of Traditional Vision Architectures 63; 3.1.2 Overall Vision Framework 67; 3.1.3 TensorFlow Tools 68; 3.2 Mask R-CNN 69; 3.2.1 Mask R-CNN Architecture 71; 3.2.2 Training Procedure and the LabelMe Tool 73; 3.2.3 Definition of Scattered and Stacked Data 75; 3.2.4 Automatic Training Data Annotation (Auto-Labeling) 76; 3.2.5 Auto-Labeling Experimental Results 81; 3.2.6 Auto-Labeling with Occlusion Algorithm 82; 3.2.7 Occlusion Algorithm Experimental Results 83; 3.2.8 Background Color Difference Experimental Results 85; 3.2.9 Section Summary 87; 3.3 Q-PointNet Architecture Design 88; 3.3.1 PointNet Framework 88; 3.3.2 Generating Partial Object Point Clouds from Mask R-CNN 91; 3.3.3 3D Reconstruction and Automated Partial Point Cloud Collection 94; 3.3.4 Q-PointNet Framework 96; 3.3.5 Training Results 100; 3.4 Grasp Width Calculation Algorithm 102; 3.4.1 Grasp Width Estimation Algorithm 102; 3.4.2 Setting the Width via Potentiometers 105; 3.5 Preliminary Verification and Grasping Strategy 106; 3.6 Chapter Summary 116; Chapter 4 Point Cloud Processing Package and Pose Collection System 117; 4.1 Preface 117; 4.2 Point Cloud Processing Package 117; 4.2.1 Open3D Introduction 118; 4.2.2 Q-PointNet Dataset Preparation Procedure 118; 4.2.3 Downsampling Algorithms 120; 4.2.4 Outlier Elimination Algorithms 121; 4.2.5 Farthest Point Sampling 122; 4.3 Principal Component Analysis (PCA) 123; 4.3.1 PCA Algorithm 124; 4.3.2 scikit-learn PCA 126; 4.3.3 Comparison for Evaluating Point Cloud Preprocessing 127; 4.3.4 PCA as the Grasping Pose and the Corresponding Gripper Coordinate Frame 128; 4.4 Pose Collection System 131; 4.4.1 Interface Introduction 132; 4.4.2 Simulated Partial Point Clouds from ModelNet40 141; 4.5 Chapter Summary 142; Chapter 5 Grasping Experiment Design and Results 143; 5.1 Preface 143; 5.2 Partial Point Cloud Localization Accuracy Analysis 143; 5.3 Automated Grasping Experiment Design 146; 5.3.1 Automated Grasping Flow Architecture 146; 5.3.2 Experimental Results 150; 5.4 Chapter Summary 174; Chapter 6 Conclusion and Future Work 176; 6.1 Conclusion 176; 6.2 Future Work 176; References 178 | |
| dc.language.iso | zh-TW | |
| dc.subject | 點雲處理 (point cloud processing) | zh_TW |
| dc.subject | 通用型夾爪 (general-purpose gripper) | zh_TW |
| dc.subject | 深度學習架構 (deep learning framework) | zh_TW |
| dc.subject | Mask R-CNN | zh_TW |
| dc.subject | PointNet | zh_TW |
| dc.subject | 視覺定位 (visual localization) | zh_TW |
| dc.subject | 夾取姿態演算法 (grasping pose algorithm) | zh_TW |
| dc.subject | 力感測器 (force sensor) | zh_TW |
| dc.subject | Mask R-CNN | en |
| dc.subject | RGBD | en |
| dc.subject | PointNet | en |
| dc.subject | dexterous hand | en |
| dc.subject | grasping | en |
| dc.subject | stacked objects | en |
| dc.title | Q-PointNet:以RGBD相機和多感測器夾爪結合深度學習技術 實現被遮擋物之取件任務 | zh_TW |
| dc.title | Q-PointNet: Intelligent Stacked-Objects Grasping Using a RGBD Sensor and a Dexterous Hand | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 108-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | Kuang-Yuh Huang (黃光裕), Feng-Li Lian (連豊力), Ping-Lang Yen (顏炳郎) | |
| dc.subject.keyword | 通用型夾爪, 深度學習架構, Mask R-CNN, PointNet, 視覺定位, 夾取姿態演算法, 力感測器, 點雲處理 (general-purpose gripper, deep learning framework, Mask R-CNN, PointNet, visual localization, grasping pose algorithm, force sensor, point cloud processing) | zh_TW |
| dc.subject.keyword | grasping, stacked objects, RGBD, PointNet, Mask R-CNN, dexterous hand | en |
| dc.relation.page | 181 | |
| dc.identifier.doi | 10.6342/NTU202002335 | |
| dc.rights.note | 有償授權 (paid authorization) | |
| dc.date.accepted | 2020-08-04 | |
| dc.contributor.author-college | 工學院 (College of Engineering) | zh_TW |
| dc.contributor.author-dept | 機械工程學研究所 (Graduate Institute of Mechanical Engineering) | zh_TW |
| Appears in Collections: | 機械工程學系 (Department of Mechanical Engineering) | |
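
The localization step summarized in the abstracts above (a Mask R-CNN instance mask combined with the aligned depth image to extract the object's partial point cloud, followed by the point cloud preprocessing package) can be illustrated with a minimal sketch. This is not the thesis's implementation: the pinhole deprojection, the Open3D parameters (a ≥ 0.10 API is assumed), and the function names are placeholder assumptions; the author's actual utilities live in the pointcloudtool repository cited as [45].

```python
import numpy as np
import open3d as o3d

def mask_to_partial_cloud(depth_m, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into a camera-frame point cloud.
    depth_m: HxW depth in meters, aligned to the color frame;
    mask:    HxW boolean instance mask predicted by Mask R-CNN;
    fx, fy, cx, cy: pinhole intrinsics of the RGBD camera."""
    v, u = np.nonzero(mask & (depth_m > 0))   # pixel rows/cols inside the mask
    z = depth_m[v, u]
    x = (u - cx) * z / fx                     # standard pinhole deprojection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)        # (N, 3) points in the camera frame

def preprocess(points, voxel=0.003, nb_neighbors=20, std_ratio=2.0):
    """Typical cleanup before feeding a network: voxel downsampling followed by
    statistical outlier removal (parameter values here are illustrative only)."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd = pcd.voxel_down_sample(voxel_size=voxel)
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=nb_neighbors,
                                            std_ratio=std_ratio)
    return np.asarray(pcd.points)
```

A farthest point sampling pass (Section 4.2.5 in the table of contents) would then reduce the cleaned cloud to the fixed number of points that the PointNet-style network expects.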
Files in This Item:
| File | Size | Format |
|---|---|---|
| U0001-0408202000225100.pdf (restricted, not publicly accessible) | 11.32 MB | Adobe PDF |
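
The English abstract mentions a hybrid loss that jointly supervises the predicted quaternion and the two-/three-finger mode, but this record does not spell the loss out. The following TensorFlow 2 sketch shows one plausible formulation under assumptions of my own: the orientation term penalizes disagreement between unit quaternions while respecting the q / −q double cover, the mode term is an ordinary cross-entropy, and `alpha` is a hand-picked weight; none of these choices are confirmed by the thesis.

```python
import tensorflow as tf

def hybrid_loss(q_pred, q_true, mode_logits, mode_true, alpha=1.0):
    """Sketch of a combined grasp-pose loss (hypothetical formulation).
    q_pred:      (B, 4) raw quaternion output of the network;
    q_true:      (B, 4) annotated grasp quaternions (unit norm);
    mode_logits: (B, 2) logits for the finger mode;
    mode_true:   (B,)   integer labels, e.g. 0 = two fingers, 1 = three fingers."""
    q_pred = q_pred / tf.norm(q_pred, axis=-1, keepdims=True)
    # |<q_pred, q_true>| equals 1 when the rotations coincide; q and -q are the same rotation.
    dot = tf.abs(tf.reduce_sum(q_pred * q_true, axis=-1))
    quat_loss = tf.reduce_mean(1.0 - dot)
    ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    mode_loss = ce(mode_true, mode_logits)
    return quat_loss + alpha * mode_loss
```

In practice such a loss would sit on top of a PointNet backbone whose classification head is replaced, via transfer learning, by a quaternion regression branch and a mode classification branch, which is how the abstracts describe Q-PointNet.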
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
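
The pose annotation interface described in the abstracts uses PCA as the reference coordinate frame in which grasp poses are adjusted (scikit-learn's PCA is cited as [51]). A minimal sketch of deriving such a frame from a partial point cloud is shown below; forcing the axes into a right-handed rotation matrix is my own simplification, and the thesis's exact gripper-frame conventions (Section 4.3.4) are not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_reference_frame(points):
    """Return (centroid, R): the columns of R are the principal axes of the
    partial point cloud, ordered by explained variance and made right-handed.
    points: (N, 3) array of the object's partial point cloud."""
    pca = PCA(n_components=3)
    pca.fit(points)
    R = pca.components_.T              # columns = principal directions
    if np.linalg.det(R) < 0:           # flip one axis if the frame is left-handed
        R[:, 2] *= -1.0
    return points.mean(axis=0), R

# Hypothetical usage: express an annotated grasp orientation relative to this
# frame, then nudge it with small Euler-angle increments in the labeling GUI.
```

This matches the Chinese abstract's description of using PCA as the adjustment frame and Euler-angle computations for real-time pose tweaking in the interface, though the concrete axis ordering is an assumption.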