Please use this Handle URI to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92793

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 李志中 | zh_TW |
| dc.contributor.advisor | Jyh-Jone Lee | en |
| dc.contributor.author | 林奕安 | zh_TW |
| dc.contributor.author | Yi-An Lin | en |
| dc.date.accessioned | 2024-07-01T16:07:46Z | - |
| dc.date.available | 2024-07-02 | - |
| dc.date.copyright | 2024-07-01 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-06-20 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92793 | - |
| dc.description.abstract | 隨著人工智慧科技的進步,機械手臂在夾取物件的技術有了顯著提升。對於物件的辨識,從易受環境影響的 RGB-D 影像逐漸轉向能提供更多幾何資訊並且不受光線影響的點雲影像。然而,點雲影像的品質影響深度學習模型預測的優劣,因此需要較昂貴的相機以獲得高品質的點雲。為解決此問題,本研究透過整合不同深度學習模型以提升模型對較經濟型相機所取得影像的預測能力,並透過電腦圖形軟體與目標物件的 CAD 模型來合成訓練資料集,再以自我學習演算法提升模型的表現,最後以少量真實資料進行模型微調。其中,本研究自行開發了一套快速標註方法,以標註少量真實資料,減少前置作業的時間。
本研究開發了一套六自由度(6D)夾取堆疊物件的流程,使用經濟型相機Kinect v2拍攝 RGB-D 影像,將 RGB 影像輸入實例切割模型(YOLO-v8)預測物件遮罩及類別,並將遮罩下的深度影像轉換成點雲後輸入姿態估測模型(PPR-Net)預測物件姿態。接著,本研究開發一種6D夾取策略進行夾取框篩選及夾取高度計算,以決定最佳的夾取框及機械手臂夾取姿態。最後,在真實堆疊物件的工作平台上進行夾取實驗,以全自動方式分別進行四種夾取實驗,包括金屬圓管、T型塑膠水管及L型門把的單一種類夾取實驗,以及三種物件的混合夾取分類實驗。實驗結果顯示夾取成功率分別為100%、96.39%、89.89%及94.48%。夾取展示影片如下網址: https://youtu.be/LDHmKGCPOMo?si=eBGcqo6szhjQluYw | zh_TW |
| dc.description.abstract | With the advancement of artificial intelligence, the capabilities of robotic arms in grasping objects have improved significantly. For object recognition, there has been a shift from RGB-D images, which are susceptible to environmental influences, towards point clouds, which provide more geometric information and are less affected by lighting conditions. However, the quality of the point cloud affects the accuracy of deep learning models, which normally necessitates more expensive cameras to obtain high-quality point clouds. To address this issue, this study integrates different deep learning models to improve prediction on images captured by a more economical camera. It also synthesizes a training dataset from CAD models of the target objects using computer-graphics rendering software, employs a self-learning algorithm to improve the model's performance, and fine-tunes the model with a small amount of real data. As part of this process, the study developed a rapid annotation method for labeling the small set of real images, reducing preparation time.
This study developed a six-degree-of-freedom (6D) grasping pipeline for stacked objects that uses an economical camera (Kinect v2) to capture RGB-D images. RGB images are fed into an instance segmentation model (YOLO-v8) to predict object masks and categories; the depth pixels under each mask are then converted into a point cloud and passed to a pose estimation model (PPR-Net) to predict the object pose. Finally, a 6D grasping strategy filters the candidate grasp frames and computes the grasp height to determine the optimal grasp frame and robot end-effector pose. Fully automated grasping experiments were conducted on a real platform with stacked objects, covering four cases: single-category grasping of metal circular pipes, T-shaped plastic pipes, and L-shaped door handles, plus a mixed grasping-and-sorting experiment with all three object types. The success rates were 100%, 96.39%, 89.89%, and 94.48%, respectively. A demonstration video is available at: https://youtu.be/LDHmKGCPOMo?si=eBGcqo6szhjQluYw (An illustrative sketch of the mask-to-point-cloud step appears after the metadata table below.) | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-07-01T16:07:46Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-07-01T16:07:46Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 口試委員會審定書 i
誌謝 ii
摘要 iii
Abstract iv
圖次 viii
表次 xi
第1章 緒論 1
1.1 前言 1
1.2 文獻回顧 1
1.2.1 物件的姿態估測 2
1.2.2 物件的夾取點預測 6
1.3 研究動機與目的 9
1.4 論文架構 11
第2章 實例切割 12
2.1 YOLO 12
2.1.1 YOLO-v1 12
2.1.2 YOLO-v5 13
2.1.3 YOLO-v8 14
2.2 訓練資料集 16
2.2.1 Blender資料生成 17
2.2.2 領域隨機化 18
2.2.3 快速手動標註軟體 20
2.3 模型訓練 20
2.3.1 自動學習演算法 21
2.4 模型預測結果 22
2.4.1 mAP及mAR介紹 22
2.4.2 量化預測結果與分析 24
第3章 姿態估測 26
3.1 物件姿態對稱性質 26
3.2 點雲姿態估測 27
3.2.1 PointNet與PointNet++ 27
3.2.2 PPR-Net 29
3.2.3 迭代最近點算法 30
3.3 訓練資料集 31
3.3.1 PyBullet資料生成 31
3.3.2 資料前處理 32
3.4 模型預測結果 33
3.4.1 預測結果分析 33
3.4.2 Kinect v2與Ensenso N45所取點雲比較 34
第4章 6D夾取策略 38
4.1 手眼校正 38
4.2 夾取框標註 39
4.3 夾取框篩選 40
4.3.1 碰撞偵測 41
4.3.2 手臂奇異點判斷 45
4.4 夾取路徑規劃 48
4.4.1 夾取路徑及機械手臂末端姿態 48
4.4.2 動態調整夾取高度 50
第5章 系統與驗證 53
5.1 系統介紹 53
5.1.1 系統架構 53
5.1.2 實驗環境 54
5.1.3 夾取流程 55
5.2 夾取驗證 57
5.2.1 實驗流程 57
5.2.2 夾取結果 59
5.3 結果討論 60
第6章 結論與未來展望 62
6.1 結論 62
6.2 未來展望 63
參考文獻 65 | - |
| dc.language.iso | zh_TW | - |
| dc.subject | 六自由度夾取 | zh_TW |
| dc.subject | 深度學習 | zh_TW |
| dc.subject | 姿態估測 | zh_TW |
| dc.subject | 亂料夾取 | zh_TW |
| dc.subject | 點雲影像 | zh_TW |
| dc.subject | Deep Learning | en |
| dc.subject | 6-DoF Grasp | en |
| dc.subject | Pose Estimation | en |
| dc.subject | Point Cloud | en |
| dc.subject | Random Bin Picking | en |
| dc.title | 基於點雲辨識堆疊物件之六自由度夾取 | zh_TW |
| dc.title | Six-DoF Robotic Grasping of Objects in Clutter via Point Cloud | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 林峻永;施博仁 | zh_TW |
| dc.contributor.oralexamcommittee | Chun-Yeon Lin;Po-Jen Shih | en |
| dc.subject.keyword | 深度學習,六自由度夾取,姿態估測,點雲影像,亂料夾取 | zh_TW |
| dc.subject.keyword | Deep Learning, 6-DoF Grasp, Pose Estimation, Point Cloud, Random Bin Picking | en |
| dc.relation.page | 69 | - |
| dc.identifier.doi | 10.6342/NTU202401235 | - |
| dc.rights.note | 同意授權(全球公開) | - |
| dc.date.accepted | 2024-06-20 | - |
| dc.contributor.author-college | 工學院 | - |
| dc.contributor.author-dept | 機械工程學系 | - |
| dc.date.embargo-lift | 2026-07-01 | - |
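
As described in the English abstract above, the pipeline converts the depth pixels under each predicted instance mask into a point cloud before pose estimation with PPR-Net. The snippet below is a minimal, illustrative sketch of that mask-to-point-cloud step, assuming a pinhole camera model; the intrinsic values and the function name are placeholder assumptions, not the calibrated Kinect v2 parameters or the code used in the thesis.

```python
# Illustrative sketch (not the thesis code): back-project the depth pixels under an
# instance mask into a 3-D point cloud using the pinhole camera model.
# The intrinsics below are placeholders, NOT the calibrated Kinect v2 parameters.
import numpy as np

# Hypothetical pinhole intrinsics (fx, fy: focal lengths in px; cx, cy: principal point).
FX, FY, CX, CY = 365.0, 365.0, 256.0, 212.0


def masked_depth_to_point_cloud(depth_m: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Convert the depth pixels selected by a boolean instance mask into an (N, 3)
    point cloud in the camera frame. depth_m is an (H, W) array of depths in metres;
    mask is an (H, W) boolean array from the instance-segmentation model."""
    v, u = np.nonzero(mask & (depth_m > 0))   # pixel rows/cols inside the mask with valid depth
    z = depth_m[v, u]                         # depth along the optical axis
    x = (u - CX) * z / FX                     # pinhole back-projection
    y = (v - CY) * z / FY
    return np.stack([x, y, z], axis=1)


if __name__ == "__main__":
    # Toy example: a synthetic 4x4 depth map with a 2x2 object mask.
    depth = np.full((4, 4), 0.8)
    mask = np.zeros((4, 4), dtype=bool)
    mask[1:3, 1:3] = True
    cloud = masked_depth_to_point_cloud(depth, mask)
    print(cloud.shape)  # (4, 3) -> one 3-D point per masked pixel
```

In the pipeline described in the abstract, each per-instance cloud produced this way would then be passed to the pose estimation model to recover the object's 6D pose.
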

Appears in Collections: 機械工程學系

Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-112-2.pdf (publicly available online after 2026-07-01) | 5.08 MB | Adobe PDF |

All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
