Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8352
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 李志中(Jyh-Jone Lee) | |
dc.contributor.author | Chia-Lien Li | en |
dc.contributor.author | 李佳蓮 | zh_TW |
dc.date.accessioned | 2021-05-20T00:52:35Z | - |
dc.date.available | 2025-08-14 | |
dc.date.available | 2021-05-20T00:52:35Z | - |
dc.date.copyright | 2020-09-22 | |
dc.date.issued | 2020 | |
dc.date.submitted | 2020-08-15 | |
dc.identifier.citation | [1] Cornell University, 'Cornell grasping dataset.' http://pr.cs.cornell.edu/grasping/rect_data/data.php (accessed). [2] H. Karaoguz and P. Jensfelt, 'Object Detection Approach for Robot Grasp Detection,' in 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 4953-4959, doi: 10.1109/ICRA.2019.8793751. [3] J. Ma, W. Shao, H. Ye, L. Wang, H. Wang, Y. Zheng, and X. Xue, 'Arbitrary-Oriented Scene Text Detection via Rotation Proposals,' IEEE Transactions on Multimedia, vol. 20, no. 11, pp. 3111-3122, 2018, doi: 10.1109/TMM.2018.2818020. [4] D. Morrison, P. Corke, and J. Leitner, 'Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach,' arXiv preprint arXiv:1804.05172, 2018. [5] H. Liang, X. Ma, S. Li, M. Görner, S. Tang, B. Fang, F. Sun, and J. Zhang, 'PointNetGPD: Detecting grasp configurations from point sets,' in 2019 International Conference on Robotics and Automation (ICRA), IEEE, 2019, pp. 3629-3635. [6] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, 'PointNet: Deep learning on point sets for 3D classification and segmentation,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 652-660. [7] B. Calli, A. Singh, A. Walsman, S. Srinivasa, P. Abbeel, and A. M. Dollar, 'The YCB object and model set: Towards common benchmarks for manipulation research,' in 2015 International Conference on Advanced Robotics (ICAR), IEEE, 2015, pp. 510-517. [8] A. ten Pas and R. Platt, 'Using geometry to detect grasp poses in 3D point clouds,' in Robotics Research. Springer, 2018, pp. 307-324. [9] C. Cortes and V. Vapnik, 'Support-vector networks,' Machine Learning, vol. 20, no. 3, pp. 273-297, 1995. [10] L. Pinto and A. Gupta, 'Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours,' in 2016 IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 3406-3413, doi: 10.1109/ICRA.2016.7487517. [11] A. Zeng, S. Song, K.-T. Yu, E. Donlon, F. R. Hogan, M. Bauza, D. Ma, O. Taylor, M. Liu, and E. Romo, 'Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching,' in 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2018, pp. 1-8. [12] J. Long, E. Shelhamer, and T. Darrell, 'Fully convolutional networks for semantic segmentation,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440. [13] 王仁蔚, '以實例切割與表徵學習應用於機械臂夾取堆疊物件' (Instance segmentation and representation learning for robotic grasping of stacked objects), M.S. thesis, Department of Mechanical Engineering, National Taiwan University, 2019. [14] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, 'Extracting and composing robust features with denoising autoencoders,' in Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 1096-1103. [15] J. Mahler, J. Liang, S. Niyaz, M. Laskey, R. Doan, X. Liu, J. A. Ojea, and K. Goldberg, 'Dex-Net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics,' arXiv preprint arXiv:1703.09312, 2017. [16] K. Bousmalis, A. Irpan, P. Wohlhart, Y. Bai, M. Kelcey, M. Kalakrishnan, L. Downs, J. Ibarz, P. Pastor, and K. Konolige, 'Using simulation and domain adaptation to improve efficiency of deep robotic grasping,' in 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2018, pp. 4243-4250. [17] A. L. Yuille and C. Liu, 'Limitations of Deep Learning for Vision, and How We Might Fix Them.' https://thegradient.pub/the-limitations-of-visual-deep-learning-and-how-we-might-fix-them/ (accessed 2020). [18] K. He, G. Gkioxari, P. Dollár, and R. Girshick, 'Mask R-CNN,' in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961-2969. [19] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, 'Microsoft COCO: Common objects in context,' in European Conference on Computer Vision, Springer, 2014, pp. 740-755. [20] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, 'ImageNet: A large-scale hierarchical image database,' in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, pp. 248-255. [21] A. Krizhevsky and G. Hinton, 'Learning multiple layers of features from tiny images,' 2009. [22] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, 'You only look once: Unified, real-time object detection,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779-788. [23] S. Ren, K. He, R. Girshick, and J. Sun, 'Faster R-CNN: Towards real-time object detection with region proposal networks,' in Advances in Neural Information Processing Systems, 2015, pp. 91-99. [24] D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, 'YOLACT: Real-time instance segmentation,' in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 9157-9166. [25] K. He, X. Zhang, S. Ren, and J. Sun, 'Deep residual learning for image recognition,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778. [26] R. S. Zimmermann and J. N. Siems, 'Faster training of Mask R-CNN by focusing on instance boundaries,' Computer Vision and Image Understanding, vol. 188, p. 102795, 2019. [27] R. Girshick, 'Fast R-CNN,' in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440-1448. [28] M. Danielczuk, M. Matl, S. Gupta, A. Li, A. Lee, J. Mahler, and K. Goldberg, 'Segmenting unknown 3D objects from real depth images using Mask R-CNN trained on synthetic data,' in 2019 International Conference on Robotics and Automation (ICRA), IEEE, 2019, pp. 7283-7290. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8352 | - |
dc.description.abstract | This study proposes a modular classification-and-grasping pipeline for stacked objects. An RGB-D camera captures color and depth images of the pile, and an instance segmentation model (Mask R-CNN) together with a Generative Grasping Convolutional Neural Network (GG-CNN) identifies multiple grasp points in the pile. All objects' grasp points are then merged back into the pile scene, depth information is used to filter out grasps that would interfere with neighboring objects, and the robot arm is commanded to execute the grasp. In the first step, Mask R-CNN performs instance segmentation on the pile image, separating the objects one by one to obtain each object's position and class, with an edge loss added to obtain more accurate mask contours. In the second step, GG-CNN generates a pixel-wise grasp-stability score from a single object's depth information; because this model can also predict grasps for unseen objects, its parameters need not be retrained when new target objects are added. In the third step, the depth image is combined with the object positions from the first step and the stability scores from the second step to eliminate grasp points that may cause collisions, and the remaining grasps are ranked by stability as the pipeline's final output. Finally, the feasibility of the pipeline is verified on a robot arm system, achieving a grasp success rate of 84.3%. | zh_TW |
dc.description.abstract | This thesis presents a robotic grasping and classification system for objects in cluttered environments. The system consists of three main parts: (i) instance segmentation, (ii) grasp candidate generation, and (iii) collision avoidance. In the first part, the instance segmentation model, Mask R-CNN, isolates each object in the clutter from the scene and is augmented with an edge loss to obtain accurate mask boundaries. In the second part, a Generative Grasping Convolutional Neural Network (GG-CNN) predicts grasp poses and their quality for every object segmented in the first part, and grasp candidates are sampled from its pixel-wise predictions. In the last part, the algorithm selects collision-free grasps from the candidates based on depth information. Finally, a robotic system is presented to demonstrate the effectiveness of the pipeline; a grasp success rate of 84.3% is achieved. | en |
dc.description.provenance | Made available in DSpace on 2021-05-20T00:52:35Z (GMT). No. of bitstreams: 1 U0001-3007202021210900.pdf: 4351030 bytes, checksum: 196617dda2657fa7aff0c74e21371c44 (MD5) Previous issue date: 2020 | en |
dc.description.tableofcontents | Thesis Committee Certification i Acknowledgements ii Abstract (Chinese) iii Abstract iv Contents v List of Figures viii List of Tables xi Chapter 1 Introduction 1 1-1 Background 1 1-2 Literature Review 2 1-2-1 Grasp Prediction for Single Objects 2 1-2-2 Grasp Prediction for Stacked Objects 5 1-2-3 Grasping in Simulated Environments 8 1-3 Research Objectives 11 1-4 Thesis Outline 12 Chapter 2 Object Segmentation 13 2-1 Scene Understanding 13 2-1-1 Image Classification 14 2-1-2 Object Localization 14 2-1-3 Semantic Segmentation 15 2-1-4 Instance Segmentation 16 2-1-5 Scene Understanding for Stacked Objects 17 2-2 Mask R-CNN 18 2-2-1 Feature Extraction 18 2-2-2 Region Proposal Network 21 2-2-3 RoIAlign 21 2-2-4 Mask Prediction Branch 22 2-2-5 Bounding-Box Prediction Branch 23 2-3 Edge Agreement Loss 24 2-4 Training Data 26 2-4-1 Data Collection 26 2-4-2 Annotation Tools 26 2-4-3 Data Annotation 27 2-5 Model Training 28 2-5-1 Loss Function 28 2-5-2 Pretrained Weights 30 2-5-3 Data Augmentation 30 2-5-4 Hyperparameter Tuning 31 2-6 Prediction Results 33 2-6-1 Feature Extraction 33 2-6-2 Region Proposal Predictions 34 2-6-3 Mask Predictions 35 2-6-4 Instance Segmentation Results 36 Chapter 3 Grasp Generation 37 3-1 Generative Grasping Convolutional Neural Network 37 3-1-1 Grasp Definition 37 3-1-2 Model Architecture 39 3-2 Model Training 40 3-2-1 Training Data 40 3-2-2 Loss Function 42 3-3 Prediction Results 44 3-3-1 Cornell Grasping Dataset 44 3-3-2 Segmented Objects 45 3-4 Grasp Candidates 47 3-5 Augmenting Grasp Candidates 48 3-6 Grasp Collision Checking 49 3-7 Final Grasp Selection 52 Chapter 4 System and Verification 54 4-1 System Description 54 4-1-1 System Architecture 54 4-1-2 Experimental Setup 55 4-2 Grasping Pipeline Verification 57 4-2-1 Experimental Procedure 57 4-2-2 Grasp Results and Success Rate 58 4-2-3 Pipeline Computation Time 58 4-3 Edge Loss Verification 59 Chapter 5 Conclusions and Future Work 60 5-1 Conclusions 60 5-2 Future Work 60 References 62 | |
dc.language.iso | zh-TW | |
dc.title | 以實例切割與夾取點生成卷積類神經網路應用於隨機堆疊物件之分類夾取 | zh_TW |
dc.title | Robotic Random Bin Picking and Classification System using Instance Segmentation and Generative Grasping Convolutional Neural Network | en |
dc.type | Thesis | |
dc.date.schoolyear | 108-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 陳亮嘉(Liang-Chia Chen),林沛群(Pei-Chun Lin) | |
dc.subject.keyword | 機械手臂,堆疊夾取,實例切割,深度學習 | zh_TW |
dc.subject.keyword | Robotic Arm, Clutter Grasping, Instance Segmentation, Deep Learning | en |
dc.relation.page | 64 | |
dc.identifier.doi | 10.6342/NTU202002128 | |
dc.rights.note | Authorized (open access worldwide) | |
dc.date.accepted | 2020-08-17 | |
dc.contributor.author-college | College of Engineering | zh_TW |
dc.contributor.author-dept | Graduate Institute of Mechanical Engineering | zh_TW |
dc.date.embargo-lift | 2025-08-14 | - |
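The three-stage pipeline summarized in the abstract (instance segmentation, pixel-wise grasp scoring, depth-based collision filtering) can be sketched in highly simplified form. The Python below is illustrative only: every function name is hypothetical, and trivial depth heuristics stand in for the actual Mask R-CNN and GG-CNN models described in the thesis.

```python
import numpy as np

# Illustrative sketch of the three-stage pipeline from the abstract.
# All names are hypothetical; simple heuristics stand in for the real
# Mask R-CNN (stage 1) and GG-CNN (stage 2) models.

def segment_instances(depth):
    """Stage 1 stand-in: return one boolean mask per 'object'.
    Here, nearer-than-average pixels form a single toy instance."""
    mask = depth < depth.mean()
    return [mask] if mask.any() else []

def grasp_quality(depth, mask):
    """Stage 2 stand-in: pixel-wise grasp-stability score inside the mask.
    Nearer points (smaller depth) score higher, loosely mimicking GG-CNN."""
    q = np.zeros_like(depth, dtype=float)
    q[mask] = depth.max() - depth[mask]
    return q

def select_grasp(depth, clearance=0.02):
    """Stage 3: merge candidates from all objects, rank by stability,
    and keep the first grasp whose neighbourhood is collision-free."""
    candidates = []
    for mask in segment_instances(depth):
        q = grasp_quality(depth, mask)
        r, c = np.unravel_index(np.argmax(q), q.shape)
        candidates.append((q[r, c], (int(r), int(c))))
    candidates.sort(reverse=True)  # highest stability first
    for _score, (r, c) in candidates:
        window = depth[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
        # keep the grasp only if its point is the nearest in its
        # neighbourhood, i.e. the gripper would not hit neighbours
        if depth[r, c] <= window.min() + clearance:
            return (r, c)
    return None  # no collision-free grasp found
```

On a toy depth image with one point raised above a flat background, the sketch returns that pixel as the grasp point; on a perfectly flat scene it returns `None`. The real system operates on RGB-D images and per-object GG-CNN heatmaps rather than these heuristics.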
Appears in Collections: | Department of Mechanical Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-3007202021210900.pdf (available online after 2025-08-14) | 4.25 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.