Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8554
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 李志中(Jyh-Jone Lee) | |
dc.contributor.author | Jian-Lun Chen | en |
dc.contributor.author | 陳健倫 | zh_TW |
dc.date.accessioned | 2021-05-20T00:57:21Z | - |
dc.date.available | 2025-02-01 | |
dc.date.available | 2021-05-20T00:57:21Z | - |
dc.date.copyright | 2021-02-20 | |
dc.date.issued | 2021 | |
dc.date.submitted | 2021-01-28 | |
dc.identifier.citation | [1] N. Correll, K. E. Bekris, D. Berenson, O. Brock, A. Causo, K. Hauser, K. Okada, A. Rodriguez, J. M. Romano, and P. R. Wurman, "Analysis and observations from the first Amazon Picking Challenge," IEEE Transactions on Automation Science and Engineering, vol. 15, no. 1, pp. 172-188, 2018. [2] G. Du, K. Wang, and S. Lian, "Vision-based robotic grasping from object localization, pose estimation, grasp detection to motion planning: A review," arXiv preprint arXiv:1905.06658, 2019. [3] 蔡承翰 and 洪國峰, "Automatic image generation and annotation with a simulator: robotic random bin picking with deep learning as an example" (in Chinese), 機械工業雜誌, no. 449, pp. 28-35, July 2020. [4] N. Liu, J. Han, and M.-H. Yang, "PiCANet: Learning pixel-wise contextual attention for saliency detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3089-3098. [5] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779-788. [6] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961-2969. [7] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004. [8] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded up robust features," in European Conference on Computer Vision, Springer, 2006, pp. 404-417. [9] R. B. Rusu, N. Blodow, and M. Beetz, "Fast point feature histograms (FPFH) for 3D registration," in 2009 IEEE International Conference on Robotics and Automation, IEEE, 2009, pp. 3212-3217. [10] A. Aldoma, M. Vincze, N. Blodow, D. Gossow, S. Gedikli, R. B. Rusu, and G. Bradski, "CAD-model recognition and 6DOF pose estimation using 3D cues," in 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), IEEE, 2011, pp. 585-592. [11] S. Zakharov, I. Shugurov, and S. Ilic, "DPOD: 6D pose object detector and refiner," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 1941-1950. [12] M. Sundermeyer, Z.-C. Marton, M. Durner, M. Brucker, and R. Triebel, "Implicit 3D orientation learning for 6D object detection from RGB images," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 699-715. [13] J. Vidal, C.-Y. Lin, X. Lladó, and R. Martí, "A method for 6D pose estimation of free-form rigid objects using point pair features on range data," Sensors, vol. 18, no. 8, p. 2678, 2018. [14] C. Wang, D. Xu, Y. Zhu, R. Martín-Martín, C. Lu, L. Fei-Fei, and S. Savarese, "DenseFusion: 6D object pose estimation by iterative dense fusion," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 3343-3352. [15] S. Caldera, A. Rassau, and D. Chai, "Review of deep learning methods in robotic grasp detection," Multimodal Technologies and Interaction, vol. 2, no. 3, p. 57, 2018. [16] I. Lenz, H. Lee, and A. Saxena, "Deep learning for detecting robotic grasps," The International Journal of Robotics Research, vol. 34, no. 4-5, pp. 705-724, 2015. [17] "Cornell Grasp Dataset." http://pr.cs.cornell.edu/grasping/rect_data/data.php (accessed December 4, 2020). [18] J. Redmon and A. Angelova, "Real-time grasp detection using convolutional neural networks," in 2015 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2015, pp. 1316-1322. [19] A. ten Pas, M. Gualtieri, K. Saenko, and R. Platt, "Grasp pose detection in point clouds," The International Journal of Robotics Research, vol. 36, no. 13-14, pp. 1455-1473, 2017. [20] F.-J. Chu, R. Xu, and P. A. Vela, "Real-world multiobject, multigrasp detection," IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 3355-3362, 2018. [21] Q. Lu, K. Chenna, B. Sundaralingam, and T. Hermans, "Planning multi-fingered grasps as probabilistic inference in a learned deep network," in Robotics Research, Springer, 2020, pp. 455-472. [22] S. Kumra and C. Kanan, "Robotic grasp detection using deep convolutional neural networks," in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2017, pp. 769-776. [23] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778. [24] L. Pinto and A. Gupta, "Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours," in 2016 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2016, pp. 3406-3413. [25] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, no. 6, pp. 84-90, 2017. [26] D. Morrison, P. Corke, and J. Leitner, "Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach," arXiv preprint arXiv:1804.05172, 2018. [27] D. Morrison, P. Corke, and J. Leitner, "Learning robust, real-time, reactive robotic grasping," The International Journal of Robotics Research, vol. 39, no. 2-3, pp. 183-201, 2020. [28] D. Morrison, P. Corke, and J. Leitner, "Multi-view picking: Next-best-view reaching for improved grasping in clutter," in 2019 International Conference on Robotics and Automation (ICRA), IEEE, 2019, pp. 8762-8768. [29] S. Kumra, S. Joshi, and F. Sahin, "Antipodal robotic grasping using generative residual convolutional neural network," arXiv preprint arXiv:1909.04810, 2019. [30] J. Mahler, J. Liang, S. Niyaz, M. Laskey, R. Doan, X. Liu, J. A. Ojea, and K. Goldberg, "Dex-Net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics," arXiv preprint arXiv:1703.09312, 2017. [31] K. Bousmalis, A. Irpan, P. Wohlhart, Y. Bai, M. Kelcey, M. Kalakrishnan, L. Downs, J. Ibarz, P. Pastor, and K. Konolige, "Using simulation and domain adaptation to improve efficiency of deep robotic grasping," in 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2018, pp. 4243-4250. [32] Y. Ganin and V. Lempitsky, "Unsupervised domain adaptation by backpropagation," in International Conference on Machine Learning, PMLR, 2015, pp. 1180-1189. [33] A. Zeng, S. Song, K.-T. Yu, E. Donlon, F. R. Hogan, M. Bauza, D. Ma, O. Taylor, M. Liu, and E. Romo, "Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching," in 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2018, pp. 1-8. [34] 王仁蔚, "Instance segmentation and representation learning applied to robotic grasping of stacked objects" (in Chinese), Master's thesis, Graduate Institute of Mechanical Engineering, National Taiwan University, 2019. [35] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, pp. 3371-3408, 2010. [36] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 1096-1103. [37] 李佳蓮, "Instance segmentation and grasp point generation convolutional neural networks for classified picking of randomly stacked objects" (in Chinese), Master's thesis, Graduate Institute of Mechanical Engineering, National Taiwan University, 2020. [38] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in European Conference on Computer Vision, Springer, 2014, pp. 740-755. [39] Z.-Q. Zhao, P. Zheng, S.-T. Xu, and X. Wu, "Object detection with deep learning: A review," IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3212-3232, 2019. [40] S. Ghosh, N. Das, I. Das, and U. Maulik, "Understanding deep learning techniques for image segmentation," ACM Computing Surveys (CSUR), vol. 52, no. 4, pp. 1-35, 2019. [41] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, and M. Bernstein, "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015. [42] "The CIFAR-10 dataset." https://www.cs.toronto.edu/~kriz/cifar.html (accessed December 29, 2020). [43] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014. [44] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, and J. Garcia-Rodriguez, "A review on deep learning techniques applied to semantic segmentation," arXiv preprint arXiv:1704.06857, 2017. [45] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, 2015, pp. 91-99. [46] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481-2495, 2017. [47] "Detectron." https://github.com/facebookresearch/Detectron (accessed December 27, 2020). [48] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125. [49] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The PASCAL visual object classes (VOC) challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2010. [50] A. Neubeck and L. Van Gool, "Efficient non-maximum suppression," in 18th International Conference on Pattern Recognition (ICPR'06), vol. 3, IEEE, 2006, pp. 850-855. [51] T. Hodan, F. Michel, E. Brachmann, W. Kehl, A. G. Buch, D. Kraft, B. Drost, J. Vidal, S. Ihrke, and X. Zabulis, "BOP: Benchmark for 6D object pose estimation," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 19-34. [52] S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. Bradski, K. Konolige, and N. Navab, "Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes," in Asian Conference on Computer Vision, Springer, 2012, pp. 548-562. [53] E. Brachmann, A. Krull, F. Michel, S. Gumhold, J. Shotton, and C. Rother, "Learning 6D object pose estimation using 3D object coordinates," in European Conference on Computer Vision, Springer, 2014, pp. 536-551. [54] A. Tejani, D. Tang, R. Kouskouridas, and T.-K. Kim, "Latent-class Hough forests for 3D object detection and pose estimation," in European Conference on Computer Vision, Springer, 2014, pp. 462-477. [55] A. Doumanoglou, R. Kouskouridas, S. Malassiotis, and T.-K. Kim, "Recovering 6D object pose and predicting next-best-view in the crowd," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3583-3592. [56] T. Hodan, P. Haluza, Š. Obdržálek, J. Matas, M. Lourakis, and X. Zabulis, "T-LESS: An RGB-D dataset for 6D pose estimation of texture-less objects," in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2017, pp. 880-888. [57] C. Rennie, R. Shome, K. E. Bekris, and A. F. De Souza, "A dataset for improved RGBD-based object detection and pose estimation for warehouse pick-and-place," IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 1179-1185, 2016. [58] "BOP Challenge 2019." https://bop.felk.cvut.cz/home/ (accessed December 27, 2020). [59] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," Institute for Cognitive Science, University of California, San Diego, Tech. Rep., 1985. [60] "The PASCAL Visual Object Classes." http://host.robots.ox.ac.uk/pascal/VOC/ (accessed December 29, 2020). [61] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in European Conference on Computer Vision, Springer, 2016, pp. 21-37. [62] Z. Wu, C. Shen, and A. van den Hengel, "Bridging category-level and instance-level semantic image segmentation," arXiv preprint arXiv:1605.06885, 2016. [63] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014. [64] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249-256. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8554 | - |
dc.description.abstract | This thesis proposes an improved pipeline for classifying and grasping randomly stacked objects, completing pick-and-place tasks in cluttered bin scenes through object segmentation, pose estimation, and grasp point determination. The pipeline first captures the stacked scene with a depth camera; an instance segmentation system then extracts the target objects from the color image one by one; each extracted object is passed to the pose estimation model to predict its pose, and the grasping information associated with that pose is retrieved; finally, grasp point determination on the depth image completes the pipeline. The improvements proposed here are threefold: a virtual camera automatically captures images of the target object from every viewing angle to form the pose dataset, saving the time cost of processing real-world data; an autoencoder is trained with domain randomization to become an augmented autoencoder, avoiding the domain gap between virtual and real environments while serving as the pose estimation system; and grasping information is predicted by transferring corrected grasping experience to the matched pose, then passed through collision filtering and stability ranking to obtain the final grasp point prediction. To verify the effectiveness of this pipeline, an experimental environment and robot arm system were set up to run it; the grasp success rates and speeds of different pipelines were measured, and the features and advantages of this pipeline are discussed. For two metal objects in randomly stacked scenes, the proposed pipeline achieves a grasp success rate of 89.285% with an overall computation time of 1.128 seconds. | zh_TW
dc.description.abstract | This thesis proposes a pipeline that improves a robotic grasping and classification system: the system first uses instance segmentation to isolate each object in an image of the clutter, then estimates the object's pose, and finally applies collision detection to output the optimal grasping position for the robot. The thesis focuses on building the pose estimation stage around an augmented autoencoder, using a virtual camera to automatically render views of the target object in different poses for the training dataset and applying domain randomization to avoid the domain gap between real and synthetic data. To verify the effectiveness of the pipeline, a robotic system was set up to perform random bin picking. The success rate of grasping two metal objects in clutter reaches 89.285%, and the computation time is 1.128 seconds. (Illustrative code sketches of the pose lookup and domain randomization steps are given after this metadata table.) | en
dc.description.provenance | Made available in DSpace on 2021-05-20T00:57:21Z (GMT). No. of bitstreams: 1 U0001-2201202120220500.pdf: 8402215 bytes, checksum: 8af91fa03ab30dc7296b7cb4bd3e261d (MD5) Previous issue date: 2021 | en |
dc.description.tableofcontents | Acknowledgements i; Abstract (Chinese) ii; Abstract iii; Table of Contents iv; List of Figures vii; List of Tables xii; Chapter 1 Introduction 1; 1.1 Foreword 1; 1.2 Literature Review 2; 1.2.1 Object Detection 3; 1.2.2 Object Pose Estimation 4; 1.2.3 Grasp Point Estimation 6; 1.2.4 Grasping Systems 9; 1.3 Research Motivation and Objectives 14; 1.4 Thesis Organization 16; Chapter 2 Object Segmentation 17; 2.1 Foreword 17; 2.2 Scene Understanding 17; 2.2.1 Object Detection 18; 2.2.2 Image Segmentation 20; 2.3 Mask R-CNN 22; 2.3.1 Feature Extraction Network 22; 2.3.2 Region Proposal Network 24; 2.3.3 RoIAlign 25; 2.3.4 Mask, Bounding Box, and Class Prediction 25; 2.4 Dataset 26; 2.4.1 Data Collection 27; 2.4.2 Data Annotation 27; 2.4.3 Data Augmentation 28; 2.5 Model Training 29; 2.5.1 Model Parameters 29; 2.5.2 Loss Function 30; 2.5.3 Training Parameters 30; 2.6 Model Inference 31; 2.6.1 Feature Extraction Network Inference 31; 2.6.2 Region Proposal Network Inference 31; 2.6.3 Object Segmentation Results 32; Chapter 3 Pose Estimation and Grasp Point Determination 33; 3.1 Foreword 33; 3.2 Pose Estimation Benchmarking 33; 3.2.1 Overview of the BOP Benchmark 34; 3.2.2 BOP Benchmark Datasets 34; 3.2.3 BOP Benchmark Results 35; 3.3 Augmented Autoencoder 36; 3.3.1 Autoencoder 36; 3.3.2 Domain Randomization 38; 3.3.3 Pose Database 40; 3.3.4 Summary of Model Contributions 41; 3.4 Pose Estimation 43; 3.4.1 Dataset Construction 44; 3.4.2 Model Training 45; 3.4.3 Pose Database Construction 46; 3.4.4 Model Validation 46; 3.4.5 Model Inference 47; 3.4.6 Model Comparison 47; 3.5 Grasp Point Determination 48; 3.5.1 Grasp Experience Transfer 48; 3.5.2 Grasp Candidate Augmentation 50; 3.5.3 Grasp Collision Filtering 50; 3.5.4 Final Grasp Determination 52; Chapter 4 System and Verification 53; 4.1 Foreword 53; 4.2 System Architecture 53; 4.2.1 Control System 54; 4.2.2 Experimental Environment 55; 4.2.3 Grasping Procedure 56; 4.3 Pipeline Verification 57; 4.3.1 Experimental Setup and Procedure 57; 4.3.2 Grasping Results and Discussion 58; Chapter 5 Conclusions and Future Work 62; 5.1 Conclusions 62; 5.2 Future Work 63; References 64 | |
dc.language.iso | zh-TW | |
dc.title | 增強式自動編碼網路應用於隨機堆疊物件之分類夾取 | zh_TW |
dc.title | Robotic Random Bin Picking and Classification System Using Augmented Autoencoder | en |
dc.type | Thesis | |
dc.date.schoolyear | 109-1 | |
dc.description.degree | Master |
dc.contributor.oralexamcommittee | 陳亮嘉(Liang-Chia Chen),林峻永(Chun-Yeon Lin) | |
dc.subject.keyword | 堆疊夾取, 實例切割, 姿態估測, 自動編碼網路, 領域隨機化 | zh_TW |
dc.subject.keyword | Clutter Grasping, Instance Segmentation, Pose Estimation, Autoencoder, Domain Randomization | en |
dc.relation.page | 71 | |
dc.identifier.doi | 10.6342/NTU202100129 | |
dc.rights.note | Authorized (open access worldwide) |
dc.date.accepted | 2021-01-29 | |
dc.contributor.author-college | College of Engineering | zh_TW
dc.contributor.author-dept | Graduate Institute of Mechanical Engineering | zh_TW
dc.date.embargo-lift | 2025-02-01 | - |
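The abstract above describes a three-stage pipeline: instance segmentation isolates each object, an augmented autoencoder matches the latent code of the object crop against a codebook of codes rendered from known viewpoints to estimate orientation, and pose-transferred grasp candidates are collision-filtered and ranked. The Python sketch below illustrates only the shape of that codebook lookup and grasp selection; it is a minimal illustration under stated assumptions, not the thesis's implementation, and every name in it (`encoder`, `codebook_codes`, `pick_grasp`, the dummy scoring lambdas) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 128                                      # latent dimension (illustrative)

def encoder(crop: np.ndarray) -> np.ndarray:
    """Stand-in for the trained augmented-autoencoder encoder (image -> code)."""
    return rng.standard_normal(D)            # a real system runs the CNN here

# Codebook: one L2-normalized latent code per rendered viewpoint of the CAD
# model, paired with the rotation matrix used to render that view.
N_VIEWS = 1000
codebook_codes = rng.standard_normal((N_VIEWS, D))
codebook_codes /= np.linalg.norm(codebook_codes, axis=1, keepdims=True)
codebook_rotations = [np.eye(3)] * N_VIEWS   # placeholder rotations

def estimate_orientation(crop: np.ndarray):
    """AAE-style nearest-neighbour orientation lookup by cosine similarity."""
    z = encoder(crop)
    z = z / np.linalg.norm(z)
    sims = codebook_codes @ z                # similarity to every stored view
    best = int(np.argmax(sims))
    return codebook_rotations[best], float(sims[best])

def pick_grasp(candidates, depth, collision_free, stability):
    """Filter pose-transferred grasp candidates, then rank by stability."""
    feasible = [g for g in candidates if collision_free(g, depth)]
    return max(feasible, key=stability) if feasible else None

# Toy end-to-end call with dummy inputs and dummy scoring functions.
crop = np.zeros((128, 128, 3))
rotation, similarity = estimate_orientation(crop)
grasp = pick_grasp(
    candidates=[{"center": (64, 64), "angle": 0.0}],
    depth=np.ones((128, 128)),
    collision_free=lambda g, d: True,        # stand-in gripper-vs-depth check
    stability=lambda g: 1.0,                 # stand-in stability score
)
```

The cosine-similarity codebook lookup follows the augmented autoencoder of Sundermeyer et al. [12], on which the thesis builds; the thesis's actual grasp experience transfer, interference filtering, and stability ranking (Sections 3.5.1-3.5.4 of the table of contents) are not reproduced here.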
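The abstract also states that the autoencoder is trained with domain randomization: rendered input views are randomly perturbed while the reconstruction target stays clean, so the encoder learns orientation and discards nuisance appearance. Below is a minimal sketch of one such augmentation step, assuming images are float arrays in [0, 1]; the particular perturbations and parameter ranges are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomize(view: np.ndarray) -> np.ndarray:
    """Domain-randomize a rendered view of shape (H, W, 3), values in [0, 1]."""
    h, w, _ = view.shape
    out = view.copy()

    # Random background behind the object (foreground = non-black pixels here).
    mask = view.sum(axis=2, keepdims=True) > 0
    out = np.where(mask, out, rng.uniform(0.0, 1.0, size=(h, w, 3)))

    # Random brightness and contrast jitter.
    out = np.clip(out * rng.uniform(0.6, 1.4) + rng.uniform(-0.1, 0.1), 0.0, 1.0)

    # Random rectangular occluder.
    oh, ow = rng.integers(h // 8, h // 3), rng.integers(w // 8, w // 3)
    y, x = rng.integers(0, h - oh), rng.integers(0, w - ow)
    out[y:y + oh, x:x + ow] = rng.uniform(0.0, 1.0, size=3)

    # Additive Gaussian sensor noise.
    return np.clip(out + rng.normal(0.0, 0.02, size=out.shape), 0.0, 1.0)

# Training pair for the augmented autoencoder: the network sees the randomized
# input but is penalized for reconstruction error against the clean rendering.
clean = np.zeros((128, 128, 3))
clean[32:96, 32:96] = 0.8                    # toy "rendered object"
noisy_input, target = randomize(clean), clean
```

Because the reconstruction target is always the clean rendering, the latent code is pushed to be invariant to background, lighting, occlusion, and noise, which is what lets a synthetically trained codebook generalize to real camera images.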
Appears in Collections: | Department of Mechanical Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-2201202120220500.pdf (publicly available online after 2025-02-01) | 8.21 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.