Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74746

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 李志中(Jyh-Jone Lee) | |
| dc.contributor.author | Jen-Wei Wang | en |
| dc.contributor.author | 王仁蔚 | zh_TW |
| dc.date.accessioned | 2021-06-17T09:06:52Z | - |
| dc.date.available | 2024-12-26 | |
| dc.date.copyright | 2019-12-26 | |
| dc.date.issued | 2019 | |
| dc.date.submitted | 2019-12-20 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74746 | - |
| dc.description.abstract | 本研究提出一套全新的機械臂堆疊物件夾取流程,主要藉由對堆疊物件的實例切割、姿態估測及夾取經驗轉移,提出候選夾取點,最後藉由干涉判斷及穩定度分析,篩選出最好的夾取點。
本流程之前兩個步驟,在實例切割上改進Mask R-CNN網路之資料收集流程以減少標註時間;在姿態估測上以表徵學習(Representation Learning)中之自動化編碼網路,學習到描述姿態的全域特徵向量,此方法為非監督式學習(Unsupervised Learning),因此不需姿態的標註,且解決了兩個問題:(1) 單一視角下姿態定義無法唯一的姿態歧異(Pose Ambiguity)問題;(2) 切割出的影像上有光線、遮擋……等等干擾的問題。此兩個流程皆使用訓練在平面影像之深度學習模型,對於使用一般深度影像品質的相機而言具有優勢,且能達到即時運算的功能。 除此之外,有別於直接在影像上利用深度學習模型提出候選夾取點,本研究採取先做姿態估測再做夾取經驗轉移,具有以下兩點優勢:(1) 可彈性調整與定義夾取區間,增加可行之夾取點數量;(2) 具有較好的穩健性,不易受影像上雜訊或是遮擋的影響。最後,本文利用深度影像提出干涉判斷的演算法,篩選出不會干涉鄰物的最佳夾取點。為了驗證此套流程,本文分別在自行收集的資料和開源資料T-LESS上驗證實例切割與姿態估測的模型,並且以商用RGB-D相機Kinect V2與六軸機械臂建立一套手眼機器人系統,實際上機驗證整套流程的夾取成功率和運算效率。最終在水五金的堆疊上,夾取成功率約為94%,實例切割與姿態估測的運算時間分別為56 msec和2 msec。展示本研究夾取流程的影片提供在https://www.youtube.com/watch?v=wc0pZV6NNFs&feature=youtu.be | zh_TW |
| dc.description.abstract | This thesis proposes a novel pipeline for robotic grasping of cluttered objects. The pipeline combines instance segmentation, pose estimation, and grasping-experience transfer to generate feasible grasp candidates, and then applies a collision-avoidance and stability-analysis algorithm to select the optimal grasp point for the robot.
In the instance-segmentation stage, the data-collection and labeling process for Mask R-CNN was improved to reduce annotation time. In the pose-estimation stage, an auto-encoder, a representation-learning method, was trained to learn a global feature vector that describes the object pose (an illustrative sketch of this viewpoint-lookup step is given after the metadata table). Because the auto-encoder is trained in an unsupervised manner, no pose labels are required, and the approach addresses two problems: (1) pose ambiguity, i.e., the object pose is not uniquely defined from a single view; and (2) disturbances on the segmented image such as lighting changes and occlusion. Both stages use deep networks that operate on RGB images only, which allows an inexpensive depth camera to be used and enables real-time processing. Instead of predicting grasp candidates directly with a deep-learning model, the pipeline first estimates the object pose and then transfers grasping experience to it. This strategy has two advantages: (1) the grasping region can be flexibly adjusted and redefined, so more feasible grasp points can be obtained; and (2) it is more robust to image noise and occlusion. In the final stage, a collision-avoidance algorithm based on depth images selects an optimal grasp point that does not interfere with adjacent objects. The instance-segmentation and pose-estimation methods were evaluated on a self-collected dataset and on the open T-LESS dataset, and an eye-to-hand six-axis robot with a Kinect V2 RGB-D camera was used to measure the grasping success rate and the efficiency of the overall pipeline. On cluttered metal parts, the success rate is about 94%, and the running times of instance segmentation and pose estimation are 56 ms and 2 ms, respectively. A video demonstrating the grasping pipeline is available at https://www.youtube.com/watch?v=wc0pZV6NNFs&feature=youtu.be | en |
| dc.description.provenance | Made available in DSpace on 2021-06-17T09:06:52Z (GMT). No. of bitstreams: 1 ntu-108-R06522620-1.pdf: 4325362 bytes, checksum: dfe105b86bc26bcdd5759996b9ff0189 (MD5) Previous issue date: 2019 | en |
| dc.description.tableofcontents | Acknowledgements i; Chinese Abstract ii; Abstract iii; List of Figures vii; List of Tables x
Chapter 1 Introduction 1: 1.1 Foreword 1; 1.2 Literature Review 1; 1.2.1 Grasping unknown objects: without instance segmentation 1; 1.2.2 Grasping known objects: with instance segmentation first 6; 1.3 Research Motivation and Contributions 9; 1.4 Thesis Organization 10
Chapter 2 Instance Segmentation 12: 2.1 Overview of Instance Segmentation 12; 2.2 Overview of Mask R-CNN 14; 2.2.1 Faster R-CNN 15; 2.2.2 Mask R-CNN 17; 2.3 Data Collection and Annotation 18; 2.4 Model Training 22; 2.5 Model Inference 24; 2.5.1 Stage 1: ROI proposals from the RPN 24; 2.5.2 Stage 2: Final output after ROI Align 25
Chapter 3 Pose Estimation and Grasp Detection 26: 3.1 Relationship between Pose Estimation and Grasp Detection 26; 3.2 Pose Estimation Method 27; 3.2.1 Problems in Pose Estimation 28; 3.2.2 Auto-encoder 30; 3.2.3 Denoising Auto-encoder 31; 3.2.4 Loss Function Definition 31; 3.2.5 Data Collection and Preprocessing 33; 3.2.6 Model Architecture and Training Details 34; 3.2.7 Viewpoint Database Construction and Viewpoint Estimation 35; 3.3 Grasp Detection 36; 3.3.1 Grasp Region Definition and Data Annotation 36; 3.3.2 Grasping Experience Transfer 37; 3.3.3 Final Grasp Point Selection 38
Chapter 4 System and Evaluation 41: 4.1 System 41; 4.1.1 System Environment 41; 4.1.2 System Setup 42; 4.2 Metric Definitions 51; 4.2.1 Average Precision (AP) 51; 4.2.2 Visible Surface Discrepancy (VSD) 52; 4.3 Instance Segmentation Evaluation 53; 4.4 Pose Estimation Evaluation 56; 4.4.1 Evaluation Conditions 56; 4.4.2 Ablation Study of Model Architecture 58; 4.4.3 Comparison among Different Methods 60; 4.5 On-Robot System Evaluation 61
Chapter 5 Conclusions and Future Work 63: 5.1 Conclusions 63; 5.2 Future Work 64
References 66 | |
| dc.language.iso | zh-TW | |
| dc.subject | 堆疊夾取 | zh_TW |
| dc.subject | 實例切割 | zh_TW |
| dc.subject | 姿態估測 | zh_TW |
| dc.subject | 表徵學習 | zh_TW |
| dc.subject | 干涉偵測 | zh_TW |
| dc.subject | 自動化編碼 | zh_TW |
| dc.subject | Representation Learning | en |
| dc.subject | Auto-encoder | en |
| dc.subject | Collision detection | en |
| dc.subject | Clutter Grasping | en |
| dc.subject | Instance Segmentation | en |
| dc.subject | Pose Estimation | en |
| dc.title | 以實例切割與表徵學習應用於機械臂夾取堆疊物件 | zh_TW |
| dc.title | Robot Grasping in Clutter using Instance Segmentation and Representation Learning | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 108-1 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 陳亮嘉(Liang-Chia Chen),黃漢邦(Han-Pang Huang) | |
| dc.subject.keyword | 堆疊夾取,實例切割,姿態估測,表徵學習,干涉偵測,自動化編碼, | zh_TW |
| dc.subject.keyword | Clutter Grasping,Instance Segmentation,Pose Estimation,Representation Learning,Collision detection,Auto-encoder, | en |
| dc.relation.page | 68 | |
| dc.identifier.doi | 10.6342/NTU201904411 | |
| dc.rights.note | 有償授權 | |
| dc.date.accepted | 2019-12-23 | |
| dc.contributor.author-college | 工學院 | zh_TW |
| dc.contributor.author-dept | 機械工程學研究所 | zh_TW |
| Appears in Collections: | Department of Mechanical Engineering | |
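
The pose-estimation stage summarized in the abstract (and in Section 3.2.7 of the table of contents) matches the auto-encoder's global feature vector of a segmented crop against a database of encoded rendered views. The Python sketch below is a minimal illustration of that codebook-style lookup under stated assumptions: `encoder`, `rendered_views`, and the function names are hypothetical placeholders, not the thesis implementation.

```python
# Hypothetical sketch of viewpoint lookup with an auto-encoder codebook.
# Assumption: `encoder` is a trained callable mapping a cropped, normalized
# RGB image (H x W x 3 array) to a 1-D global feature vector.
import numpy as np

def build_codebook(encoder, rendered_views):
    """Encode rendered views of the object model into a codebook.

    rendered_views: iterable of (image, rotation_matrix) pairs that cover
    the view sphere around the object.
    """
    codes, poses = [], []
    for image, rotation in rendered_views:
        z = encoder(image)                    # global feature vector
        codes.append(z / np.linalg.norm(z))   # unit-normalize for cosine similarity
        poses.append(rotation)
    return np.stack(codes), poses

def estimate_viewpoint(encoder, crop, codebook, poses, k=1):
    """Return the k rotations whose codebook entries best match the crop."""
    z = encoder(crop)
    z = z / np.linalg.norm(z)
    similarity = codebook @ z                 # cosine similarity against all views
    best = np.argsort(-similarity)[:k]
    return [poses[i] for i in best], similarity[best]
```

In a full pipeline of this kind, each mask from the instance-segmentation stage would be cropped and normalized before being passed to `estimate_viewpoint`; the retrieved orientation then allows predefined grasp regions on the object model to be transformed into the camera frame for the subsequent grasping-experience transfer and collision check.
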
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-108-1.pdf (restricted access) | 4.22 MB | Adobe PDF |
