Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74746
Title: | Robot Grasping in Clutter using Instance Segmentation and Representation Learning (以實例切割與表徵學習應用於機械臂夾取堆疊物件) |
Authors: | Jen-Wei Wang (王仁蔚) |
Advisor: | Jyh-Jone Lee (李志中) |
Keyword: | Clutter Grasping, Instance Segmentation, Pose Estimation, Representation Learning, Collision Detection, Auto-encoder |
Publication Year: | 2019 |
Degree: | Master's |
Abstract: | This thesis proposes a novel pipeline for robotic grasping of cluttered objects. In the pipeline, instance segmentation, pose estimation, and grasping-experience transfer are used to obtain feasible grasping candidates, after which a collision-avoidance and stability analysis selects the optimal grasping point for the robot.

In the instance-segmentation phase, the data-collection and labeling process for Mask R-CNN is improved to reduce annotation time. In the pose-estimation phase, an auto-encoder, a representation-learning method, is trained to learn a global feature vector that describes object pose. Because the auto-encoder is trained in an unsupervised manner, no pose labels are required, and the approach addresses two problems: (1) the definition of object pose is not unique under a single view (pose ambiguity), and (2) the segmented image is disturbed by lighting, occlusion, and other noise. Both phases use deep networks trained on RGB images only, which allows an inexpensive depth camera to be used and enables real-time processing.

Instead of proposing grasping candidates directly from a deep-learning model, the pipeline first estimates object pose and then transfers grasping experience. This strategy has two advantages: (1) the grasping region can be flexibly adjusted and redefined, so more feasible grasping points are obtained, and (2) it is more robust to image noise and occlusion. In the final phase, a collision-avoidance algorithm based on depth images selects the optimal grasping point that does not interfere with adjacent objects.

The instance-segmentation and pose-estimation models were evaluated on a self-collected dataset and on the open T-LESS dataset. An eye-to-hand system consisting of a six-axis robot arm and a Kinect V2 RGB-D camera was built to evaluate the grasping success rate and the efficiency of the overall pipeline. On cluttered metal parts (plumbing hardware), the success rate is about 94%, and the running times of instance segmentation and pose estimation are 56 msec and 2 msec, respectively. A video demonstrating the grasping pipeline is available at https://www.youtube.com/watch?v=wc0pZV6NNFs&feature=youtu.be |
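The pose-estimation step described in the abstract encodes each segmented crop into a global latent vector with an auto-encoder and matches it against stored codes of known views. The full text is under restricted access, so the snippet below is only a minimal sketch of that codebook-lookup idea, not the author's implementation; the function names (`cosine_similarity`, `estimate_pose`) and the toy codebook and rotation placeholders are assumptions for illustration.

```python
# Minimal sketch (assumed, not the thesis code) of codebook-style pose lookup:
# the latent code of a segmented crop is compared against precomputed codes of
# rendered views by cosine similarity, and the best-matching view's rotation
# is returned as the pose estimate.
import numpy as np

def cosine_similarity(query: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query code (D,) and a codebook (N, D)."""
    query = query / np.linalg.norm(query)
    codebook = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    return codebook @ query

def estimate_pose(query_code: np.ndarray, codebook: np.ndarray, rotations: list):
    """Return the rotation whose stored latent code best matches the query."""
    scores = cosine_similarity(query_code, codebook)
    best = int(np.argmax(scores))
    return rotations[best], float(scores[best])

# Toy usage: 3 stored views with 128-d latent codes (placeholders).
rng = np.random.default_rng(0)
codebook = rng.random((3, 128))
rotations = ["R_0", "R_1", "R_2"]   # stand-ins for 3x3 rotation matrices
query = codebook[1] + 0.01 * rng.random(128)
print(estimate_pose(query, codebook, rotations))
```

Because the lookup is a single matrix-vector product over the codebook, it is consistent with the millisecond-scale pose-estimation time reported in the abstract.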
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74746 |
DOI: | 10.6342/NTU201904411 |
Fulltext Rights: | Authorized for paid access |
Appears in Collections: | Department of Mechanical Engineering |
Files in This Item:
File | Size | Format
---|---|---
ntu-108-1.pdf (Restricted Access) | 4.22 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.