Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54095
Title: | Q-PointNet: Intelligent Stacked-Objects Grasping Using an RGBD Sensor and a Dexterous Hand |
Authors: | Chi-Heng Wang 王啟恆 |
Advisor: | Pei-Chun Lin 林沛群 (peichunlin@ntu.edu.tw) |
Keyword: | universal gripper, deep learning architecture, Mask R-CNN, PointNet, visual localization, grasp pose algorithm, force sensor, point cloud processing, grasping, stacked objects, RGBD, dexterous hand |
Publication Year: | 2020 |
Degree: | Master |
Abstract: | The goal of this research is to combine an RGBD camera, a multi-sensor gripper (force sensors, potentiometers, and Hall sensors), and deep learning techniques to achieve stable grasping of a partially occluded target object in stacked environments such as fruit baskets or toy boxes.

Vision cameras are widely used on industrial and automated production lines, typically paired with robot arms and grippers. Among them, pure RGB and RGBD cameras are the most common choices. Compared with a pure RGB camera, an RGBD camera additionally provides depth information, which makes it better suited to robotics and reduces the otherwise unnecessary computation required to recover an object's actual position. Between the eye-to-hand and eye-in-hand configurations, this research adopts eye-in-hand, which can observe objects from multiple viewpoints rather than a fixed one, letting the robot arm, like a human, flexibly generate data from different viewing angles. It can also locate the target object through observation from different angles when occlusion occurs.

For visual localization and pose recognition, the localization step uses Mask R-CNN together with the RGBD data to find the target object's mask, and the depth information is then used to extract the object's partial point cloud. Because Mask R-CNN can still find the target's mask under partial occlusion, the eye-in-hand system can effectively obtain the remaining shape information of the object. For the grasp-pose algorithm, this research builds on PointNet's outstanding performance in point cloud classification and, through transfer learning, converts it into a pose-recognition framework, Q-PointNet. Besides producing a quaternion as the grasp pose, the framework simultaneously outputs the gripper mode (two fingers or three fingers) so the gripper performs better. Q-PointNet can be viewed as a shape discriminator: given different partial point clouds, it outputs an adequate grasp pose, which is especially useful when the geometric (point cloud) information of the object is insufficient, i.e., when the target is occluded by other objects. Because the predicted outputs are a quaternion and a mode, the thesis also explains in full how a hybrid loss function is used to reach this goal with a limited training dataset.

Combined with the sensing of the potentiometers on the gripper, a width-calculation algorithm related to the grasp pose produced by Q-PointNet is proposed. The potentiometers measure the fingers' rotation angles very accurately and are used to control the positions the fingers should reach. The width-calculation algorithm computes the width to which the gripper should open in advance, so that the robot arm and the gripper do not collide with other objects during the grasping process.
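As an illustration of the localization step above, the following is a minimal sketch (in Python, not the thesis's published code) of how an instance mask from Mask R-CNN and a registered depth image can be combined into a partial point cloud, assuming a standard pinhole model with intrinsics fx, fy, cx, cy:

```python
import numpy as np

def mask_to_point_cloud(depth, mask, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project the depth pixels inside an instance mask into a
    partial point cloud in the camera frame (pinhole model).

    depth: (H, W) depth image, depth_scale converts its units to meters
    mask:  (H, W) boolean instance mask from Mask R-CNN
    fx, fy, cx, cy: intrinsics of the RGBD sensor (assumed known)
    """
    v, u = np.nonzero(mask)                  # pixel rows/cols inside the mask
    z = depth[v, u].astype(np.float32) * depth_scale
    valid = z > 0                            # drop pixels with missing depth
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx                    # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)       # (N, 3) partial point cloud
```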
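The abstract does not reproduce Q-PointNet's exact layers or loss, but one plausible sketch of the two-headed output (unit quaternion plus finger mode) and a hybrid loss of the kind described, assuming a PointNet-style global feature vector, is:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QPointNetHead(nn.Module):
    """Hypothetical prediction head on top of a PointNet global feature:
    a unit quaternion for the grasp pose and a logit per gripper mode."""
    def __init__(self, feat_dim=1024, num_modes=2):
        super().__init__()
        self.quat = nn.Linear(feat_dim, 4)          # grasp orientation
        self.mode = nn.Linear(feat_dim, num_modes)  # two- vs three-finger

    def forward(self, global_feat):
        q = F.normalize(self.quat(global_feat), dim=-1)  # unit quaternion
        return q, self.mode(global_feat)

def hybrid_loss(q_pred, q_true, mode_logits, mode_true, alpha=1.0):
    """One possible hybrid loss: quaternion distance + mode cross-entropy."""
    quat_loss = (1.0 - torch.abs(torch.sum(q_pred * q_true, dim=-1))).mean()
    mode_loss = F.cross_entropy(mode_logits, mode_true)
    return quat_loss + alpha * mode_loss
```

Taking the absolute dot product makes the orientation term insensitive to the sign ambiguity of quaternions (q and -q encode the same rotation), which is one common way to keep such a loss well-behaved on a limited training dataset.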
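The width-calculation algorithm itself is not detailed in the abstract; a hypothetical version consistent with it would rotate the partial point cloud into the predicted gripper frame, measure the object's extent along the closing axis, and add a clearance margin (the function names, axis convention, and margin below are illustrative assumptions):

```python
import numpy as np

def quat_to_matrix(q):
    """Rotation matrix from a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def pre_open_width(points, grasp_quat, margin=0.01, max_width=0.10):
    """Width (m) the gripper should open to before approaching: the
    object's extent along the closing axis plus a clearance margin."""
    R = quat_to_matrix(grasp_quat)
    local = points @ R                     # express points in the gripper frame
    extent = local[:, 0].max() - local[:, 0].min()  # closing axis taken as x
    return min(extent + margin, max_width)
```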
To train the proposed architecture, two datasets must be prepared: one for Mask R-CNN and one for Q-PointNet. For the Mask R-CNN dataset, images are usually labeled with the LabelMe application, which is laborious; instead, this thesis proposes a faster solution, an auto-labeling method, to generate the training data quickly. For the Q-PointNet dataset (partial point clouds, grasp poses, and modes), this thesis implements a pose-annotation interface to visualize and accelerate the labeling procedure. The interface uses PCA to construct a reference coordinate frame for adjusting the grasp pose, and Euler-angle computations so that the annotated pose can be adjusted in real time. It also draws on the ModelNet40 dataset to generate simulated partial point clouds, reducing data-collection time. In addition, the commonly used point cloud operations are consolidated into a utility package for convenient point cloud preprocessing.

Finally, in the robot experiments, the visual localization is verified to have a precision error within ±4 mm. Grasping experiments are conducted in different scenarios, namely a single object, leaning objects, scattered objects, and stacked objects, to verify that the grasp poses and strategies produced by Q-PointNet are applicable in all of them; combined with the force sensors, the gripper achieves even more stable grasping. |
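As a sketch of the PCA reference frame mentioned above (again an illustrative reconstruction, not the thesis's code), the principal axes of the partial point cloud can serve as the coordinate frame in which the annotated grasp pose is adjusted:

```python
import numpy as np

def pca_reference_frame(points):
    """Principal axes of a partial point cloud, usable as the reference
    coordinate frame when annotating grasp poses in an interface.

    points: (N, 3) array; returns (centroid, axes) where the columns of
    axes are the major, middle, and minor principal directions.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    # Eigenvectors of the 3x3 covariance matrix, sorted by variance
    _, vecs = np.linalg.eigh(np.cov(centered.T))
    axes = vecs[:, ::-1].copy()        # eigh is ascending; reverse columns
    if np.linalg.det(axes) < 0:        # keep a right-handed frame
        axes[:, 2] *= -1
    return centroid, axes
```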
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54095 |
DOI: | 10.6342/NTU202002335 |
Fulltext Rights: | Paid-access authorization |
Appears in Collections: | Department of Mechanical Engineering |
Files in This Item:
File | Size | Format
---|---|---
U0001-0408202000225100.pdf (Restricted Access) | 11.32 MB | Adobe PDF