Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92793
Title: | 基於點雲辨識堆疊物件之六自由度夾取 Six-DoF Robotic Grasping of Objects in Clutter via Point Cloud |
Author: | 林奕安 Yi-An Lin |
Advisor: | 李志中 Jyh-Jone Lee |
Keywords: | Deep Learning, 6-DoF Grasp, Pose Estimation, Point Cloud, Random Bin Picking |
Publication Year: | 2024 |
Degree: | Master |
Abstract: | With the advancement of artificial intelligence, the capability of robotic arms to grasp objects has improved significantly. For object recognition, the field has been shifting from RGB-D images, which are susceptible to environmental influences, toward point clouds, which provide richer geometric information and are less affected by lighting conditions. However, the quality of the point cloud affects the accuracy of deep learning models, which normally necessitates expensive cameras to obtain high-quality point clouds. To address this issue, this study integrates different deep learning models to enhance prediction performance on images obtained from an economical camera. Training data are synthesized with computer graphics software and CAD models of the target objects, a self-learning algorithm is employed to improve the model's performance, and the model is finally fine-tuned with a small amount of real data. As part of this process, the study developed a rapid annotation method to label the small real dataset, reducing setup time.
This study developed a six-degree-of-freedom (6-DoF) pipeline for grasping stacked objects, using an economical camera (Kinect v2) to capture RGB-D images. The RGB image is fed into an instance segmentation model (YOLO-v8) to predict object masks and categories; the depth pixels under each mask are then converted into a point cloud and fed into a pose estimation model (PPR-Net) to predict object poses. A 6-DoF grasp strategy then filters the candidate grasp frames and computes the grasp height to determine the optimal grasp frame and manipulator grasping pose. Finally, grasping experiments were conducted in a fully automated manner on a real platform with stacked objects: single-category grasping of metal circular pipes, T-shaped plastic water pipes, and L-shaped door handles, as well as a mixed grasp-and-sort experiment with all three object types. The success rates were 100%, 96.39%, 89.89%, and 94.48%, respectively. A demonstration video is available at: https://youtu.be/LDHmKGCPOMo?si=eBGcqo6szhjQluYw |
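The synthetic-to-real training scheme in the abstract (pre-train on rendered CAD data, self-learn on unlabeled real images, then fine-tune on a small annotated set) amounts to a pseudo-labeling loop. The sketch below is a minimal skeleton under assumed conventions: `train_on`, `predict_with_confidence`, and the 0.9 confidence threshold are hypothetical stand-ins, since the thesis does not publish its training code.

```python
# Hypothetical skeleton of the self-learning scheme described in the abstract.
# train_on() and predict_with_confidence() are placeholders for the real
# PPR-Net training and inference routines, which are not public.

CONF_THRESHOLD = 0.9  # assumed cut-off for accepting a pseudo-label

def train_on(model, labeled_pairs):
    """Placeholder: supervised training on (sample, pose_label) pairs."""
    return model

def predict_with_confidence(model, sample):
    """Placeholder: returns (predicted_pose, confidence in [0, 1])."""
    return None, 0.0

def self_learning(model, synthetic, real_unlabeled, real_labeled, rounds=3):
    # 1) Pre-train on data rendered from the CAD models.
    model = train_on(model, synthetic)
    # 2) Repeatedly pseudo-label real images and retrain on confident ones.
    for _ in range(rounds):
        pseudo = []
        for sample in real_unlabeled:
            pose, conf = predict_with_confidence(model, sample)
            if conf >= CONF_THRESHOLD:
                pseudo.append((sample, pose))
        model = train_on(model, synthetic + pseudo)
    # 3) Fine-tune on the small, rapidly annotated real dataset.
    return train_on(model, real_labeled)
```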
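The mask-to-point-cloud step is standard pinhole back-projection. Below is a minimal sketch: the intrinsics are illustrative values in the range typical for the Kinect v2 depth camera, not the calibrated parameters used in the thesis, and the toy mask stands in for a YOLO-v8 instance mask (e.g. `results[0].masks` from the `ultralytics` package).

```python
import numpy as np

def masked_depth_to_cloud(depth_m, mask, fx, fy, cx, cy):
    """Pinhole back-projection: pixel (u, v) at depth z maps to
    x = (u - cx) * z / fx,  y = (v - cy) * z / fy."""
    v, u = np.nonzero(mask & (depth_m > 0))    # valid pixels inside the mask
    z = depth_m[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)         # (N, 3) points, camera frame

# Toy example: flat 0.8 m plane at the Kinect v2 depth resolution (512x424).
depth = np.full((424, 512), 0.8)
mask = np.zeros((424, 512), dtype=bool)
mask[200:220, 250:270] = True                  # stand-in for a YOLO-v8 mask
cloud = masked_depth_to_cloud(depth, mask, fx=365.0, fy=365.0,
                              cx=256.0, cy=212.0)
print(cloud.shape)                             # -> (400, 3)
```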
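The exact grasp-frame filtering and height rules are not given in this record, so the following sketch only illustrates the general idea: transform candidate grasp frames defined on the CAD model by the estimated pose, prefer the most top-down approach, and set the grasp height from the local point cloud. The vertical-preference score, the 5 cm neighborhood, and the fixed descent distance are all assumptions, not the thesis' strategy.

```python
import numpy as np

def select_grasp(T_obj_cam, grasp_frames_obj, cloud_cam,
                 neighborhood=0.05, descent=0.02):
    """Pick a grasp frame and a grasp height (all quantities in meters).

    T_obj_cam        : (4, 4) estimated object pose in the camera frame
    grasp_frames_obj : list of (4, 4) candidate grasp frames, object frame
    cloud_cam        : (M, 3) scene point cloud in the camera frame

    Convention (assumed): camera +z points into the scene, so a smaller z
    coordinate means physically higher in the bin.
    """
    down = np.array([0.0, 0.0, 1.0])           # top-down approach direction
    candidates = [T_obj_cam @ G for G in grasp_frames_obj]
    scores = [float(C[:3, 2] @ down) for C in candidates]  # tool z vs. +z
    best = candidates[int(np.argmax(scores))].copy()

    # Grasp height: locate the top of the object near the grasp point in
    # the image plane, then descend a fixed distance past it (assumed rule).
    d = np.linalg.norm(cloud_cam[:, :2] - best[:2, 3], axis=1)
    near = cloud_cam[d < neighborhood]
    if len(near):
        best[2, 3] = near[:, 2].min() + descent
    return best

# Toy usage: one candidate grasp at the object origin, random nearby points.
rng = np.random.default_rng(0)
cloud = rng.uniform([-0.1, -0.1, 0.7], [0.1, 0.1, 0.8], size=(500, 3))
grasp = select_grasp(np.eye(4), [np.eye(4)], cloud)
print(grasp[:3, 3])
```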
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92793 |
DOI: | 10.6342/NTU202401235 |
Full-Text Authorization: | Authorized (publicly available worldwide) |
Appears in Collections: | Department of Mechanical Engineering |
Files in This Item:
File | Size | Format |
---|---|---|
ntu-112-2.pdf (available online after 2026-07-01) | 5.08 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.