Applying Cross-attention Mechanism for Weakly Supervised Point Cloud Segmentation

楊証琨; Cheng-Kun Yang

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88160

Title:	Applying Cross-attention Mechanism for Weakly Supervised Point Cloud Segmentation
Author:	楊証琨 Cheng-Kun Yang
Advisor:	莊永裕 Yung-Yu Chuang
Co-advisor:	林彥宇 Yen-Yu Lin
Keywords:	Cross-attention, 3D point cloud segmentation, Weakly supervised learning
Year of publication:	2023
Degree:	Ph.D.
Abstract:	3D point cloud segmentation provides rich geometric and semantic information and plays a crucial role in applications such as indoor scene understanding, robotics, and autonomous driving. In recent years, advances in deep neural networks and the construction of large annotated datasets have enabled point cloud segmentation models to produce accurate results, making practical deployment feasible. However, accurate deep neural networks typically require large amounts of fine-grained annotated training data, and annotating a single large-scale indoor point cloud dataset takes hundreds of hours of manual labor. This high cost makes practical applications of point cloud segmentation considerably harder.

This doctoral thesis introduces weakly supervised learning to reduce a model's dependence on annotated data while maintaining an acceptable level of accuracy. To compensate for the limited supervision that weak annotations provide, we apply the cross-attention mechanism to explore relationships across point clouds and mine additional supervisory losses for model training. To this end, the thesis develops three frameworks, each of which trains point cloud segmentation models with a different type of weak annotation. The first method requires only several point clouds that contain the same object category: without knowing the object class and without any point-level annotations, the model segments the points that belong to the object. To further improve segmentation performance, the second method uses weak annotations such as scene-level labels or sparse point labels and applies multiple instance learning to explore correspondences between pairs of point clouds, generating additional supervisory signals for training an effective segmentation model. Lastly, the third method exploits the complementarity of 2D images and 3D point clouds: under scene-level supervision only, it brings in 2D image information through our proposed interlaced decoder, which combines the respective strengths of 2D images and 3D point clouds and yields better point cloud segmentation results.

Experiments and ablation studies on multiple public datasets show that, when the cross-attention mechanism supplies additional supervisory signals, point cloud segmentation models can still achieve better performance under weak supervision. The proposed methods apply broadly to various forms of weak annotation, outperform the competing weakly supervised methods available at the time, and effectively reduce the cost of deploying point cloud segmentation models.
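The abstract describes cross-attention between point clouds and multiple instance learning (MIL) on pairs of point clouds only at a high level. The sketch below is a generic, illustrative rendering of those two ideas in plain Python, not the thesis's implementation. Assumptions, all hypothetical: per-point features are short lists of floats, cross-attention is standard scaled dot-product attention with a residual connection, scene-level logits are obtained by max-pooling per-point logits, and a pair of point clouds is treated as one bag whose label for each class is the logical OR of the two scene-level labels. The helper names (`softmax`, `cross_attention`, `linear`, `mil_scene_loss`) and the classifier weights are invented for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]  # N points x D features (hypothetical layout)


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Overflow-safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product cross-attention: every query point (one cloud)
    attends over all key/value points (the other cloud)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def linear(feats: Matrix, weight: Matrix) -> Matrix:
    """Per-point linear classifier: (N x D) times (D x C) gives (N x C) logits."""
    return [[sum(f[i] * weight[i][c] for i in range(len(f)))
             for c in range(len(weight[0]))] for f in feats]


def mil_scene_loss(point_logits: Matrix, bag_labels: List[int]) -> float:
    """MIL loss: max-pool point logits into bag-level logits, then average
    binary cross-entropy over classes. bag_labels[c] is 1 if class c occurs
    anywhere in the bag, else 0."""
    num_classes = len(bag_labels)
    loss = 0.0
    for c in range(num_classes):
        pooled = max(p[c] for p in point_logits)  # most confident point decides
        prob = min(max(sigmoid(pooled), 1e-7), 1.0 - 1e-7)  # avoid log(0)
        y = bag_labels[c]
        loss -= y * math.log(prob) + (1 - y) * math.log(1.0 - prob)
    return loss / num_classes


if __name__ == "__main__":
    # Toy 2-D features for two point clouds (hypothetical values).
    cloud_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    cloud_b = [[1.0, 0.1], [0.1, 0.9]]
    # Each cloud attends to the other; add the context back residually.
    a_ctx = cross_attention(cloud_a, cloud_b, cloud_b)
    b_ctx = cross_attention(cloud_b, cloud_a, cloud_a)
    a_feat = [[x + y for x, y in zip(f, g)] for f, g in zip(cloud_a, a_ctx)]
    b_feat = [[x + y for x, y in zip(f, g)] for f, g in zip(cloud_b, b_ctx)]
    weight = [[2.0, -1.0], [-1.0, 2.0]]  # hypothetical 2-class classifier
    logits = linear(a_feat + b_feat, weight)  # the pair forms one bag
    labels_a, labels_b = [1, 1], [1, 0]  # scene-level labels per cloud
    bag_labels = [max(x, y) for x, y in zip(labels_a, labels_b)]  # logical OR
    print(round(mil_scene_loss(logits, bag_labels), 4))
```

Max-pooling is one common MIL choice because a bag is positive for a class as soon as any single point is; smoother pooling (for example, mean or log-sum-exp) is an equally plausible assumption, and the abstract does not say which the thesis uses.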
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88160
DOI:	10.6342/NTU202301432
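Full-text license:	Authorized for open access (worldwide)
Appears in department:	Department of Computer Science and Information Engineering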

Files in this item:

File	Size	Format
ntu-111-2.pdf	7.31 MB	Adobe PDF
