Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8111
Title: | 3D Semantic Segmentation based on Spatial-aware Convolution and Shape Completion for Augmented Reality Applications |
Author: | Yun-Chih Guo (郭耘志) |
Advisor: | Li-Chen Fu (傅立成) |
Keywords: | Scene Semantic Segmentation, Sparse Convolutional Network, Spatial-aware Convolution, Completion Network, Augmented Reality |
Publication Year: | 2021 |
Degree: | Master |
Abstract: | 3D semantic segmentation of indoor scenes is a popular research topic in computer vision. For many applications, it is important to know exactly which category each point in the scene belongs to. Benefiting from the development of deep learning, many voxel-based and point-based neural networks have been proposed to solve the semantic segmentation problem, but most of them do not fully exploit the information in the spatial structure. Current voxel-based sparse convolutional neural networks can effectively extract 3D features in space; however, they assume that the feature in empty space is zero, which discards information about the spatial structure. In this thesis, we propose a system that semantically segments an entire indoor scene from a scene point cloud with color information. Exploiting the sparsity of spatial data, we design a novel spatial-aware sparse convolution operation: we encode the spatial occupancy of objects as an additional feature and use a self-attention mechanism to aggregate the different features effectively. In addition, we introduce a completion network that refines the output of the segmentation network, so that each object in the scene obtains a more reasonable and complete shape. With these two methods, we build an accurate scene semantic segmentation network that recovers the semantic labels of the entire scene. In the experiments, we perform quantitative and qualitative analyses on two public datasets. We first compare models with different configurations to verify the effectiveness of the proposed method, then compare against other state-of-the-art methods to demonstrate its superiority, and finally present an analysis of a real application to illustrate its practicality. We expect the proposed 3D scene semantic segmentation system to provide accurate and fast results for practical applications. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8111 |
DOI: | 10.6342/NTU202100720 |
Full-text License: | Authorized (open access worldwide) |
Electronic Full-text Release Date: | 2024-03-01 |
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-1702202112242500.pdf | 13.67 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
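The abstract above notes that standard sparse convolutions treat empty voxels as zero-valued features, and that the thesis instead encodes occupancy as an additional input channel while computing only at occupied sites. As a rough illustration of that idea only (this is not the thesis code; the 2D grid, the dense loop, and the single-output kernel are all simplifying assumptions), a spatial-aware sparse convolution might look like:

```python
import numpy as np

def spatial_aware_sparse_conv(features, occupancy, kernel):
    """Hypothetical sketch of a spatial-aware sparse convolution,
    simplified to a 2D grid with a single scalar output per site.
    The occupancy pattern is appended as an extra input channel so
    the filter "sees" the spatial structure instead of treating
    empty cells as indistinguishable zeros, and the convolution is
    evaluated only at occupied cells (the sparse part)."""
    H, W, C = features.shape
    k = kernel.shape[0]                 # kernel shape: (k, k, C + 1)
    pad = k // 2
    # Append occupancy as an additional feature channel.
    x = np.concatenate([features, occupancy[..., None]], axis=-1)
    x = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((H, W))
    # Sparse evaluation: visit occupied sites only.
    for i, j in zip(*np.nonzero(occupancy)):
        patch = x[i:i + k, j:j + k, :]
        out[i, j] = np.sum(patch * kernel)
    return out
```

In a real implementation the self-attention mechanism described in the abstract would fuse the occupancy encoding with the learned features rather than simply concatenating them, and the loop would be replaced by an indexed gather over a sparse tensor.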