Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/55895
Title: | 結合語義分割優化模型與二維光達點雲之物件辨識與測距 Object Recognition and Ranging Using Enhanced Image Semantic Segmentation Model and 2D-Lidar Point Cloud |
Author: | Chun-Wei Tseng (曾俊為) |
Advisor: | Kang Li (李綱) |
Keywords: | semantic segmentation, deep convolutional neural network |
Publication Year: | 2020 |
Degree: | Master |
Abstract: | With the development of deep learning for semantic segmentation, most deep neural network models are trained and evaluated on public datasets using multiple high-end GPU cards, which lets them achieve high segmentation accuracy but makes expensive hardware the main obstacle to practical training and deployment. To make semantic segmentation usable on a single low-end GPU card or an embedded board, this study proposes a new semantic segmentation network with an encoder-decoder architecture. The encoder uses a new convolutional neural network, Dilated CSPDarkNet53, proposed in this study as an extension of the CSPDarkNet53 backbone used in YOLOv4. Most existing semantic segmentation methods still suffer from two challenges: intra-class inconsistency and inter-class indistinction. To tackle these problems, the decoder adds a global context module and a local context module, both based on the self-attention mechanism, to strengthen the semantic information of the feature maps. Compared with other existing semantic segmentation models, the proposed network reduces the number of parameters by about 50% and the amount of computation by about 35%, and achieves 77.63% mean IoU on the public Pascal VOC dataset. Deployed on a single Nvidia GTX 1060 GPU, the model reaches an inference speed of 6.5 fps. Beyond improving the segmentation network, this study combines the segmentation image with distance information from a 2D Lidar to construct a laser-projected-camera perception system for intelligent vehicles. Keywords: semantic segmentation, deep convolutional neural network |
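The abstract describes global and local context modules built on self-attention. The thesis's actual module internals are not given here; the following is a minimal sketch of the general self-attention pattern such a global context module typically follows, where the projection matrices, shapes, and the `global_context_attention` function are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_context_attention(feat, wq, wk, wv):
    """Self-attention over all spatial positions of a flattened feature map.

    feat: (H*W, C) feature map flattened over space
    wq, wk, wv: (C, C) query/key/value projections (learned in practice)
    Returns an attention-refined feature map of the same shape.
    """
    q, k, v = feat @ wq, feat @ wk, feat @ wv
    scores = q @ k.T / np.sqrt(feat.shape[1])  # (HW, HW) pairwise affinities
    attn = softmax(scores, axis=-1)            # each row sums to 1
    return attn @ v                            # aggregate global context

# Toy example: a 4x4 feature map with 8 channels.
rng = np.random.default_rng(0)
H, W, C = 4, 4, 8
feat = rng.standard_normal((H * W, C))
wq, wk, wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
out = global_context_attention(feat, wq, wk, wv)
print(out.shape)  # (16, 8)
```

Because every position attends to every other position, each output vector mixes in context from the whole image, which is one common way to address intra-class inconsistency; a "local" counterpart would restrict attention to a neighborhood.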
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/55895 |
DOI: | 10.6342/NTU202001983 |
Full-text License: | Paid authorization |
Appears in Collections: | Department of Mechanical Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-2807202017210300.pdf (currently not authorized for public access) | 11.1 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.