Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78624
Title: | CrossFusion Net: 3D Object Detection Based on RGB Images and Point Clouds in Autonomous Driving |
Author: | Dza-Shiang Hong (洪子翔) |
Advisor: | Li-Chen Fu (傅立成) |
Keywords: | Deep Learning, Convolutional Neural Network, Object Detection, 3D Object Detection, Autonomous Driving |
Issue Date: | 2019 |
Degree: | Master |
Abstract: | In recent years, advanced driver assistance systems (ADAS) have rapidly matured alongside the development of related technologies, and 3D object detection plays an indispensable role in them: with its help, an ADAS can accurately localize objects in 3D space and use those locations for subsequent assistance decisions.
As ADAS research has advanced, the supporting hardware has also become increasingly complete, and many ADAS research vehicles today carry multiple sensors simultaneously, such as cameras, LiDAR, and inertial measurement units (IMUs). Since each sensor type has inherent limitations in what it can perceive, how to combine the strengths of different sensors to achieve better detection becomes an important problem. In this work, we first analyze state-of-the-art 3D object detection methods. Most of them rely on deep learning and convolutional neural networks to learn more representative features for prediction, but they do not efficiently exploit features across different sensors. Based on this observation, this thesis proposes a novel 3D object detection network that incorporates a feature fusion mechanism grounded in the spatial relationship between image features and point-cloud features, aiming to improve the accuracy and robustness of 3D object detection. The proposed method is evaluated on the KITTI dataset, widely regarded as the most challenging and authoritative 3D object detection benchmark, and achieves a mean average precision above 80%. In recent years, accurate 3D detection plays an important role in many applications, and autonomous driving is a typical representative. This thesis aims to design an accurate 3D on-road vehicle detector that takes both LiDAR point clouds and RGB images as inputs, since LiDAR and cameras each have their own merits. We design a novel end-to-end learnable architecture, CrossFusion Net, that exploits features from both LiDAR point clouds and RGB images through a hierarchical fusion structure. Specifically, CrossFusion Net utilizes the bird's eye view (BEV) of point clouds obtained through projection. In addition, we fuse these two feature maps of different shapes through the newly introduced CrossFusion layer. The proposed CrossFusion layer transforms one feature map to the shape of the other based on the spatial relationship between the two feature maps. Additionally, we apply attention maps to the transformed feature map and the original one to automatically decide the importance of the two respective feature maps. Experiments on the challenging KITTI [1] car 3D detection benchmark and BEV benchmark show that our approach outperforms other state-of-the-art methods in mean average precision (mAP). Furthermore, our network learns an effective representation of its surroundings via RGB feature maps and BEV feature maps. |
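The attention-weighted fusion described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the thesis implementation: the function and parameter names are assumptions, and the CrossFusion layer's spatial transformation is assumed to have already projected the RGB features onto the BEV grid so the two maps share a shape.

```python
import numpy as np

def crossfusion_fuse(bev_feat, rgb_feat_in_bev, w_att, b_att):
    """Illustrative attention-weighted fusion of two aligned feature maps.

    bev_feat, rgb_feat_in_bev: (H, W, C) feature maps, both on the BEV grid
    (the RGB map is assumed to have been transformed there beforehand).
    w_att: (2C, 1) weights and b_att: (1,) bias of a 1x1 attention layer
    (hypothetical parameters for this sketch).
    """
    # Concatenate along channels and compute a per-location attention map
    concat = np.concatenate([bev_feat, rgb_feat_in_bev], axis=-1)  # (H, W, 2C)
    att = 1.0 / (1.0 + np.exp(-(concat @ w_att + b_att)))          # sigmoid -> (H, W, 1)
    # The attention value decides the per-location importance of each modality
    return att * bev_feat + (1.0 - att) * rgb_feat_in_bev
```

Because the attention value lies in (0, 1), the fused map is an element-wise convex combination of the two inputs, letting the network lean on whichever modality is more informative at each BEV location.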
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78624 |
DOI: | 10.6342/NTU201903208 |
Full-Text Permission: | Authorized with compensation |
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format
---|---|---
ntu-108-R06922096-1.pdf (currently not authorized for public access) | 2.81 MB | Adobe PDF
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.