Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88102
Title: | CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection |
Author: | Ching-Yu Tseng (曾靖渝) |
Advisor: | Wen-Chin Chen (陳文進) |
Keywords: | Computer Vision, Autonomous Driving, Object Detection |
Publication Year: | 2023 |
Degree: | Master |
Abstract: | To achieve accurate 3D object detection at low cost for autonomous driving, many multi-camera methods have been proposed to solve the occlusion problem of monocular approaches. However, due to the lack of accurately estimated depth, existing multi-camera methods often generate multiple bounding boxes along a ray in the depth direction for hard-to-detect small objects such as pedestrians, resulting in extremely low recall. Furthermore, directly attaching depth prediction modules, which generally consist of large network architectures, to existing multi-camera methods cannot meet the real-time requirements of self-driving applications. To address these issues, we propose Cross-view and Depth-guided Transformers for 3D Object Detection (CrossDTR). First, our lightweight *depth predictor* produces precise object-wise sparse depth maps and low-dimensional depth embeddings without requiring extra depth datasets for supervision. Second, a *cross-view depth-guided transformer* fuses the depth embeddings with image features from cameras of different views and generates 3D bounding boxes (see the illustrative sketch below). Extensive experiments demonstrate that our method surpasses existing multi-camera methods by 10 percent in pedestrian detection and by about 3 percent in overall mAP and NDS metrics. Moreover, computational analyses show that our method is 5 times faster than prior approaches. Our code will be made publicly available at https://github.com/sty61010/CrossDTR. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88102 |
DOI: | 10.6342/NTU202301047 |
Full-text License: | Authorized (publicly available worldwide) |
Appears in Collections: | Department of Computer Science and Information Engineering |
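The abstract describes two components: a lightweight depth predictor that emits object-wise sparse depth maps plus low-dimensional depth embeddings, and a cross-view depth-guided transformer that fuses those embeddings with multi-camera image features to decode 3D boxes. Below is a minimal PyTorch sketch of how such a pair of modules could fit together; it is not the thesis's actual implementation, and every module name, tensor shape, and hyperparameter (`DepthPredictor`, `CrossViewDecoderLayer`, `depth_bins=64`, `embed_dim=32`) is an illustrative assumption.

```python
import torch
import torch.nn as nn

class DepthPredictor(nn.Module):
    """Hypothetical lightweight head: per-pixel depth distribution plus a
    low-dimensional depth embedding, supervised only on sparse object depths."""
    def __init__(self, in_ch=256, depth_bins=64, embed_dim=32):
        super().__init__()
        self.depth_head = nn.Conv2d(in_ch, depth_bins, kernel_size=1)
        self.embed_head = nn.Conv2d(depth_bins, embed_dim, kernel_size=1)

    def forward(self, feat):  # feat: (B*N_cam, C, H, W) backbone features
        depth_logits = self.depth_head(feat)       # (B*N_cam, D, H, W)
        depth_prob = depth_logits.softmax(dim=1)   # categorical depth distribution
        depth_embed = self.embed_head(depth_prob)  # (B*N_cam, E, H, W)
        return depth_logits, depth_embed

class CrossViewDecoderLayer(nn.Module):
    """Hypothetical decoder layer: 3D object queries cross-attend to
    depth-conditioned tokens flattened across all camera views."""
    def __init__(self, d_model=256, embed_dim=32, n_heads=8):
        super().__init__()
        self.fuse = nn.Linear(d_model + embed_dim, d_model)  # inject depth cues
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, queries, img_tokens, depth_tokens):
        # queries: (B, Q, d_model); img_tokens: (B, N_cam*H*W, d_model)
        # depth_tokens: (B, N_cam*H*W, embed_dim)
        kv = self.fuse(torch.cat([img_tokens, depth_tokens], dim=-1))
        attended, _ = self.cross_attn(queries, kv, kv)
        queries = self.norm1(queries + attended)
        return self.norm2(queries + self.ffn(queries))

if __name__ == "__main__":
    B, N_cam, C, H, W, Q = 1, 6, 256, 16, 44, 100
    feat = torch.randn(B * N_cam, C, H, W)
    _, depth_embed = DepthPredictor()(feat)
    img_tokens = feat.flatten(2).transpose(1, 2).reshape(B, N_cam * H * W, C)
    depth_tokens = depth_embed.flatten(2).transpose(1, 2).reshape(B, N_cam * H * W, 32)
    out = CrossViewDecoderLayer()(torch.randn(B, Q, C), img_tokens, depth_tokens)
    print(out.shape)  # torch.Size([1, 100, 256])
```

In this sketch the depth embedding is simply concatenated onto the image tokens before cross-attention, so each object query attends to depth-conditioned keys and values from all camera views at once; CrossDTR's actual fusion mechanism may differ.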
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf | 5.84 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.