Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99384
| Title: | HintOcc: Enhancing BEV-to-3D Reconstruction in Occupancy Prediction with Spatial-Awareness and Dynamic Class Balancing |
| Author: | Che-Wei Lee (李哲維) |
| Advisor: | Li-Chen Fu (傅立成) |
| Keywords: | Deep Learning, Computer Vision, 3D Occupancy Prediction, Class Imbalance, Bird's-Eye View |
| Publication Year: | 2025 |
| Degree: | Master's |
| Abstract: | With the rapid advancement of autonomous driving, 3D visual perception has become a core capability of intelligent vehicles. Camera-based 3D perception systems must provide accurate spatial and semantic understanding of the environment to support downstream tasks such as motion planning and decision-making. Detecting objects accurately and efficiently in complex, dynamic traffic scenes is essential for driving safety.
Traditional approaches use 3D voxel representations for both encoding and decoding. They perform well but incur heavy memory consumption and high computational cost. To improve efficiency, several works project the 3D voxel space into a 2D bird's-eye view (BEV) representation, which reduces memory overhead while maintaining reasonable performance. However, the inverse process, lifting 2D BEV features back into 3D space, often lacks vertical information, which degrades accuracy. In addition, existing 3D occupancy datasets often suffer from severe class imbalance: infrequent but safety-critical objects such as pedestrians and motorcycles are significantly underrepresented compared with dominant classes such as drivable areas and vegetation.
To address these challenges, we propose HintOcc, an efficient 3D occupancy prediction framework that strengthens BEV-to-3D reconstruction and improves performance on underrepresented classes in real-world datasets. First, we introduce a 2D vertical-view branch that supplies height information as structural hints for 2D-to-3D reconstruction. Second, we employ a deformable depthwise-separable head for spatially adaptive decoding with reduced parameter overhead. Finally, we propose a batch-wise dynamic weighting strategy that adaptively emphasizes rare classes based on the classes present in each training batch.
We evaluate our method on the Occ3D-NuScenes benchmark. Experimental results show that HintOcc outperforms existing methods under comparable computational constraints and improves accuracy on underrepresented classes relative to the baseline model. Ablation studies further verify the contribution of each component to BEV-based 3D occupancy prediction. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99384 |
| DOI: | 10.6342/NTU202503090 |
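| Full-Text Authorization: | Authorized (open access on campus only) |
| Electronic Full-Text Release Date: | 2030-07-30 |
| Appears in Collections: | Department of Computer Science and Information Engineering |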
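
The abstract describes a batch-wise dynamic weighting strategy but gives no formula. As a rough illustration only, the sketch below assumes inverse-frequency weighting computed from the labels in one batch; the function name `batch_class_weights`, the `smoothing` parameter, and the normalization are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from typing import Iterable, List


def batch_class_weights(
    labels: Iterable[int], num_classes: int, smoothing: float = 1.0
) -> List[float]:
    """Per-class loss weights derived from a single batch (illustrative only).

    Assumption: a class's weight is inversely proportional to how often it
    occurs in the batch. Classes absent from the batch get weight 0, and the
    weights of the classes that are present are rescaled to have mean 1.
    Labels outside [0, num_classes) are ignored.
    """
    counts = Counter(labels)
    raw = [0.0] * num_classes
    for cls in range(num_classes):
        n = counts.get(cls, 0)
        if n > 0:
            # Rare classes (small n) receive a larger raw weight.
            raw[cls] = 1.0 / (n + smoothing)

    present = [w for w in raw if w > 0.0]
    if not present:
        return raw  # empty batch: nothing to weight

    mean = sum(present) / len(present)
    return [w / mean for w in raw]


# Example batch: class 0 fills 8 voxels, class 1 fills 2, class 2 is absent.
weights = batch_class_weights([0] * 8 + [1] * 2, num_classes=3)
print(weights)
```

In this example the rare class 1 receives a larger weight than the dominant class 0, and the absent class 2 contributes nothing. Weights of this kind would typically be recomputed for every batch and passed to a weighted cross-entropy loss.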
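Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf (not authorized for public access) | 5 MB | Adobe PDF | View/Open |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.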
