Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88093

Full metadata record
| DC 欄位 | 值 | 語言 |
|---|---|---|
| dc.contributor.advisor | 徐宏民 | zh_TW |
| dc.contributor.advisor | Winston H. Hsu | en |
| dc.contributor.author | 陳義榮 | zh_TW |
| dc.contributor.author | Yi-Rong Chen | en |
| dc.date.accessioned | 2023-08-08T16:16:01Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-08-08 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-07-12 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88093 | - |
| dc.description.abstract | 單目三維物體檢測在利用深度資訊後取得了重大進展,然而,由於深度預測尚不精確,其性能表現仍然與 LiDAR 方法有巨大差距。我們認為此缺陷源自於常用的基於像素的深度圖損失函數,這種損失函數先天上會使近物體與遠物體有不平衡的損失權重。為了解決這些問題,我們提出 MonoHBD (Monocular Hierarchical Balanced Depth),一個使用分層式架構的綜合解決方案。我們結合深度分桶與深度偏移量,設計出分層式深度結構 (Hierarchical Depth Map) 來提升物體的定位精度。通過使用 RoIAlign,我們的平衡深度擷取器 (Balanced Depth Extractor) 利用了相機內外參數以考慮幾何關係,同時捕捉了場景層級與物體層級的深度資訊。此外,我們還提出了一個嶄新的深度圖損失函數,解決了不同距離的物體不同損失權重的問題。我們提出的模型在 KITTI 三維物體偵測排行榜上取得了最先進的結果,並且我們的模型支援實時檢測。我們進行了大量的消融研究以證明我們方法的有效性。 | zh_TW |
| dc.description.abstract | Monocular 3D object detection has seen significant advances through the incorporation of depth information. However, a considerable performance gap remains compared to LiDAR-based methods, largely due to inaccurate depth estimation. We argue that this issue stems from the commonly used pixel-wise depth map loss, which inherently imbalances the loss weighting between near and distant objects. To address these challenges, we propose MonoHBD (Monocular Hierarchical Balanced Depth), a comprehensive solution built on a hierarchical mechanism. We introduce the Hierarchical Depth Map (HDM) structure, which combines depth bins and depth offsets to improve object localization accuracy. Leveraging RoIAlign, our Balanced Depth Extractor (BDE) module captures both scene-level depth relationships and object-specific depth characteristics, while accounting for geometric properties through the inclusion of camera calibration parameters. Furthermore, we propose a novel depth map loss that regularizes object-level depth features to mitigate imbalanced loss propagation. Our model achieves state-of-the-art results on the KITTI 3D object detection benchmark while supporting real-time detection. Extensive ablation studies demonstrate the efficacy of our proposed modules. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-08T16:16:01Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-08-08T16:16:01Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee; Acknowledgements; 摘要; Abstract; Contents; List of Figures; List of Tables; Chapter 1 Introduction; Chapter 2 Related Work (2.1 Image-only Monocular 3D Object Detection, 2.2 Depth-guided Monocular 3D Object Detection); Chapter 3 Method (3.1 Problem Definition, 3.2 Overview and Architecture, 3.3 Hierarchical Depth Map, 3.4 Balanced Depth Extractor, 3.5 3D Detection Head and Loss Function); Chapter 4 Experiment (4.1 Setup, 4.2 Main Results, 4.3 Ablation Study); Chapter 5 Conclusion; References; Appendix A Supplementary Materials (A.1 Details of Depth-aware Transformer, A.2 Implementation Details, A.3 Additional Results) | - |
| dc.language.iso | en | - |
| dc.subject | 單目三維物體檢測 | zh_TW |
| dc.subject | 自動駕駛汽車 | zh_TW |
| dc.subject | autonomous driving | en |
| dc.subject | monocular 3D object detection | en |
| dc.title | 透過分層式平衡深度重新檢視單目三維物體檢測中之深度引導方法 | zh_TW |
| dc.title | Revisiting Depth-guided Methods for Monocular 3D Object Detection by Hierarchical Balanced Depth | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-2 | - |
| dc.description.degree | Master | - |
| dc.contributor.oralexamcommittee | 陳文進;陳奕廷;葉梅珍 | zh_TW |
| dc.contributor.oralexamcommittee | Wen-Chin Chen;Yi-Ting Chen;Mei-Chen Yeh | en |
| dc.subject.keyword | 單目三維物體檢測,自動駕駛汽車 | zh_TW |
| dc.subject.keyword | monocular 3D object detection, autonomous driving | en |
| dc.relation.page | 33 | - |
| dc.identifier.doi | 10.6342/NTU202301013 | - |
| dc.rights.note | Authorized (worldwide public access) | - |
| dc.date.accepted | 2023-07-13 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
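The abstract's two core ideas, a hierarchical depth representation (coarse depth bins refined by per-pixel offsets) and an object-balanced depth loss that equalizes the contribution of near and distant objects, can be illustrated with a minimal NumPy sketch. The bin discretization, depth range, and the `balanced_depth_loss` helper below are illustrative assumptions for exposition, not the thesis's actual MonoHBD implementation.

```python
import numpy as np

def decode_hierarchical_depth(bin_logits, offsets, d_min=1.0, d_max=60.0):
    """Decode depth as a coarse bin classification plus a fine residual offset.

    bin_logits: (H, W, num_bins) scores over discretized depth bins
    offsets:    (H, W) residual within the chosen bin, in bin-width units
    Linearly spaced bins are an assumption of this sketch.
    """
    num_bins = bin_logits.shape[-1]
    bin_width = (d_max - d_min) / num_bins
    bin_idx = bin_logits.argmax(axis=-1)              # coarse localization
    bin_center = d_min + (bin_idx + 0.5) * bin_width  # center of chosen bin
    return bin_center + offsets * bin_width           # refine with the offset

def balanced_depth_loss(pred, target, boxes):
    """Average the L1 depth error per object RoI first, then across objects,
    so a large near object and a small far object contribute equally.
    boxes: list of (x1, y1, x2, y2) integer RoIs."""
    per_object = []
    for x1, y1, x2, y2 in boxes:
        roi_err = np.abs(pred[y1:y2, x1:x2] - target[y1:y2, x1:x2])
        per_object.append(roi_err.mean())             # normalize by RoI area
    return float(np.mean(per_object))                 # equal weight per object
```

With a 4x4 near box and a 2x2 far box, a plain pixel-wise mean is dominated by the near box's 16 pixels, which is the imbalance the abstract argues against; the per-object average above weights both boxes equally regardless of their projected size.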
Appears in Collections: Department of Computer Science and Information Engineering
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-111-2.pdf | 6.61 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated by their own copyright terms.
