NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90134
Full metadata record
dc.contributor.advisor: Ming-Sui Lee (李明穗)
dc.contributor.author: Bo-Wei Chen (陳柏維)
dc.date.accessioned: 2023-09-22T17:33:23Z
dc.date.available: 2023-11-09
dc.date.copyright: 2023-09-22
dc.date.issued: 2023
dc.date.submitted: 2023-08-11
dc.identifier.citationX.Bai, Z.Hu,X.Zhu,Q.Huang,Y.Chen,H.Fu,andC.-L.Tai. Transfusion: Robust lidar-camera fusion for 3d object detection with transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1090 1099, 2022.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90134
dc.description.abstract: With the continuous development of deep learning technology, the accuracy of object detection has been steadily improving, and the realization of Level 5 autonomous driving is within reach. In favorable weather conditions, the average precision of object detection can exceed 85 percent. However, the weather is not always ideal, and conditions such as rain, fog, and even snow can significantly reduce detection accuracy. Traditional sensors like cameras and LiDAR are susceptible to harsh weather, so we adopt a fusion of RADAR and LiDAR for object detection. RADAR is unaffected by adverse environmental conditions but introduces many noisy point clouds; we therefore use LiDAR as an auxiliary sensor, since it provides accurate environmental point clouds that help mitigate ghost detections. We employ an attention mechanism to fuse the features from LiDAR and RADAR. Additionally, we propose a Feature Selection Module to address the issue of attention weights in the attention mechanism, and we introduce an Associative Feature Fusion Module to fully utilize the features selected by the attention mechanism. Experiments demonstrate that our proposed model outperforms state-of-the-art RADAR and LiDAR models.
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-22T17:33:23Z. No. of bitstreams: 0
dc.description.provenance: Made available in DSpace on 2023-09-22T17:33:23Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee i
Acknowledgements ii
摘要 (Chinese abstract) iii
Abstract iv
Contents vi
List of Figures viii
List of Tables x
Chapter 1 Introduction 1
Chapter 2 Related Work 6
2.1 Unimodal Sensor Detection 6
2.1.1 Radar-Only Detection 7
2.1.2 LiDAR-Only Detection 8
2.2 Multimodal Sensor Fusion Detection 9
2.2.1 LiDAR and Camera Fusion Detection 10
2.2.2 LiDAR and Radar Fusion Detection 10
Chapter 3 Method 12
3.1 Framework Overview 13
3.2 Feature Selection Module 14
3.2.1 Feature Spatial Selection 14
3.2.2 Feature Channel Selection 14
3.3 Associative Feature Fusion Module 15
Chapter 4 Experiments 17
4.1 Dataset Description 17
4.2 Implementation Details 18
4.2.1 Evaluation Metrics 18
4.3 Comparison with Radar Only 19
4.4 Comparison with LiDAR Only 19
4.5 Comparison with Radar and LiDAR 20
4.6 Ablation Study 21
4.6.1 Comparison with Other Feature Fusion Operations 21
4.6.2 Ablation of Model Components 22
4.7 Discussion 23
Chapter 5 Conclusion 29
References 31
dc.language.iso: en
dc.subject: deep learning
dc.subject: multimodal object detection
dc.subject: feature fusion based on attention mechanism
dc.title: Fusion of Radar and LiDAR Using Associative Mechanism for Object Detection in Adverse Weather Conditions
dc.type: Thesis
dc.date.schoolyear: 111-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: Mei-Chen Yeh (葉梅珍); Li-Jie Xi (李界羲)
dc.subject.keyword: deep learning, multimodal object detection, feature fusion based on attention mechanism
dc.relation.page: 36
dc.identifier.doi: 10.6342/NTU202302360
dc.rights.note: Authorization granted (open access worldwide)
dc.date.accepted: 2023-08-12
dc.contributor.author-college: College of Electrical Engineering and Computer Science
dc.contributor.author-dept: Graduate Institute of Networking and Multimedia
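
The abstract above sketches the architecture only at a high level: an attention mechanism fuses LiDAR and RADAR features, a Feature Selection Module performs spatial and channel selection (per the table of contents), and an Associative Feature Fusion Module consumes the selected features. The thesis PDF is the authoritative description; purely as an illustration of this general pattern, the following is a minimal PyTorch sketch of channel/spatial gating followed by cross-attention fusion over bird's-eye-view feature maps. Every module name, tensor shape, and gating choice below is a hypothetical stand-in, not the thesis's actual implementation.

```python
# Illustrative sketch only: all module names, shapes, and gating choices are
# assumptions for this example, not the design described in the thesis PDF.
import torch
import torch.nn as nn


class FeatureSelection(nn.Module):
    """Hypothetical spatial + channel selection over a BEV feature map."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel selection: squeeze-and-excitation style global gate.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial selection: per-location gate from pooled channel statistics.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)  # re-weight channels
        stats = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        return x * self.spatial_gate(stats)  # re-weight spatial locations


class AttentionFusion(nn.Module):
    """Hypothetical cross-attention fusion of RADAR and LiDAR BEV features."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.select_radar = FeatureSelection(channels)
        self.select_lidar = FeatureSelection(channels)
        # RADAR queries attend to LiDAR keys/values so that accurate LiDAR
        # geometry can down-weight noisy RADAR responses (ghost detections).
        self.cross_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, radar: torch.Tensor, lidar: torch.Tensor) -> torch.Tensor:
        b, c, h, w = radar.shape
        radar_sel = self.select_radar(radar)
        lidar_sel = self.select_lidar(lidar)
        q = radar_sel.flatten(2).transpose(1, 2)   # (B, H*W, C)
        kv = lidar_sel.flatten(2).transpose(1, 2)  # (B, H*W, C)
        attended, _ = self.cross_attn(q, kv, kv)
        attended = attended.transpose(1, 2).reshape(b, c, h, w)
        # Stand-in for the associative fusion step: a learned 1x1 mix of the
        # attended features with the gated RADAR stream.
        return self.mix(torch.cat([attended, radar_sel], dim=1))


if __name__ == "__main__":
    radar = torch.randn(2, 64, 32, 32)  # fake RADAR BEV features
    lidar = torch.randn(2, 64, 32, 32)  # fake LiDAR BEV features
    print(AttentionFusion(64)(radar, lidar).shape)  # torch.Size([2, 64, 32, 32])
```

The choice of RADAR as the query stream mirrors the abstract's framing (LiDAR as the auxiliary sensor that suppresses RADAR ghost detections); swapping the roles or fusing symmetrically would be equally plausible sketches.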
Appears in Collections: Graduate Institute of Networking and Multimedia

Files in This Item:
ntu-111-2.pdf (5.12 MB, Adobe PDF)


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
