Multiple Object Tracking Algorithm with Kalman Filter Aided Conditioned Detection and Attention Mechanism

李振勳; Zhen-Xun Lee

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94258

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	丁建均	zh_TW
dc.contributor.advisor	Jian-Jiun Ding	en
dc.contributor.author	李振勳	zh_TW
dc.contributor.author	Zhen-Xun Lee	en
dc.date.accessioned	2024-08-15T16:29:06Z	-
dc.date.available	2024-08-16	-
dc.date.copyright	2024-08-15	-
dc.date.issued	2024	-
dc.date.submitted	2024-08-08	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94258	-
dc.description.abstract	隨著自動駕駛技術的蓬勃發展,多物件追蹤的需求日益迫切。本研究提出兩種創新方法以應對複雜道路環境下的挑戰。首先,我們提出基於關鍵點的多物件追蹤方法。該方法使用無錨點的關鍵點檢測,降低計算資源需求,同時通過改良的深度學習模型架構提取穩健特徵。我們引入背景抑制方法降低誤判,並結合前後幀信息擷取物件運動特徵。針對遮蔽和過曝情況,我們創新性地結合卡爾曼濾波器與檢測器,根據檢測信心動態調整追蹤策略。實驗結果表明,該方法在KITTI追蹤資料集上達到了92.51%的MOTA,優於現有方法,同時保持了約18 FPS的實時性能。其次,針對密集複雜場景,我們提出基於注意力機制的追蹤方法。該方法結合transformer架構的DINO檢測器和多頭注意力機制,有效捕捉長期物件關聯。我們還引入重識別機制,增強長時間遮蔽後的追蹤能力。在MOT17數據集上,該方法達到74.8%的MOTA,特別適合處理複雜密集場景。兩種方法各具優勢,為不同場景下的多物件追蹤提供了有效解決方案。前者在計算效率和通用性方面表現優異,後者則在處理複雜場景和長期依賴關係方面更具優勢。	zh_TW
dc.description.abstract	With the rapid development of autonomous driving technology, the demand for efficient multiple object tracking has become increasingly urgent. This study proposes two methods to address the challenges of complex road environments. First, we introduce a keypoint-based multiple object tracking method. This approach uses anchor-free keypoint detection to reduce computational cost while extracting robust features through an improved deep learning model architecture. We apply a background suppression technique to reduce false detections and incorporate information from adjacent frames to capture object motion characteristics. To handle occlusion and overexposure, we combine a Kalman filter with the detector and dynamically adjust the tracking strategy based on detection confidence. Experimental results show that this method achieves a MOTA of 92.51% on the KITTI tracking dataset, outperforming existing methods while maintaining real-time performance at approximately 18 FPS. Second, targeting dense and complex scenarios, we propose an attention-based tracking method. This approach combines the transformer-based DINO detector with a multi-head attention mechanism to capture long-term object associations. We also incorporate a re-identification mechanism to improve tracking after prolonged occlusions. On the MOT17 dataset, this method achieves a MOTA of 74.8% and is particularly suited to complex, dense scenes. The two methods have complementary strengths and provide effective solutions for multiple object tracking in different scenarios. The former excels in computational efficiency and versatility, while the latter performs better on complex scenes and long-term dependencies. Future research will explore combining the strengths of both methods to further improve multiple object tracking performance.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-15T16:29:06Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2024-08-15T16:29:06Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Acknowledgements; Chinese Abstract; Abstract; Contents; List of Figures; List of Tables; Chapter 1 Introduction; Chapter 2 Related Works; 2.1 Tracking-by-Detection; 2.2 Tracking-by-Attention; 2.3 Hybrid and Recent Innovative Approaches; 2.4 Re-identification in MOT; 2.5 Summary and Future Trends; Chapter 3 Proposed Method – Point-based Jointly Detection-and-Tracking; 3.1 Motivation; 3.2 Architecture of proposed method; 3.3 Experiments; 3.4 Implementation Details; Chapter 4 Proposed Method – Tracking-by-Attention; 4.1 Motivation; 4.2 Architecture of our proposed method; 4.3 Experiments; 4.4 Implementation Details; Chapter 5 Conclusion; 5.1 Summary of Contributions; 5.2 Key Findings; 5.3 Limitations; 5.4 Future Work; References	-
dc.language.iso	en	-
dc.title	基於卡爾曼濾波之條件檢測和注意力機制的多物件追蹤演算法	zh_TW
dc.title	Multiple Object Tracking Algorithm with Kalman Filter Aided Conditioned Detection and Attention Mechanism	en
dc.type	Thesis	-
dc.date.schoolyear	112-2	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	郭景明;許文良;余執彰	zh_TW
dc.contributor.oralexamcommittee	Jing-Ming Guo;Wen-Liang Hsue;Chih-Chang Yu	en
dc.subject.keyword	多物件追蹤,特徵擷取,關鍵點偵測,卡爾曼濾波,背景抑制,注意力機制,電腦視覺	zh_TW
dc.subject.keyword	Multiple Object Tracking,Feature Extraction,Keypoint Detection,Kalman Filter,Background Suppression,Attention Mechanism,Computer Vision	en
dc.relation.page	60	-
dc.identifier.doi	10.6342/NTU202404018	-
dc.rights.note	Authorization granted (open access worldwide)	-
dc.date.accepted	2024-08-12	-
dc.contributor.author-college	College of Electrical Engineering and Computer Science	-
dc.contributor.author-dept	Graduate Institute of Communication Engineering	-
Appears in Collections:	Graduate Institute of Communication Engineering
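
The abstract describes combining a Kalman filter with the detector and adjusting the tracking strategy according to detection confidence. This record gives no further detail, so the following is only a minimal sketch of that general idea and not the thesis's implementation. The constant-velocity motion model, the names `ConstantVelocityKalman` and `track_step`, the noise variances, and the 0.5 confidence threshold are all assumptions made for illustration.

```python
import numpy as np


class ConstantVelocityKalman:
    """Kalman filter over the state [x, y, vx, vy] (hypothetical motion model)."""

    def __init__(self, x, y, process_var=1e-2, meas_var=1.0):
        self.state = np.array([x, y, 0.0, 0.0], dtype=float)
        self.P = np.eye(4) * 10.0  # initial state uncertainty (assumed value)
        # Transition matrix: position advances by velocity once per frame.
        self.F = np.array([[1, 0, 1, 0],
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        # Measurement matrix: only the (x, y) position is observed.
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * process_var  # process noise covariance
        self.R = np.eye(2) * meas_var     # measurement noise covariance

    def predict(self):
        """Propagate the state one frame ahead and return the predicted position."""
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2].copy()

    def update(self, z):
        """Correct the predicted state with a measured (x, y) position."""
        residual = np.asarray(z, dtype=float) - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.state = self.state + K @ residual
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.state[:2].copy()


def track_step(kf, detection, confidence, conf_threshold=0.5):
    """Advance one frame, trusting the detector only when it is confident.

    Returns the position estimate and which source produced it.
    """
    predicted = kf.predict()
    if detection is not None and confidence >= conf_threshold:
        return kf.update(detection), "detector"
    # Low-confidence or missing detection (e.g. occlusion): coast on the prediction.
    return predicted, "kalman"
```

In this sketch a track that has received a few confident detections keeps moving along its estimated velocity when a later frame yields only a low-confidence or missing detection, which is one simple way to bridge short occlusions.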

Files in This Item:

File	Size	Format
ntu-112-2.pdf	24.49 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.