Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94258
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 丁建均 | zh_TW
dc.contributor.advisor | Jian-Jiun Ding | en
dc.contributor.author | 李振勳 | zh_TW
dc.contributor.author | Zhen-Xun Lee | en
dc.date.accessioned | 2024-08-15T16:29:06Z | -
dc.date.available | 2024-08-16 | -
dc.date.copyright | 2024-08-15 | -
dc.date.issued | 2024 | -
dc.date.submitted | 2024-08-08 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94258 | -
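dc.description.abstract | With the rapid growth of autonomous driving technology, the need for multiple object tracking is increasingly urgent. This study proposes two novel methods to meet the challenges of complex road environments.
First, we propose a keypoint-based multiple object tracking method. It uses anchor-free keypoint detection to lower computational resource requirements and extracts robust features through an improved deep learning model architecture. We introduce a background suppression method to reduce false detections and combine information from the preceding and following frames to capture object motion features. For occlusion and overexposure, we combine a Kalman filter with the detector and dynamically adjust the tracking strategy according to detection confidence. Experimental results show that the method reaches a MOTA of 92.51% on the KITTI tracking dataset, outperforming existing methods while maintaining real-time performance of about 18 FPS.
Second, for dense and complex scenes, we propose an attention-based tracking method. It combines a transformer-based DINO detector with a multi-head attention mechanism to capture long-term object associations effectively. We also introduce a re-identification mechanism to strengthen tracking after long occlusions. On the MOT17 dataset, the method reaches a MOTA of 74.8% and is particularly suited to complex, dense scenes.
Each method has its own strengths, and together they provide effective solutions for multiple object tracking in different scenarios. The former excels in computational efficiency and generality, while the latter is stronger at handling complex scenes and long-term dependencies. | zh_TW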
dc.description.abstract | With the rapid development of autonomous driving technology, the demand for efficient multiple object tracking has become increasingly urgent. This study proposes two methods to address the challenges of complex road environments.
First, we introduce a keypoint-based multiple object tracking method. This approach uses anchor-free keypoint detection to reduce computational resource requirements while extracting robust features through an improved deep learning model architecture. We apply a background suppression technique to reduce false detections and incorporate information from adjacent frames to capture object motion characteristics. To handle occlusion and overexposure, we combine a Kalman filter with the detector and dynamically adjust the tracking strategy based on detection confidence. Experimental results show that this method achieves a MOTA of 92.51% on the KITTI tracking dataset, outperforming existing methods while maintaining real-time performance at approximately 18 FPS.
Second, for dense and complex scenarios, we propose an attention-based tracking method. This approach integrates a transformer-based DINO detector with a multi-head attention mechanism to capture long-term object associations. We also incorporate a re-identification mechanism to improve tracking after prolonged occlusions. On the MOT17 dataset, this method achieves a MOTA of 74.8% and is particularly well suited to complex, dense scenarios.
Each method offers distinct advantages, and together they provide effective solutions for multiple object tracking in various scenarios. The former excels in computational efficiency and versatility, while the latter performs better on complex scenes and long-term dependencies. Future research will explore combining the strengths of both methods to further improve multiple object tracking performance. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-15T16:29:06Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2024-08-15T16:29:06Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Acknowledgements
Chinese Abstract
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
Chapter 2 Related Works
2.1 Tracking-by-Detection
2.2 Tracking-by-Attention
2.3 Hybrid and Recent Innovative Approaches
2.4 Re-identification in MOT
2.5 Summary and Future Trends
Chapter 3 Proposed Method – Point-based Jointly Detection-and-Tracking
3.1 Motivation
3.2 Architecture of proposed method
3.3 Experiments
3.4 Implementation Details
Chapter 4 Proposed Method – Tracking-by-Attention
4.1 Motivation
4.2 Architecture of our proposed method
4.3 Experiments
4.4 Implementation Details
Chapter 5 Conclusion
5.1 Summary of Contributions
5.2 Key Findings
5.3 Limitations
5.4 Future Work
REFERENCE | -
dc.language.iso | en | -
dc.title | 基於卡爾曼濾波之條件檢測和注意力機制的多物件追蹤演算法 | zh_TW
dc.title | Multiple Object Tracking Algorithm with Kalman Filter Aided Conditioned Detection and Attention Mechanism | en
dc.type | Thesis | -
dc.date.schoolyear | 112-2 | -
dc.description.degree | Master's | -
dc.contributor.oralexamcommittee | 郭景明;許文良;余執彰 | zh_TW
dc.contributor.oralexamcommittee | Jing-Ming Guo;Wen-Liang Hsue;Chih-Chang Yu | en
dc.subject.keyword | 多物件追蹤,特徵擷取,關鍵點偵測,卡爾曼濾波,背景抑制,注意力機制,電腦視覺 | zh_TW
dc.subject.keyword | Multiple Object Tracking, Feature Extraction, Keypoint Detection, Kalman Filter, Background Suppression, Attention Mechanism, Computer Vision | en
dc.relation.page | 60 | -
dc.identifier.doi | 10.6342/NTU202404018 | -
dc.rights.note | Authorization granted (open access worldwide) | -
dc.date.accepted | 2024-08-12 | -
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | -
dc.contributor.author-dept | 電信工程學研究所 (Graduate Institute of Communication Engineering) | -
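
The abstract in this record says that a Kalman filter is combined with the detector and that the tracking strategy is adjusted according to detection confidence, but the record gives no further detail. The following is a minimal sketch of one way such confidence gating could work, not the thesis's implementation. It assumes a single object, a one-dimensional position (for example a box-center coordinate), and a constant-velocity motion model. The names `ConstantVelocityKF`, `track`, and `conf_threshold`, and the noise values `q` and `r`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ConstantVelocityKF:
    """1-D constant-velocity Kalman filter over the state [position, velocity]."""
    pos: float
    vel: float = 0.0
    # Covariance stored as [[p00, p01], [p10, p11]]; a large initial uncertainty is assumed.
    P: List[List[float]] = field(default_factory=lambda: [[10.0, 0.0], [0.0, 10.0]])
    q: float = 0.01  # process noise added to each diagonal term (assumed value)
    r: float = 1.0   # measurement noise variance (assumed value)

    def predict(self, dt: float = 1.0) -> float:
        # State: x <- F x with F = [[1, dt], [0, 1]].
        self.pos += self.vel * dt
        (p00, p01), (p10, p11) = self.P
        # Covariance: P <- F P F^T + Q with Q = diag(q, q).
        self.P = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.pos

    def update(self, z: float) -> None:
        # Only the position is measured: H = [1, 0].
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r              # innovation variance
        k0, k1 = p00 / s, p10 / s     # Kalman gain
        y = z - self.pos              # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        # P <- (I - K H) P
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]


def track(
    frames: List[Tuple[Optional[float], float]],
    conf_threshold: float = 0.5,
) -> List[Tuple[Optional[float], str]]:
    """Return (position, source) per frame.

    Each input frame is (detected position or None, detection confidence).
    A confident detection corrects the filter; otherwise the filter coasts
    on its own prediction (the assumed behavior during occlusion or overexposure).
    """
    kf: Optional[ConstantVelocityKF] = None
    out: List[Tuple[Optional[float], str]] = []
    for z, conf in frames:
        confident = z is not None and conf >= conf_threshold
        if kf is None:
            # Start a track only from a confident detection.
            if confident:
                kf = ConstantVelocityKF(pos=z)
                out.append((z, "detector"))
            else:
                out.append((None, "none"))
            continue
        predicted = kf.predict()
        if confident:
            kf.update(z)
            out.append((kf.pos, "detector"))
        else:
            out.append((predicted, "kalman"))
    return out


if __name__ == "__main__":
    # Frames 4 and 5 mimic a weak and a missing detection.
    frames = [(0.0, 0.9), (1.0, 0.9), (2.0, 0.9), (3.0, 0.2), (None, 0.0), (5.0, 0.9)]
    for position, source in track(frames):
        print(source, position)
```

In this sketch the threshold is a hard switch. A softer variant could instead scale the measurement noise `r` by the detection confidence, so that weak detections still contribute a little. Which of these, if either, matches the thesis is not stated in this record.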
Appears in Collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in This Item:
File | Size | Format
ntu-112-2.pdf | 24.49 MB | Adobe PDF