Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85863
Full metadata record
DC field: value (language)
dc.contributor.advisor: 林宗男 (zh_TW)
dc.contributor.advisor: Tsung-Nan Lin (en)
dc.contributor.author: 劉正仁 (zh_TW)
dc.contributor.author: CHENG-JEN LIU (en)
dc.date.accessioned: 2023-03-19T23:26:51Z
dc.date.available: 2023-11-10
dc.date.copyright: 2022-09-27
dc.date.issued: 2022
dc.date.submitted: 2002-01-01
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85863
dc.description.abstract: 多目標追蹤長期以來一直是人們感興趣的議題,因為它在許多計算機視覺應用中發揮著重要作用。現有研究多為戶外追蹤設計,如影像監控和自動駕駛。然而,戶外追蹤場景中物體的行為不能完全反映室內追蹤環境中的挑戰。在戶外追蹤場景中,行人和車輛通常沿簡單的直線路徑勻速地從一處移動到另一處,且行人的外觀通常差異明顯。相比之下,在室內場景中,例如舞蹈編排表演,舞者的動態行為導致嚴重的遮擋,相似的表演服裝則造成同質外觀問題。室內追蹤中的這些嚴重遮擋和同質外觀問題導致現有追蹤器的性能明顯下降。在本文中,我們提出了一個深度增強的多目標追蹤框架,並將語義匹配策略與場景感知親和力測量方法相結合,可顯著減輕遮擋和同質外觀問題。此外,我們引入了一個室內追蹤數據集,增加了現有基準數據集的多樣性,以用於室內追蹤評估。我們設計實驗,在我們提出的室內追蹤數據集以及最新的 MOT17 和 MOT20 測試數據集上評估我們的追蹤器和現有的追蹤器;我們的方法在具說服力的 HOTA 指標上始終優於其他追蹤器。與實驗中第二好的追蹤器 DeepSORT 相比,我們提出的追蹤器在室內追蹤數據集中將身份切換次數降低了將近 20%。 (zh_TW)
dc.description.abstract: Multiple-object tracking has long been a topic of interest since it plays an important role in many computer vision applications. Existing works are mostly designed for outdoor tracking, such as video surveillance and autonomous driving. However, the behaviors of objects in outdoor tracking scenarios do not fully reflect the challenges of indoor tracking environments. In outdoor scenarios, pedestrians and vehicles usually move uniformly from place to place along simple, straight paths, and target appearances usually differ. In contrast, in indoor scenarios such as choreographed performances, the dynamic behavior of dancers leads to severe occlusion, and similar costumes present a homogeneous appearance problem. These severe occlusion and homogeneous appearance problems cause noticeable performance degradation in existing works. In this paper, we propose a depth-enhanced tracking-by-detection framework and a semantic matching strategy combined with a scene-aware affinity measurement method to significantly mitigate the occlusion and homogeneous appearance problems. In addition, we introduce an indoor tracking dataset that increases the diversity of existing benchmark datasets for indoor tracking evaluation. We conduct experiments on both the proposed indoor tracking dataset and the latest MOT benchmarks, MOT17 and MOT20. The results show that our method consistently outperforms other works on the HOTA metric across the benchmarks and reduces the number of identity switches by nearly 20% relative to the second-best tracker, DeepSORT, on our proposed indoor MOT benchmark dataset. (en)
dc.description.provenance: Made available in DSpace on 2023-03-19T23:26:51Z (GMT). No. of bitstreams: 1. U0001-2309202208262200.pdf: 12533643 bytes, checksum: 0313ff8e8e0eb2d07f4bc6fc7fb46f9a (MD5). Previous issue date: 2022 (en)
dc.description.tableofcontents:
  中文摘要 (Chinese Abstract) i
  Abstract iii
  1 Introduction 1
  2 MOT Related Works 5
    2.1 MOT Benchmark Dataset 5
    2.2 MOT Trackers 6
  3 Proposed Dataset - NTU-MOTD 9
    3.1 Dataset Collection Environment 9
    3.2 Dataset Statistics 9
    3.3 Ground-Truth Annotation 13
  4 Proposed Method - Depth Enhanced Tracker 15
    4.1 Extending the Tracking Space to Solve the Severe Occlusion Problem in Indoor Tracking 17
    4.2 Scene-Aware Spatial Feature Selection and Appearance Feature Extraction 19
    4.3 Semantic Matching Strategy for Solving the Homogeneous Appearance Problem in Indoor Tracking 20
  5 Experiments 25
    5.1 Evaluation Dataset for Indoor Tracking 25
    5.2 Evaluation Dataset for Outdoor Tracking 25
    5.3 Evaluation Metrics 26
    5.4 Ablation Study 27
      5.4.1 Tracking with different object detectors 27
      5.4.2 Tracking with different depth estimation models 28
      5.4.3 Depth extraction with or without segmentation masks 29
      5.4.4 Tracking with different matching strategies 31
      5.4.5 Tracking with different matching thresholds 31
      5.4.6 Tracking with or without semantic matching strategy 32
      5.4.7 Tracking with different transition policies of finite-state machine 32
      5.4.8 Training scene detector with different types of input images 34
      5.4.9 Tracking with or without scene-aware affinity measurement 36
      5.4.10 Computational complexity of the proposed tracker 36
    5.5 MOT Benchmark Evaluation 38
  6 Conclusion 43
    6.1 Limitations and Future Work 43
    6.2 Acknowledgement 44
  Bibliography 45
dc.language.iso: zh_TW
dc.subject: 物件追蹤 (zh_TW)
dc.subject: Multiple Object Tracking (en)
dc.title: 深度增強追蹤器減輕多目標遮擋和同質外觀問題於室內追蹤 (zh_TW)
dc.title: DET: Depth Enhanced Tracker to Mitigate Severe Occlusion and Homogeneous Appearance Problems for Indoor Multiple-Object Tracking (en)
dc.type: Thesis
dc.date.schoolyear: 111-1
dc.description.degree: 碩士 (Master's)
dc.contributor.author-orcid: 0000-0001-9753-4209
dc.contributor.advisor-orcid: 林宗男 (0000-0001-5659-1194)
dc.contributor.oralexamcommittee: 鄧惟中; 陳俊良; 沈上翔 (zh_TW)
dc.contributor.oralexamcommittee: Wei-Chung Teng; Jiann-Liang Chen; Shan-Hsiang Shen (en)
dc.subject.keyword: 物件追蹤 (zh_TW)
dc.subject.keyword: Multiple Object Tracking (en)
dc.relation.page: 50
dc.identifier.doi: 10.6342/NTU202203879
dc.rights.note: 同意授權(全球公開) (authorized for worldwide open access)
dc.date.accepted: 2022-09-26
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 電機工程學系 (Department of Electrical Engineering)
dc.date.embargo-lift: 2022-09-27
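
The abstract above describes a depth-enhanced tracking-by-detection design: 2D association cues are augmented with a per-object depth estimate so that heavily overlapping targets at different depths are not merged into one identity. The actual algorithm is specified in the thesis itself (DOI 10.6342/NTU202203879); the following is only a minimal Python sketch of how a depth term can be blended into an association cost. The `iou` and `associate` helpers, the dict-based track/detection records, the `depth_weight` and `gate` parameters, and the linear cost blend are all assumptions made for this illustration, not the thesis's method.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian-style optimal assignment

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, depth_weight=0.5, gate=0.7):
    """Match tracks to detections with a cost that mixes 2D box overlap and
    per-object depth disagreement, then reject matches above the gate.
    Each track/detection is a dict with a "box" (x1, y1, x2, y2) and a
    "depth" value, assumed here to be normalized to [0, 1]."""
    cost = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            overlap_cost = 1.0 - iou(t["box"], d["box"])
            depth_cost = abs(t["depth"] - d["depth"])
            cost[i, j] = (1.0 - depth_weight) * overlap_cost + depth_weight * depth_cost
    rows, cols = linear_sum_assignment(cost)
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] < gate]
```

Under these assumptions, two dancers whose boxes overlap almost completely in 2D but whose depths differ by a large margin produce a high pairwise cost and are kept as separate identities, which is the occlusion failure mode the abstract says depth enhancement addresses.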
Appears in Collections: 電機工程學系 (Department of Electrical Engineering)

Files in This Item:
File: ntu-111-1.pdf   Size: 12.24 MB   Format: Adobe PDF


All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
