NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99606
Full metadata record
DC Field | Value | Language
dc.contributor.advisor傅立成zh_TW
dc.contributor.advisorLi-Chen Fuen
dc.contributor.author劉安陞zh_TW
dc.contributor.authorAn-Sheng Liuen
dc.date.accessioned2025-09-17T16:07:21Z-
dc.date.available2025-09-18-
dc.date.copyright2025-09-17-
dc.date.issued2025-
dc.date.submitted2025-08-12-
dc.identifier.urihttp://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99606-
dc.description.abstractWith the increasing deployment of surveillance networks in public spaces, the demand for effective multi-target multi-camera tracking (MTMCT) systems has grown accordingly. Although single-camera tracking has advanced, associating individuals across non-overlapping camera views in complex environments still faces major challenges such as variability in appearance, lighting, and viewpoint. To address these challenges, this dissertation proposes a robust and automated MTMCT framework, offering a new direction for the development of surveillance systems.

This research proposes a series of targeted methods to overcome the limitations of existing techniques. The first is an appearance feature extraction model based on a vision transformer, which uses semantic segmentation, domain-invariant attention layers, and a mixture of experts to prune domain-specific, irrelevant visual information; as a result, even without fine-tuning on the target domain, the system can robustly extract appearance descriptors of tracked targets. The second, applied in the cross-camera stage, is a set of spatial-temporal constraints built on the concepts of feasibility and reachability; while preserving accuracy, the framework eliminates large numbers of tracking hypothesis pairs that violate physical spatial-temporal limits, improving tracking correctness. In addition, diffusion weighting is designed to describe the movement of tracked targets within blind spots. Finally, the tracker optimization stage adopts a two-tier structure: tracklets are first efficiently associated with trackers, and a refinement module then further processes the trackers, resolving the redundant-tracker problem that multiple appearance representations can cause while also reducing identity switches and improving tracking stability.

For target appearance representation and similarity computation, we evaluate on standard person re-identification (Re-ID) benchmarks and compare against other Re-ID models, showing that the proposed MoE-DIA transformer effectively extracts appearance features that distinguish individuals. In both multi-source domain generalization and cross-domain training tests, our feature extractor surpasses existing domain generalization algorithms in feature expressiveness, further demonstrating that the model can be readily deployed in a variety of person tracking and surveillance systems. For the MTMCT system itself, results on several MTMCT datasets show that the proposed framework outperforms existing methods, with significant gains on key metrics such as MOTA and IDF1, highlighting its potential for robust cross-camera tracking in real-world scenarios. We also compare quantitatively against other state-of-the-art MTMCT algorithms on common public datasets such as CAMPUS and WILDTRACK; even under diverse environmental disturbances and camera configurations, our system excels at reducing identity switches and improving tracking stability, demonstrating its practicality and reliability in multi-camera scenarios.
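
The two-tier tracker optimization described above begins by associating tracklets with existing trackers. As an illustrative sketch only (not the dissertation's code; the affinity matrix is invented), the first tier amounts to choosing the assignment that maximizes total affinity:

```python
# Illustrative only (not the dissertation's code): tier one of tracker
# optimization selects the tracklet-to-tracker assignment with maximal
# total affinity. Brute force over permutations suffices for a toy
# example; practical systems use the Hungarian algorithm instead.
from itertools import permutations

def best_assignment(affinity):
    """affinity[i][j]: score for assigning tracklet i to tracker j."""
    n = len(affinity)
    best_score, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        score = sum(affinity[i][perm[i]] for i in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return list(best_perm), best_score

# Three new tracklets scored against three existing trackers
# (integer affinities invented for the example):
aff = [[9, 1, 3],
       [2, 8, 4],
       [1, 3, 7]]
print(best_assignment(aff))  # ([0, 1, 2], 24)
```

Brute force is exponential in the number of tracklets, which is why an O(n^3) assignment solver matters at the scale of a real camera network.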
zh_TW
dc.description.abstractThe demand for effective multi-target multi-camera tracking (MTMCT) systems has grown alongside the increasing deployment of surveillance networks in public spaces. Although single-camera tracking has advanced considerably, significant challenges remain in associating individuals across non-overlapping camera views, particularly in complex environments characterized by high variability in appearance, lighting, and viewpoint. To address these challenges, this dissertation presents a robust and automated MTMCT framework, offering a new direction for the development of modern surveillance systems.

This research introduces a series of targeted methodologies to overcome the limitations of existing techniques. A novel appearance feature extraction model, the "MoE-DIA transformer," built on a vision transformer is proposed; it incorporates semantic segmentation, a domain-invariant attention (DIA) layer, and a Mixture of Experts (MoE) architecture. This design prunes domain-specific, irrelevant visual information, enabling the system to robustly extract descriptive appearance features without fine-tuning on the target domain. Spatial-temporal constraints, defined by the concepts of feasibility and reachability, are leveraged to eliminate physically impossible tracking hypotheses, thereby enhancing accuracy. Furthermore, diffusion weighting is designed to model the movement of subjects within blind spots between cameras. Finally, the tracker optimization phase employs a two-tiered structure: it first efficiently associates tracklets with trackers, then a refinement module processes these trackers to resolve the redundancy caused by multiple appearance representations. This approach significantly reduces identity switches and improves tracking stability.
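
As a rough illustration of the feasibility idea (a hypothetical sketch, not the dissertation's method; the walking-speed bounds and function names are assumptions), a cross-camera hypothesis can be discarded whenever the transit speed it implies through the blind spot is not humanly plausible:

```python
# Hypothetical sketch of a spatial-temporal feasibility gate between two
# cameras. NOT the dissertation's implementation: the speed bounds and
# function names here are illustrative assumptions.

WALK_SPEED_MIN = 0.5   # m/s, assumed slow-walk lower bound
WALK_SPEED_MAX = 2.5   # m/s, assumed brisk-walk upper bound

def is_feasible(exit_time, entry_time, gap_distance):
    """True if a person could plausibly transit the blind spot in time."""
    dt = entry_time - exit_time
    if dt <= 0:                   # must leave camera A before entering B
        return False
    implied_speed = gap_distance / dt
    return WALK_SPEED_MIN <= implied_speed <= WALK_SPEED_MAX

def gate_affinities(pairs, gap_distance):
    """Zero out appearance affinity for physically infeasible pairs.

    pairs: list of (exit_time, entry_time, appearance_affinity).
    """
    return [aff if is_feasible(t_out, t_in, gap_distance) else 0.0
            for (t_out, t_in, aff) in pairs]

# 20 m blind spot: a 2 s gap implies 10 m/s (rejected), a 15 s gap
# implies about 1.33 m/s (kept).
print(gate_affinities([(0.0, 2.0, 0.9), (0.0, 15.0, 0.8)], 20.0))  # [0.0, 0.8]
```

Gating in this way shrinks the hypothesis set before any appearance matching is attempted, which is where the accuracy and efficiency gains come from.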

In our experimental evaluation, the proposed appearance model, the MoE-DIA transformer, is benchmarked on standard person re-identification (Re-ID) datasets. It demonstrates superior feature expressiveness over existing domain generalization algorithms in both multi-source domain generalization and cross-domain training tests, confirming its ability to extract discriminative features for person identification. When the complete MTMCT system is evaluated, it outperforms state-of-the-art methods on public datasets such as MTMMC, CAMPUS, and WILDTRACK, achieving significant improvements in key metrics such as Multiple Object Tracking Accuracy (MOTA), Multiple Object Tracking Precision (MOTP), and Identification F1 Score (IDF1). These results underscore its effectiveness in reducing identity switches and enhancing tracking stability, proving its practicality and reliability in real-world scenarios with varied environmental interference and camera configurations.
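
For reference, two of the headline metrics named above reduce to simple ratios of event counts; the sketch below applies the standard CLEAR MOT and identity-metric definitions to invented counts:

```python
# Standard definitions of MOTA and IDF1, applied to invented event
# counts (illustrative only; not results from the dissertation).

def mota(fn, fp, idsw, num_gt):
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT."""
    return 1.0 - (fn + fp + idsw) / num_gt

def idf1(idtp, idfp, idfn):
    """Identification F1: harmonic mean of ID precision and ID recall."""
    return 2 * idtp / (2 * idtp + idfp + idfn)

# Hypothetical sequence with 1000 ground-truth boxes:
print(round(mota(fn=80, fp=50, idsw=10, num_gt=1000), 3))  # 0.86
print(round(idf1(idtp=850, idfp=120, idfn=150), 3))        # 0.863
```

MOTA penalizes every identity switch, while IDF1 rewards keeping the same identity over the whole trajectory, which is why both are reported for cross-camera tracking.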
en
dc.description.provenanceSubmitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-09-17T16:07:21Z
No. of bitstreams: 0
en
dc.description.provenanceMade available in DSpace on 2025-09-17T16:07:21Z (GMT). No. of bitstreams: 0en
dc.description.tableofcontentsContents
Oral Defense Committee Certification.................. i
Acknowledgements...................................... ii
Abstract (in Chinese)................................. iv
Abstract.............................................. vi
Contents.............................................. ix
List of Figures....................................... xii
List of Tables........................................ xiv

Chapter 1 Introduction................................ 1
1.1 Motivation........................................ 2
1.2 Contributions..................................... 3
1.3 Organization of the Dissertation.................. 5

Chapter 2 Related Work and Problem Definition ........ 7
2.1 Related Research.................................. 7
2.1.1 Person Re-identification (Re-ID)................ 8
2.1.2 Cluster-based Approach.......................... 10
2.1.3 Tracking-based Approach......................... 11
2.1.4 Spatial and Temporal Information................ 12
2.2 Problem Formulation and Notations................. 13

Chapter 3 MTMCT System and Re-ID Appearance Feature.... 16
3.1 System Overview................................... 17
3.2 Intra-Camera Stage................................ 18
3.2.1 Human Detection................................. 19
3.2.2 Tracklet Generation Algorithm................... 20
3.3 Re-ID Appearance.................................. 22
3.3.1 MoE-DIA Re-ID Representation.................... 22
3.3.2 Semantic-Aware Mask Generator................... 23
3.3.3 Mixture of Experts Domain Invariant Attention Transformer.... 28
3.3.4 Training for Re-ID Model ....................... 33

Chapter 4 Cross-Camera Tracking Stage................. 35
4.1 Tracking Pairing Hypothesis....................... 37
4.1.1 Homography Projection........................... 37
4.1.2 Hypothesis Generator............................ 39
4.2 Affinity Computing................................ 41
4.2.1 Re-ID Feature and Appearance Similarity......... 42
4.2.2 Feasibility Analysis............................ 44
4.2.3 Diffusion Weighting............................. 49
4.2.4 Affinity Scoring................................ 51
4.3 Tracker Optimization.............................. 52
4.3.1 Tracklet-Tracker Associating.................... 52
4.3.2 Tracker-Tracker Refinement...................... 55
4.4 Computing Complexity Analysis for MTMCT System.... 58

Chapter 5 Experiment Results ..................... 61
5.1 Experimental Results on Re-ID Appearance Feature.... 61
5.1.1 Datasets and Settings....................... 61
5.1.2 Multi-Source DG Re-ID....................... 63
5.1.3 Cross-Domain Re-ID.......................... 63
5.1.4 Ablation Studies............................ 66
5.2 Experiments on MTMCT System................... 69
5.2.1 Datasets.................................... 69
5.2.2 Evaluation Metrics.......................... 72
5.2.3 Implementation Detail....................... 74
5.2.4 Experiment Result on MTMMC Dataset.......... 75
5.2.5 Experiment Result on CAMPUS Dataset......... 76
5.2.6 Experiment Result on WILDTRACK Dataset...... 76
5.2.7 Testing Result in real environment.......... 78
Chapter 6 Conclusions and Discussion ............. 85
6.1 Conclusions................................... 85
6.2 Discussion.................................... 86
References........................................ 88
-
dc.language.isoen-
dc.subject多相機zh_TW
dc.subject多相機追蹤zh_TW
dc.subject監視系統zh_TW
dc.subject時空約束條件zh_TW
dc.subject域不變外貌特徵zh_TW
dc.subjectSurveillance Systemen
dc.subjectMulti-Target Multi-Camera Trackingen
dc.subjectDomain Invariance Appearance Featureen
dc.subjectSpatial-Temporal Constrainten
dc.subjectMTMCTen
dc.title基於相機網路的時空約束條件下之多目標多相機追蹤監視系統zh_TW
dc.titleThe Multi-Target Multi-Camera Tracking Surveillance System based on Spatial-Temporal Constraints from RGB Camera Networken
dc.typeThesis-
dc.date.schoolyear113-2-
dc.description.degreeDoctor of Philosophy-
dc.contributor.oralexamcommittee蘇木春;李明穗;張文中;陳永耀;連豊力;陸敬互;黃正民;黃世勳zh_TW
dc.contributor.oralexamcommitteeMu-Chun Su;Ming-Sui Lee;Wen-Chung Chang;Yung-Yaw Chen;Feng-Li Lian;Ching-Hu Lu;Cheng-Ming Huang;Shih-Shinh Huangen
dc.subject.keyword多相機,多相機追蹤,監視系統,時空約束條件,域不變外貌特徵,zh_TW
dc.subject.keywordMulti-Target Multi-Camera Tracking,MTMCT,Surveillance System,Spatial-Temporal Constraint,Domain Invariance Appearance Feature,en
dc.relation.page100-
dc.identifier.doi10.6342/NTU202504138-
dc.rights.noteAuthorized (open access worldwide)-
dc.date.accepted2025-08-14-
dc.contributor.author-collegeCollege of Electrical Engineering and Computer Science-
dc.contributor.author-deptDepartment of Electrical Engineering-
dc.date.embargo-lift2028-07-31-
Appears in Collections: Department of Electrical Engineering

Files in This Item:
File | Size | Format
ntu-113-2.pdf (available online after 2028-07-31) | 26.35 MB | Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
