
DSpace

The DSpace institutional repository is dedicated to preserving digital materials of all kinds (e.g., text, images, PDF) and making them easily accessible.

NTU Theses and Dissertations Repository › College of Electrical Engineering and Computer Science › Department of Electrical Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99606
Title: 基於相機網路的時空約束條件下之多目標多相機追蹤監視系統
The Multi-Target Multi-Camera Tracking Surveillance System based on Spatial-Temporal Constraints from RGB Camera Network
Author: 劉安陞 (An-Sheng Liu)
Advisor: 傅立成 (Li-Chen Fu)
Keywords: Multi-Target Multi-Camera Tracking, MTMCT, Surveillance System, Spatial-Temporal Constraint, Domain-Invariant Appearance Feature
Publication Year: 2025
Degree: Doctoral
Abstract:
The demand for effective multi-target multi-camera tracking (MTMCT) systems has grown alongside the increasing deployment of surveillance networks in public spaces. Despite advances in single-camera tracking, significant challenges remain in associating individuals across non-overlapping camera views, particularly in complex environments characterized by high variability in appearance, lighting, and viewpoint. To address these challenges, this dissertation presents a robust and automated MTMCT framework, offering a new direction for the development of modern surveillance systems.

This research introduces a series of targeted methodologies to overcome the limitations of existing techniques. First, a novel appearance feature extraction model, the "MoE-DIA transformer," builds on a vision transformer and incorporates semantic segmentation, a domain-invariant attention (DIA) layer, and a Mixture of Experts (MoE) architecture. This design prunes domain-specific, irrelevant visual information, enabling the system to robustly extract descriptive appearance features without fine-tuning on the target domain. Second, in the cross-camera stage, spatial-temporal constraints defined through the concepts of feasibility and reachability discard association hypotheses that violate physical spatial-temporal limits while preserving accuracy; in addition, a diffusion weighting scheme models the movement of subjects through camera blind spots. Finally, the tracker optimization phase employs a two-tiered structure: tracklets are first efficiently associated with trackers, and a refinement module then processes the trackers to resolve the redundancy caused by multiple appearance representations, significantly reducing identity switches and improving tracking stability.
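The feasibility/reachability gating described above can be illustrated with a minimal sketch. All names, speed bounds, and distances here are illustrative assumptions, not the thesis implementation: a cross-camera association between a track exiting one camera and a track entering another is kept only if the observed transit time is physically plausible for the distance between the two fields of view.

```python
# Hypothetical sketch of spatial-temporal gating for cross-camera association.
# A pedestrian leaving camera A at `exit_time` can only be the person entering
# camera B at `entry_time` if the transit time fits a plausible walking-speed
# window over the inter-camera distance (reachability), and entry follows exit
# (feasibility). The speed bounds are illustrative assumptions.

def is_feasible(exit_time: float, entry_time: float,
                distance_m: float,
                v_min: float = 0.5, v_max: float = 2.5) -> bool:
    """Return True if the observed transit is physically plausible."""
    dt = entry_time - exit_time
    if dt <= 0:                      # entry must happen after exit
        return False
    t_fastest = distance_m / v_max   # quickest plausible transit
    t_slowest = distance_m / v_min   # slowest plausible transit
    return t_fastest <= dt <= t_slowest

# A 20 m blind spot crossed in 10 s (2 m/s) is plausible; in 2 s it is not.
print(is_feasible(0.0, 10.0, 20.0))  # True
print(is_feasible(0.0, 2.0, 20.0))   # False
```

Hypothesis pairs failing this check can be dropped before any appearance comparison, which is what makes the pruning cheap relative to feature matching.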

In our experimental evaluation, the proposed appearance model, the MoE-DIA transformer, is benchmarked on standard person re-identification (Re-ID) datasets. It demonstrates superior feature expressiveness over existing domain generalization algorithms in both multi-source domain generalization and cross-domain training tests, confirming its ability to extract discriminative features for person identification. When the complete MTMCT system is evaluated, it shows superior performance compared to state-of-the-art methods on public datasets such as MTMMC, CAMPUS, and WILDTRACK. The framework achieves significant improvements in key metrics like Multiple Object Tracking Accuracy (MOTA), Multiple Object Tracking Precision (MOTP), and Identification F1 Score (IDF1). These results underscore its effectiveness in reducing identity switches and enhancing tracking stability, proving its practicality and reliability in real-world scenarios with various environmental interferences and camera configurations.
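The evaluation metrics named above (MOTA, IDF1) have standard definitions from the CLEAR-MOT and identification-metric literature; a minimal sketch follows. The counts used in the example are illustrative, not results from this thesis.

```python
# Standard tracking-metric definitions, for reference.
# MOTA penalizes false negatives, false positives, and identity switches
# relative to the total number of ground-truth objects; IDF1 is the F1
# score over identity-consistent true positives.

def mota(fn: int, fp: int, idsw: int, gt: int) -> float:
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT."""
    return 1.0 - (fn + fp + idsw) / gt

def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identification F1: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn)

# Illustrative counts only:
print(mota(fn=50, fp=30, idsw=5, gt=1000))  # 0.915
print(idf1(idtp=900, idfp=100, idfn=100))   # 0.9
```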
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99606
DOI: 10.6342/NTU202504138
Full-Text Authorization: Authorized (open access worldwide)
Electronic Full-Text Release Date: 2028-07-31
Appears in Collections: Department of Electrical Engineering

Files in This Item:
File: ntu-113-2.pdf (26.35 MB, Adobe PDF), available online after 2028-07-31


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.

Contact Information
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
© NTU Library All Rights Reserved