Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98518

| Title: | Weakly-Supervised Fine-Grained Video Anomaly Detection via Motion-Assisted Representation Learning |
| Authors: | 鄒玲 Ling Zou |
| Advisor: | 鄭文皇 Wen-Huang Cheng |
| Keywords: | Fine-grained video anomaly detection, Multi-modal learning, Motion-assisted representation learning, Large language model |
| Publication Year: | 2025 |
| Degree: | Master's |
| Abstract: | Fine-grained Video Anomaly Detection (FG-VAD) aims to localize anomalous frames within a video using only video-level indications of anomaly presence and a corresponding semantic category label. While most existing methods leverage CLIP features to tackle this problem, key challenges remain. On the visual side, CLIP features are effective for static images but lack temporal awareness, often producing false alarms under sudden changes in illumination, rapid object motion, or fast frame transitions. On the semantic side, many approaches struggle to capture the nuanced meaning of the provided category label, resulting in missed detections of relevant anomalous actions. To overcome these limitations, we propose a novel method with two key components: (1) a Visual Temporal Smoothing (VTS) module that reduces false positives by enforcing temporal consistency, and (2) a Text-Enhanced Representation (TER) module that uses a Large Language Model (LLM) to enrich the semantic understanding of anomaly labels, enabling more accurate frame-level classification. Extensive experiments on two benchmark datasets, along with comprehensive ablation studies, demonstrate the effectiveness of our approach, which outperforms existing state-of-the-art methods. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98518 |
| DOI: | 10.6342/NTU202503387 |
| Fulltext Rights: | Authorized (access restricted to campus network) |
| Embargo Lift Date: | 2025-08-15 |
| Appears in Collections: | Department of Computer Science and Information Engineering |
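The abstract's VTS idea, enforcing temporal consistency over per-frame visual features so that isolated spikes (e.g., from sudden illumination changes) do not trigger false alarms, can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the thesis's actual module: it assumes frame features arrive as a NumPy `(T, D)` array and uses a simple centered moving average as the smoothing operator.

```python
import numpy as np

def temporal_smooth(features, window=5):
    """Illustrative temporal smoothing of per-frame features.

    features: (T, D) array of per-frame embeddings (e.g., CLIP features).
    window:   odd moving-average width; larger values smooth more.
    Returns a (T, D) array where each frame's feature is the centered
    average over its temporal neighborhood (edges are edge-padded).
    NOTE: a hypothetical stand-in for the VTS module, not the thesis code.
    """
    T, D = features.shape
    pad = window // 2
    # Edge-pad along time so the output keeps the same number of frames.
    padded = np.pad(features, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    smoothed = np.empty_like(features, dtype=float)
    for d in range(D):
        smoothed[:, d] = np.convolve(padded[:, d], kernel, mode="valid")
    return smoothed
```

Under this sketch, a one-frame feature spike of height 1.0 is attenuated to 1/window after smoothing, which is the intuition behind using temporal consistency to suppress transient false positives.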
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (access limited to NTU IP range) | 27.79 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
