深度學習於視覺之異常偵測 (Deep Learning for Visual Anomaly Detection)

吳志強; Jhih-Ciang Wu

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87952

Title:	深度學習於視覺之異常偵測 Deep Learning for Visual Anomaly Detection
Author:	吳志強 Jhih-Ciang Wu
Advisor:	劉庭祿 Tyng-Luh Liu
Co-advisor:	傅楸善 Chiou-Shann Fuh
Keywords:	deep learning, anomaly detection, unsupervised learning, self-supervised learning
Publication year:	2023
Degree:	Doctoral
Abstract:	Anomaly detection (AD) aims to discover irregular patterns in domains such as space or time, where an anomaly is fundamentally defined by its difference from the normal. The task is promising, widely applicable in industry, and has recently attracted interest in the computer vision community. Accordingly, developing an algorithm that automatically identifies defects or anomalous events with reliable accuracy is crucial. This dissertation comprises four parts: simulated one-class AD, real-world image AD, video AD, and medical disease detection. Each part addresses a different kind of AD with an independently developed approach, and each has been accepted as a journal or conference paper.
In the first part of the dissertation, we address simulated one-class AD, in which only one particular class (representing the normal class) from a classic image classification benchmark is available during training. The objective is to determine, at test time, whether an input sample belongs to that class. Given this problem setup, we categorize simulated one-class AD as class-level classification. Most existing methods are limited in that the criterion for the novel class relies solely on the reconstruction error term. We remove this restriction by proposing a regularization term for an autoencoder model. During training, we pair the regularization with an additional novelty scoring module that measures the difference between a given image and the anomaly-free class, improving our model's efficiency.
Following the investigation of simulated one-class AD, we seek practical applicability and turn to real-world image AD. Unlike simulated one-class AD, real-world image AD aims to detect whether a test image contains defects. The detection is further divided into two tasks: anomaly classification, which is image-level classification, and anomaly localization, which is pixel-level classification. We propose an unsupervised universal model, termed Metaformer, which leverages meta-learned model parameters to achieve high model adaptation capability and an instance-aware transformer to emphasize the focal regions for localizing abnormal regions, i.e., to explore the reconstruction gap at those regions of interest. We thereby address two pivotal issues of reconstruction-based approaches to real-world image AD: model adaptation and the reconstruction gap. The former lets a single AD model generalize to a broad range of object categories, while the latter provides helpful clues for localizing abnormal regions.
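The distinction between image-level classification and pixel-level localization in reconstruction-based AD can be illustrated with a minimal sketch. This is not the Metaformer model: the reconstruction is a hard-coded stand-in for the output of some trained model, and the function names, the squared-error gap, the max-pooling rule, and the threshold value are all assumptions chosen for illustration.

```python
# Hypothetical helpers; none of these names come from the dissertation.

def anomaly_map(image, reconstruction):
    """Per-pixel squared reconstruction error (a simple 'reconstruction gap')."""
    return [
        [(p - r) ** 2 for p, r in zip(img_row, rec_row)]
        for img_row, rec_row in zip(image, reconstruction)
    ]


def image_score(amap):
    """Image-level score: the largest per-pixel error in the map (assumed rule)."""
    return max(max(row) for row in amap)


def localize(amap, threshold):
    """Pixel-level decision: 1 where the error exceeds the threshold, else 0."""
    return [[1 if e > threshold else 0 for e in row] for row in amap]


# Toy 3x3 grayscale image with one bright "defect" pixel in the centre.
image = [
    [0.1, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.1],
]
# Stand-in for a model that has only learned to reproduce normal texture.
reconstruction = [[0.1] * 3 for _ in range(3)]

amap = anomaly_map(image, reconstruction)
score = image_score(amap)             # used for anomaly classification
mask = localize(amap, threshold=0.25)  # used for anomaly localization
```

In this toy case only the centre pixel is reconstructed poorly, so it alone is flagged in `mask`, and the same large error drives the image-level `score`.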
In the third part, we tackle a more challenging task, video AD, which aims to capture anomalous events and localize them in the temporal sequence. In this task, temporal anomalies are generally more critical than spatial anomalies. We introduce a self-supervised sparse representation for the learning-based model. In particular, we design the en-Normal and de-Normal modules as our core components, which use the learned task-specific dictionary in opposite ways: the former obtains the reconstructed normal-event feature, while the latter filters out the normal-event feature. The proposed architecture is flexible in that it handles both one-class and weakly-supervised video AD.
In the last part, we broaden the scope of AD and explore the applicability of our proposed video AD method. We tackle two disease detection tasks: video pathological diagnosis and prostate cancer detection. Although the data modalities of the two tasks differ, we cast both as Multiple Instance Learning (MIL) problems. We introduce an approach that decouples healthy and unhealthy feature components within each instance. With such decoupled features, we can highlight the diseased feature components to infer the disease score accurately.
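The idea of using one dictionary of normal patterns in two opposite ways, and of pooling per-instance scores in an MIL setting, can be sketched as follows. This is a simplified illustration and not the dissertation's method: it swaps sparse coding for a plain projection onto a few orthonormal atoms, and the function names, the residual-norm score, and the max-pooling rule are assumptions.

```python
import math

# Hypothetical helpers; the real modules and dictionary learning are not shown.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reconstruct_normal(feature, atoms):
    """Part of the feature explained by the normal atoms.

    Assumes the atoms are orthonormal, so each coefficient is a dot product.
    """
    recon = [0.0] * len(feature)
    for atom in atoms:
        coeff = dot(feature, atom)
        recon = [r + coeff * a for r, a in zip(recon, atom)]
    return recon


def filter_normal(feature, atoms):
    """Residual left after removing the normal component from the feature."""
    recon = reconstruct_normal(feature, atoms)
    return [f - r for f, r in zip(feature, recon)]


def instance_score(feature, atoms):
    """Anomaly score of one instance (snippet): norm of the residual."""
    return math.sqrt(sum(x * x for x in filter_normal(feature, atoms)))


def bag_score(features, atoms):
    """MIL-style score of a bag (video): the highest instance score."""
    return max(instance_score(f, atoms) for f in features)


# Toy dictionary spanning the first two feature dimensions ("normal" patterns).
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Three snippet features; only the last has energy outside the normal subspace.
video = [[0.5, 0.2, 0.0], [0.4, 0.1, 0.0], [0.3, 0.2, 2.0]]

scores = [instance_score(f, atoms) for f in video]
video_level = bag_score(video, atoms)
```

Here the first two snippets are fully explained by the normal atoms and receive a zero score, while the third leaves a large residual, which also becomes the bag-level score under max pooling.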
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87952
DOI:	10.6342/NTU202301248
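Full-text authorization:	Authorized (on-campus access only)
Appears in collections:	Department of Computer Science and Information Engineering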

Files in this item:

File	Size	Format
ntu-111-2.pdf (not currently authorized for public access)	32.28 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
