Unsupervised Anomalous Sound Detection under Domain Shifts

Pu-He Wang; 王普禾

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85934

Title:	Unsupervised Anomalous Sound Detection under Domain Shifts (領域轉移下的非監督式異音偵測)
Author:	Pu-He Wang (王普禾)
Advisor:	Jyh-Shing Jang (張智星), roger.jang@gmail.com
Keywords:	anomalous sound detection, anomaly detection, spectrogram, domain shift, domain adaptation, unsupervised learning, transformer
Publication Year:	2022
Degree:	Master's
Abstract:	Sound-related research in machine learning has many applications, such as noise reduction, voice separation, and speech recognition. Anomalous sound detection is one such topic: its goal is for a model to determine quickly and correctly whether a sound clip contains an anomaly. As machine learning has become more widely applied, more users are considering adopting it in their own fields to speed up existing workflows or improve their accuracy. With the world moving toward Industry 4.0, many traditional industries are also taking an interest in machine learning, and anomalous sound detection is well suited to factories, where it can assist operators, speed up operations, and improve product yield. Building on previous research on anomalous sound detection, this thesis investigates how to maintain or even improve accuracy under domain shift. We treat sound as a spectrogram, representing each clip as multiple images, and use a model that combines a CNN with a transformer for feature extraction. A multi-task learning model serves as the classifier that judges whether a sound clip is anomalous. We use the DCASE competition dataset, apply a state-of-the-art model for feature extraction, and handle domain shift through the design of the loss function and the optimization of the model architecture. Finally, a density-based method computes the anomaly score that gives the final result. We compare our model with the official DCASE baseline and with other participants' models. The official baseline has an average area under the ROC curve (AUC) of 63.36%. Our model improves significantly on this baseline, reaching an average AUC of 72%, which also places it among the top five when compared with other participants' models.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85934
DOI:	10.6342/NTU202203426
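Full-Text License:	Authorized (open access worldwide)
Full-Text Release Date:	2022-09-30
Appears in Collections:	Graduate Institute of Networking and Multimedia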
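
The last two steps named in the abstract, density-based anomaly scoring and AUC evaluation, can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not say which density-based method is used, so the k-nearest-neighbor distance below is an assumed stand-in, the function names `knn_anomaly_score` and `roc_auc` are hypothetical, and the embeddings are synthetic random vectors rather than CNN-transformer features from DCASE audio.

```python
import numpy as np


def knn_anomaly_score(train_emb, test_emb, k=2):
    """Mean Euclidean distance from each test embedding to its k nearest
    normal training embeddings. A larger score means a sparser neighborhood,
    i.e. the clip looks more anomalous. (Assumed stand-in for the thesis's
    unspecified density-based method.)"""
    diffs = test_emb[:, None, :] - train_emb[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)       # shape: (n_test, n_train)
    nearest = np.sort(dists, axis=1)[:, :k]     # k smallest distances per row
    return nearest.mean(axis=1)


def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen anomalous clip scores
    higher than a randomly chosen normal clip (ties count as 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Synthetic 8-dimensional embeddings, for illustration only.
rng = np.random.default_rng(0)
train_emb = rng.normal(0.0, 1.0, size=(200, 8))       # normal training clips
normal_test = rng.normal(0.0, 1.0, size=(50, 8))      # normal test clips
anomalous_test = rng.normal(3.0, 1.0, size=(50, 8))   # shifted "anomalies"

test_emb = np.vstack([normal_test, anomalous_test])
labels = np.array([0] * 50 + [1] * 50)                # 1 marks an anomaly

scores = knn_anomaly_score(train_emb, test_emb, k=2)
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

Only normal clips are needed to fit the scorer, which matches the unsupervised setting: anomalous examples appear at test time only, and the AUC summarizes how well the scores rank them above normal clips.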

Files in this item:

File	Size	Format
U0001-1509202211471500.pdf	9.92 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms indicate otherwise.
