Unsupervised Anomalous Sound Detection under Domain Shifts

Pu-He Wang; 王普禾

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85934

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	張智星(Jyh-Shing Jang)
dc.contributor.author	Pu-He Wang	en
dc.contributor.author	王普禾	zh_TW
dc.date.accessioned	2023-03-19T23:29:30Z	-
dc.date.copyright	2022-09-30
dc.date.issued	2022
dc.date.submitted	2022-09-22
dc.identifier.citation	[1] Markus Breunig, Hans-Peter Kriegel, Raymond Ng, and Joerg Sander. LOF: Identifying density-based local outliers. volume 29, pages 93–104, 06 2000. [2] Asa Ben-Hur, David Horn, Hava Siegelmann, and Vladimir Vapnik. A support vector method for clustering. Advances in Neural Information Processing Systems, 13, 2000. [3] Bernhard Schölkopf, Robert C Williamson, Alex Smola, John Shawe-Taylor, and John Platt. Support vector method for novelty detection. Advances in Neural Information Processing Systems, 12, 1999. [4] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining, pages 413–422. IEEE, 2008. [5] Hansi Chen, Hongzhan Ma, Xuening Chu, and Deyi Xue. Anomaly detection and critical attributes identification for products with multiple operating conditions based on isolation forest. Advanced Engineering Informatics, 46:101139, 2020. [6] Yohei Kawaguchi, Keisuke Imoto, Yuma Koizumi, Noboru Harada, Daisuke Niizumi, Kota Dohi, Ryo Tanabe, Harsh Purohit, and Takashi Endo. Description and discussion on DCASE 2021 challenge task 2: Unsupervised anomalous sound detection for machine condition monitoring under domain shifted conditions. arXiv preprint arXiv:2106.04492, 2021. [7] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018. [8] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. 2018. [9] Yaroslav Ganin and Victor Lempitsky. Unsupervised domain adaptation by backpropagation. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1180–1189, Lille, France, 07–09 Jul 2015. PMLR. [10] Philip Haeusser, Thomas Frerix, Alexander Mordvintsev, and Daniel Cremers. Associative domain adaptation. In Proceedings of the IEEE International Conference on Computer Vision, pages 2765–2773, 2017. [11] Jose A. Lopez, Georg Stemmer, Paulo Lopez-Meyer, Pradyumna S. Singh, Juan Del Hoyo Ontiveros, and Hector Courdourier. Ensemble of complementary anomaly detectors under domain shifted conditions. Technical report. [12] Kazuki Morita, Tomohiko Yano, and Khai Tran. Anomalous sound detection using CNN-based features by self supervised learning. Tech. Rep., DCASE2021 Challenge, 2021. [13] Kevin Wilkinghoff. Utilizing sub-cluster AdaCos for anomalous sound detection under domain shifted conditions. Tech. Rep., DCASE2021 Challenge, 2021. [14] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017. [15] Keiron O’Shea and Ryan Nash. An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458, 2015. [16] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520, 2018. [17] Zihang Dai, Hanxiao Liu, Quoc V Le, and Mingxing Tan. CoAtNet: Marrying convolution and attention for all data sizes. Advances in Neural Information Processing Systems, 34:3965–3977, 2021. [18] Rich Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997. [19] David MJ Tax and Robert PW Duin. Support vector domain description. Pattern Recognition Letters, 20(11-13):1191–1199, 1999. [20] Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, and Shoichiro Saito. ToyADMOS2: Another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions. arXiv preprint arXiv:2106.02369, 2021. [21] Ryo Tanabe, Harsh Purohit, Kota Dohi, Takashi Endo, Yuki Nikaido, Toshiki Nakamura, and Yohei Kawaguchi. MIMII DUE: Sound dataset for malfunctioning industrial machine investigation and inspection with domain shifts due to changes in operational and environmental conditions. In 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 21–25. IEEE, 2021. [22] Mei Wang and Weihong Deng. Deep visual domain adaptation: A survey. Neurocomputing, 312:135–153, 2018. [23] Antonia Creswell, Tom White, Vincent Dumoulin, Kai Arulkumaran, Biswa Sengupta, and Anil A Bharath. Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 35(1):53–65, 2018. [24] Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016. [25] David Snyder, Daniel Garcia-Romero, Gregory Sell, Daniel Povey, and Sanjeev Khudanpur. X-vectors: Robust DNN embeddings for speaker recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5329–5333. IEEE, 2018. [26] Eli Bingham, Jonathan P Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul Szerlip, Paul Horsfall, and Noah D Goodman. Pyro: Deep universal probabilistic programming. The Journal of Machine Learning Research, 20(1):973–978, 2019. [27] Yu Zhang and Qiang Yang. An overview of multi-task learning. National Science Review, 5(1):30–43, 2018. [28] Cheongjae Lee, Sangkeun Jung, Kyungduk Kim, and Gary Geunbae Lee. Hybrid approach to robust dialog management using agenda and dialog examples. Computer Speech & Language, 24(4):609–631, 2010. [29] Francois Chollet et al. Keras, 2015. [30] Brian McFee, Colin Raffel, Dawen Liang, Daniel PW Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. librosa: Audio and music signal analysis in Python. In Proceedings of the 14th Python in Science Conference, volume 8, 2015. [31] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830, 2011.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85934	-
dc.description.abstract	在機器學習的這個領域中，聲音相關的研究也有許多的應用，舉凡降噪、人聲分離、語音辨識等等。而異音偵測也是聲音處理其中一個題目，主要目的是讓模型能在短時間內正確的辨別一段聲音中是否有異常。近年來，隨著機器學習的應用越來越廣泛，也越來越多領域的使用者考慮引入機器學習到自己的領域來加速或提升正確率於本來的作業流程，而現在全世界也正走在工業 4.0 的道路上，許多傳統產業也漸漸對機器學習產生興趣，而異音偵測就是一個適合運用在工廠的應用，不只能輔助操作人員，更能加快作業速度與提升產品良率。本論文以過往異音偵測的研究為基礎，探討在領域轉移 (domain shift) 的情況下如何維持甚至提升正確率。本文以時頻譜的形式看待聲音，依此將聲音表示成多張的圖像，再以 CNN 結合轉換器的模型做特徵抽取，設計多任務學習的模型做為分類器判斷聲音片段是否為異音，我們使用 DCASE 競賽資料集，使用最先進的模型進行特徵抽取，並透過損失函數的設計與模型架構的優化來處理領域轉移，最後利用基於密度的方法計算異音分數，得到最終結果。最後我們與 DCASE 競賽官方公佈的模型、其他參賽者的模型做比較，官方公佈模型的基線平均接收操作特徵圖下面積 (area under the ROC Curve, AUC) 為 63.36%，本研究的模型相較於官方公佈模型的表現有顯著的提升，平均 AUC 可達到 72%，與其他參賽者的模型比較也能達到前五名的表現。	zh_TW
dc.description.abstract	Sound-related research in machine learning has many applications, such as noise reduction, voice separation, and speech recognition. Anomalous sound detection is another topic in sound processing; its main goal is to let a model quickly and correctly determine whether a sound clip contains an anomaly. In recent years, as machine learning has been applied more widely, users in more and more fields have considered adopting it to speed up their existing workflows or improve their accuracy. With the world moving toward Industry 4.0, many traditional industries are also becoming interested in machine learning, and anomalous sound detection is well suited to factory use: it can assist operators, speed up operations, and improve product yield. Building on previous research on anomalous sound detection, this thesis investigates how to maintain or even improve accuracy under domain shift. We treat sound as a spectrogram, representing each recording as multiple images, and use a model that combines a CNN with a transformer for feature extraction. A multi-task learning model serves as the classifier that judges whether a sound clip is anomalous. We use the DCASE challenge dataset, apply a state-of-the-art model for feature extraction, and handle domain shift through the design of the loss function and the optimization of the model architecture. Finally, a density-based method is used to compute the anomaly score and obtain the final result. We compare our model with the official DCASE baseline and with the models of other participants. The official baseline has an average area under the ROC curve (AUC) of 63.36%. Our model improves significantly on this baseline, reaching an average AUC of 72%, which also places it among the top five when compared with other participants’ models.	en
dc.description.provenance	Made available in DSpace on 2023-03-19T23:29:30Z (GMT). No. of bitstreams: 1 U0001-1509202211471500.pdf: 10158207 bytes, checksum: 7200125fb92d24ca1acafc019a1ac521 (MD5) Previous issue date: 2022	en
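dc.description.tableofcontents	Acknowledgements iii; Abstract (Chinese) v; Abstract vii; Contents ix; Chapter 1 Introduction 1; 1.1 Research Overview and Motivation 1; 1.2 Contributions 2; 1.3 Chapter Overview 2; Chapter 2 Literature Review 5; 2.1 Anomaly Detection Models 5; 2.1.1 Local Outlier Factor 5; 2.1.2 One-Class Support Vector Machine 7; 2.1.3 Isolation Forest 8; 2.2 Anomalous Sound Detection Models 10; 2.2.1 Autoencoder-Based Anomalous Sound Detection Models 10; 2.2.2 Convolutional Neural Network-Based Anomalous Sound Detection Models 11; 2.3 Domain Adaptation 13; 2.3.1 Overview of Domain Adaptation 14; 2.3.2 Associative Domain Adaptation 15; 2.4 Results of Challenge Entries 17; 2.4.1 Ensemble of Complementary Anomaly Detectors 17; 2.4.2 Anomalous Sound Detection Using Convolutional Neural Network Features 18; 2.4.3 Utilizing Sub-Cluster AdaCos for Anomalous Sound Detection under Domain Shifted Conditions 19; Chapter 3 Dataset Overview 21; 3.1 DCASE Workshop 21; 3.1.1 Challenge 2021 Task 2 21; 3.2 Dataset Terminology 22; 3.2.1 Machine Type 22; 3.2.2 Section 23; 3.2.3 Source Domain and Target Domain 26; 3.3 Dataset Size and Distribution 26; 3.4 Task Rules and Definitions 27; Chapter 4 Methodology 29; 4.1 Spectrogram Extraction 29; 4.1.1 Mel Spectrogram 29; 4.1.2 Spectrogram Cropping 30; 4.2 Models and Architecture 31; 4.2.1 Feature Extractor 1: AutoEncoder 32; 4.2.2 Feature Extractor 2: CoAtNet 33; 4.2.2.1 Transformer 33; 4.2.2.2 Convolutional Neural Network (CNN) 35; 4.2.2.3 CoAtNet 35; 4.2.3 Domain Classifier 37; 4.2.4 Anomalous Sound Detector 39; 4.2.4.1 Local Outlier Factor 39; 4.2.4.2 One-Class Support Vector Machine 40; 4.2.4.3 Isolation Forest 40; 4.3 Loss Function 40; 4.4 Training Parameters 42; 4.5 Implementation Tools 42; Chapter 5 Experimental Results 43; 5.1 Experiment 1: Comparison of Model Architectures 43; 5.1.1 Experiment 1-1: Feature Extractors and Anomaly Score Computation Methods 44; 5.1.2 Experiment 1-2: Methods for Handling Domain Shift 45; 5.2 Experiment 2: Comparison with Previous Models 46; 5.3 Experiment 3: Effect of the Domain Classifier 48; 5.4 Experiment 4: Sensitivity Analysis 50; 5.4.1 Experiment 4-1: Input Spectrogram Size 50; 5.4.2 Experiment 4-2: Local Outlier Factor k Value 51; 5.5 Experiment 5: Ensemble Learning 52; 5.5.1 Experiment 5-1: Ensemble with Previous Methods 52; 5.5.2 Experiment 5-2: Ensemble of Experiment 4-1 Models 53; 5.5.3 Experiment 5-3: Ensemble of Experiment 4-2 Models 54; 5.5.4 Experiment 5-4: Ensemble of Experiment 5-2 and 5-3 Models 55; Chapter 6 Conclusion and Future Work 57; 6.1 Conclusion 57; 6.2 Future Work 58; References 61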
dc.language.iso	zh-TW
dc.title	領域轉移下的非監督式異音偵測	zh_TW
dc.title	Unsupervised Anomalous Sound Detection under Domain Shifts	en
dc.type	Thesis
dc.date.schoolyear	110-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	王新民(Hsin-Min Wang),李宏毅(Hung-Yi Lee)
dc.subject.keyword	異音偵測,異常檢測,時頻譜,領域轉移,領域自適應,非監督式學習,轉換器,	zh_TW
dc.subject.keyword	abnormal sound detection, anomaly detection, time-spectrum, domain transfer, domain adaptation, unsupervised learning, transformers	en
dc.relation.page	64
dc.identifier.doi	10.6342/NTU202203426
dc.rights.note	Authorization granted (open access worldwide)
dc.date.accepted	2022-09-23
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊網路與多媒體研究所	zh_TW
dc.date.embargo-lift	2022-09-30	-
Appears in Collections:	Graduate Institute of Networking and Multimedia

Files in This Item:

File	Size	Format
U0001-1509202211471500.pdf	9.92 MB	Adobe PDF

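Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.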