Research of Speech Separation: Network Compression and Multi-task Learning

Chao-I Tuan; 段昭誼

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/55183

Title:	Research of Speech Separation: Network Compression and Multi-task Learning (語音分離技術研究:模型壓縮與多工學習)
Author:	Chao-I Tuan (段昭誼)
Advisor:	Hung-Yi Lee (李宏毅)
Co-advisor:	Yu Tsao (曹昱)
Keywords:	speech separation, speech denoising, model compression, multitask learning, endpoint applications
Publication year:	2021
Degree:	Master's
Abstract:	In this thesis, we propose two novel speech separation architectures, targeting model compression and speech separation in noisy environments, respectively. By improving existing speech separation models, we aim to build a more general speech separation system that is closer to real application scenarios (universal separation). For model compression, motivated by the success of parameter sharing in compressing natural language processing models, we study how parameter sharing affects time-domain speech separation models and design parameter-sharing strategies tailored to them. Evaluating stability is essential for a compressed model. Experiments show that the proposed MiTAS compresses nearly 50% of the parameters while maintaining the same separation performance, and that it passes multiple stability evaluations. Model compression brings speech separation closer to end users and to widespread deployment. The second research direction is improving speech separation in noisy environments. Because speech denoising and speech separation are similar in nature, we propose SADDEL, a unified architecture that combines the two tasks under a multitask learning framework, so that a single model can perform both speech separation and speech denoising. Experiments show that SADDEL outperforms single-task models and handles real-world conditions better than the other compared models. It also performs well on both separation and denoising, and remains stable under unseen noise types and noise levels. Applications of speech separation include collecting and labeling real-world separation data, as well as automatic speech recognition (ASR) and speaker recognition in noisy environments. Extracting speech from overlapping voices and background noise is important for a wide range of downstream speech signal processing systems.

In this thesis, we propose two novel speech separation architectures that target real-world applications from two angles: model compression and speech separation in noisy environments. We aim to improve existing speech separation models so that they generalize more widely, moving closer to a universal separation system. Our first research interest is model compression, inspired by the success of parameter sharing in compressing natural language processing models. We investigated the effectiveness of such methods for time-domain speech separation and proposed several parameter-sharing strategies. We also examined the design aspects that lead to a parameter-efficient model. Evaluating stability is essential for a compressed model. Experimental results show that our proposed MiTAS can compress nearly 75% of the model parameters while maintaining the same speech separation performance. MiTAS also passes multiple stability evaluations, indicating its robustness. In summary, MiTAS is a significant step toward separation on edge devices and enables a wider range of downstream applications. Our second research interest is improving speech separation performance in noisy environments. Since speech separation and speech denoising are similar in nature, we propose SADDEL, a joint speech separation and denoising framework based on multitask learning, to tackle both problems simultaneously. Under this framework, a single model can perform both speech separation and speech denoising. Experimental results demonstrate that SADDEL outperforms comparative speech denoising and speech separation models and shows promising results on various noisy separation tasks. Moreover, SADDEL remains robust across different datasets, noise types, and SNR levels. Common applications of speech separation include labeling collected real-world separation data, as well as automatic speech recognition (ASR) and speaker recognition in noisy environments. Extracting speech from a mixture of human voices and background noise is important for various downstream speech processing systems.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/55183
DOI:	10.6342/NTU202100540
Full-text license:	Paid license
Appears in department:	Data Science Degree Program
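The abstract does not spell out MiTAS's sharing strategy, but the basic reason parameter sharing compresses a time-domain separator can be shown with simple counting. The sketch below assumes a hypothetical separator made of R repeats of X one-dimensional convolutional blocks (1x1 conv, depthwise conv, 1x1 conv, loosely in the style of Conv-TasNet separators); the block layout, `block_params`, `stack_params`, and all sizes are illustrative assumptions, not the thesis's configuration.

```python
"""Hypothetical parameter count for a stacked separator with and without
cross-repeat parameter sharing (illustrative sizes, not MiTAS's)."""


def block_params(channels: int, hidden: int, kernel: int) -> int:
    """Weights + biases of one hypothetical conv block:
    1x1 conv (channels -> hidden), depthwise conv (kernel taps per hidden
    channel), 1x1 conv (hidden -> channels)."""
    pointwise_in = channels * hidden + hidden
    depthwise = hidden * kernel + hidden
    pointwise_out = hidden * channels + channels
    return pointwise_in + depthwise + pointwise_out


def stack_params(repeats: int, blocks_per_repeat: int, per_block: int,
                 share_across_repeats: bool) -> int:
    """Total separator parameters. With sharing, every repeat reuses the
    same blocks, so only one repeat's weights are stored."""
    stored_repeats = 1 if share_across_repeats else repeats
    return stored_repeats * blocks_per_repeat * per_block


if __name__ == "__main__":
    per_block = block_params(channels=128, hidden=512, kernel=3)
    full = stack_params(4, 8, per_block, share_across_repeats=False)
    shared = stack_params(4, 8, per_block, share_across_repeats=True)
    print(per_block, full, shared, f"{1 - shared / full:.0%} fewer")
```

Because only one repeat is stored, the separator's saving under this scheme is 1 - 1/R regardless of block size; layers that are not shared (for example an encoder and decoder) pull the whole-model saving below that figure.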
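The abstract says SADDEL trains separation and denoising jointly under a multitask learning criterion but does not give the objective. A common way to write such an objective is a weighted sum of per-task losses. The sketch below assumes negative scale-invariant SNR (SI-SNR) for both tasks, permutation invariant training (PIT) for the separation outputs, the noise-free mixture as the denoising target, and a hypothetical `weight`; none of these choices are confirmed as SADDEL's.

```python
import itertools
import math
from typing import Sequence

Signal = Sequence[float]


def si_snr(est: Signal, ref: Signal, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-noise ratio (dB) of `est` against `ref`."""
    n = len(ref)
    est_mean, ref_mean = sum(est) / n, sum(ref) / n
    e = [x - est_mean for x in est]          # zero-mean estimate
    r = [x - ref_mean for x in ref]          # zero-mean reference
    scale = sum(a * b for a, b in zip(e, r)) / (sum(b * b for b in r) + eps)
    target = [scale * b for b in r]          # projection of e onto r
    residual = [a - t for a, t in zip(e, target)]
    t_energy = sum(t * t for t in target)
    res_energy = sum(v * v for v in residual)
    return 10 * math.log10((t_energy + eps) / (res_energy + eps))


def pit_separation_loss(ests: Sequence[Signal], refs: Sequence[Signal]) -> float:
    """Negative mean SI-SNR under the best output-to-speaker assignment (PIT)."""
    best = max(
        sum(si_snr(ests[i], refs[p]) for i, p in enumerate(perm)) / len(refs)
        for perm in itertools.permutations(range(len(refs)))
    )
    return -best


def multitask_loss(sep_ests, sep_refs, den_est, den_ref, weight: float = 0.5) -> float:
    """Hypothetical joint objective: separation loss + weight * denoising loss."""
    denoise = -si_snr(den_est, den_ref)
    return pit_separation_loss(sep_ests, sep_refs) + weight * denoise


if __name__ == "__main__":
    s1 = [1.0, -1.0, 1.0, -1.0]                  # toy "speaker 1"
    s2 = [1.0, 1.0, -1.0, -1.0]                  # toy "speaker 2"
    clean_mix = [a + b for a, b in zip(s1, s2)]  # assumed denoising target
    # Output order is swapped on purpose; PIT still finds the right pairing.
    print(multitask_loss([s2, s1], [s1, s2], clean_mix, clean_mix))
```

PIT is used here because a separation model's output channels have no fixed speaker order; the loss is computed for every assignment and the best one is kept, so swapping the outputs does not change the loss.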

Files in this item:

File	Size	Format
U0001-0402202121321200.pdf (currently not licensed for public access)	2.33 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.
