Exploring Parameter-Efficient Tuning in Self-supervised Speech Models

陳子晴; Zih-Ching Chen

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90193

Title:	輕量化微調於自監督式語音模型之探討 (Exploring Parameter-Efficient Tuning in Self-supervised Speech Models)
Author:	陳子晴 (Zih-Ching Chen)
Advisor:	李宏毅 (Hung-yi Lee)
Keywords:	Parameter-efficient Fine-tuning, Adapter, Pre-trained Model, Self-supervised Speech Model
Publication Year:	2023
Degree:	Master's
Abstract:	This study explores parameter-efficient fine-tuning methods for self-supervised speech models. Recent research has shown that self-supervised learning (SSL) has considerable potential across a wide range of speech tasks, and that SSL models can be adapted to different downstream tasks through fine-tuning. However, conventional fine-tuning uses parameters inefficiently when applied to SSL models with millions of parameters, because every model parameter is updated for each downstream task. To address this issue, we introduce adapters, lightweight modules widely used in natural language processing, so that pre-trained self-supervised speech models can be applied to downstream tasks more effectively and efficiently. In our approach, the parameters of the pre-trained SSL model are frozen and only the adapter parameters are fine-tuned. Because the effectiveness of adapters in self-supervised speech tasks has received little study, we fill this gap by inserting different adapter modules into pre-trained speech SSL models. Specifically, we apply several efficient fine-tuning methods, including adapter fine-tuning and prompt fine-tuning, to self-supervised speech models evaluated on the SUPERB benchmark. We propose an adapter framework that handles multiple downstream speech processing tasks, such as speech recognition, classification, and speaker identification. Through this work, we aim to use efficient fine-tuning methods to improve the performance of speech models and to provide better solutions for multiple downstream tasks in speech processing.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90193
DOI:	10.6342/NTU202303836
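Full-text License:	Authorized (public access restricted to campus)
Appears in Department:	Graduate Institute of Communication Engineering (電信工程學研究所)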
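The abstract's core mechanism freezes the pre-trained model and trains only small adapter modules. The following is a minimal sketch of a residual bottleneck adapter in the form commonly used in NLP. It is written in pure Python and covers only the forward pass and a parameter count. The class and function names, the ReLU activation, the zero initialization of the up-projection, and the dimensions in the usage lines (hidden size 768, bottleneck 32, 12 layers with one adapter each) are all illustrative assumptions. This record does not describe the thesis's actual adapter variants or their placement, and the sketch should not be read as the author's implementation.

```python
import random


def adapter_param_count(d_model: int, bottleneck: int) -> int:
    """Trainable parameters of one bottleneck adapter (hypothetical helper):
    down-projection weights + biases, then up-projection weights + biases."""
    return d_model * bottleneck + bottleneck + bottleneck * d_model + d_model


class BottleneckAdapter:
    """Residual bottleneck adapter (illustrative sketch, not the thesis code):
    h -> h + W_up . relu(W_down . h + b_down) + b_up
    Only these weights would be trained; the backbone stays frozen."""

    def __init__(self, d_model: int, bottleneck: int, seed: int = 0):
        rng = random.Random(seed)
        # Down-projection (d_model -> bottleneck) with small random weights.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                       for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Up-projection (bottleneck -> d_model) starts at zero, so the adapter
        # is the identity map at initialization and leaves the frozen
        # backbone's hidden states unchanged before training begins.
        self.w_up = [[0.0] * bottleneck for _ in range(d_model)]
        self.b_up = [0.0] * d_model

    def __call__(self, h: list[float]) -> list[float]:
        # Project down and apply ReLU.
        z = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(self.w_down, self.b_down)]
        # Project back up to the hidden size.
        delta = [sum(w * zi for w, zi in zip(row, z)) + b
                 for row, b in zip(self.w_up, self.b_up)]
        # Residual connection around the bottleneck.
        return [x + d for x, d in zip(h, delta)]


if __name__ == "__main__":
    per_adapter = adapter_param_count(768, 32)
    print(per_adapter)       # 49952
    print(12 * per_adapter)  # 599424
```

With these assumed dimensions, the adapters add about 0.6 million trainable parameters across 12 layers. Full fine-tuning, by contrast, would update every backbone parameter. Real systems would implement this with a tensor library and train it through the task loss. The pure-Python version only makes the arithmetic and the identity-at-initialization property easy to check.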

Files in This Item:

File	Size	Format
ntu-111-2.pdf (access restricted to NTU campus IP addresses; off-campus users should use the VPN service)	1.34 MB	Adobe PDF

Unless their copyright terms are otherwise specified, all items in this system are protected by copyright, with all rights reserved.
