Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91351
Title: Once-for-all Sequence Compression for Self-supervised Speech Models
Author: Hsuan-Jui Chen
Advisor: Hung-yi Lee
Keywords: self-supervised learning, sequence compression, once-for-all training
Publication Year: 2023
Degree: Master
Abstract: Self-supervised speech models achieve state-of-the-art results on many downstream speech tasks and generalize well across tasks. To operate under the computational constraints of different devices, a variety of techniques have been applied to reduce their computational cost; among these, sequence compression exploits the characteristics of speech models to cut computation by shortening the sequence length. This thesis proposes a once-for-all sequence compression method that lets a single pre-trained model change its sequence compression rate on demand at inference time, according to the needs of the downstream task.
First, the thesis applies the proposed once-for-all sequence compression method to two self-supervised speech models, a knowledge-distillation model and a contrastively pre-trained model, and evaluates the results on multiple downstream tasks from the SUPERB benchmark. The proposed method extends the single compression rate used in prior pre-trained models to a continuous range of operating rates, and pushes the verified upper limit of sequence compression to 48x. Second, to further avoid the extra computation of locating the best result by grid search, the thesis experiments with optimizing the upstream compression rate jointly with the downstream model; comparing these results against the overall best results found by grid search provides preliminary evidence that the proposed framework can reach near-optimal downstream performance without grid search.
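To make the core idea concrete, here is a minimal PyTorch sketch of runtime-selectable sequence compression, assuming average pooling as the length-reduction operator; the thesis's actual compression module may differ, and `VariableRateCompressor` is an illustrative name, not code from the thesis.

```python
# Minimal sketch: one module, any compression rate chosen at inference time.
import torch
import torch.nn as nn


class VariableRateCompressor(nn.Module):
    """Shortens a frame sequence by a compression rate chosen at runtime."""

    def forward(self, features: torch.Tensor, rate: int) -> torch.Tensor:
        # features: (batch, time, dim); rate: compression factor, e.g. 1..48
        batch, time, dim = features.shape
        pad = (-time) % rate  # zero-pad so `time` is divisible by `rate`
        if pad:
            features = torch.cat(
                [features, features.new_zeros(batch, pad, dim)], dim=1
            )
        # Average every `rate` consecutive frames into a single frame.
        return features.view(batch, -1, rate, dim).mean(dim=2)


compressor = VariableRateCompressor()
x = torch.randn(4, 100, 768)             # dummy upstream features
for rate in (2, 8, 48):                  # one module serves every rate
    print(rate, compressor(x, rate).shape)
```

Because the rate is an ordinary argument rather than a fixed architectural choice, a single pre-trained model can trade accuracy for speed per downstream task, which is the once-for-all property described above.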
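The joint optimization of the upstream compression rate with the downstream model could, for instance, be realized by relaxing the discrete rate choice with Gumbel-softmax so that the rate receives gradients from the task loss. This is a speculative sketch of that general technique, not the thesis's implementation; `candidate_rates` is an assumed illustrative set.

```python
# Speculative sketch: a learnable categorical choice over candidate rates,
# relaxed with Gumbel-softmax so it trains jointly with the downstream loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveRateSelector(nn.Module):
    def __init__(self, candidate_rates=(1, 2, 4, 8, 16, 32, 48)):
        super().__init__()
        self.register_buffer(
            "rates", torch.tensor(candidate_rates, dtype=torch.float)
        )
        # One learnable logit per candidate rate, updated by the task loss.
        self.logits = nn.Parameter(torch.zeros(len(candidate_rates)))

    def forward(self) -> torch.Tensor:
        # Soft one-hot sample over the candidates; gradients reach `logits`.
        probs = F.gumbel_softmax(self.logits, tau=1.0, hard=False)
        return (probs * self.rates).sum()  # expected compression rate


selector = AdaptiveRateSelector()
rate = selector()  # differentiable scalar, optimized with the downstream model
print(rate.item())
```

A scheme along these lines would replace the grid search over rates, whose cost grows linearly with the number of candidate rates, with a single training run.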
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91351
DOI: 10.6342/NTU202304502
Full-text License: Authorized (open access worldwide)
Appears in Collections: Graduate Institute of Communication Engineering
Files in This Item:
File | Size | Format
---|---|---
ntu-112-1.pdf | 4.9 MB | Adobe PDF
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.