NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88249
Title: Signal Representation Sequence Compression for Distilled Self-Supervised Speech Models (經知識蒸餾之自監督式語音模型所生成之信號表徵序列之壓縮)
Author: Yen Meng (孟妍)
Advisor: Lin-Shan Lee (李琳山)
Keywords: Self-supervised Learning, Sequence Compression, Subsampling, Computational Load Reduction
Publication Year: 2023
Degree: Master's
Abstract:
Self-supervised learning has achieved considerable success in speech processing. By pre-training on large amounts of unlabeled speech, self-supervised speech models learn the underlying structure, knowledge, and information in speech, such as phonetic content and speaker characteristics, enabling them to achieve good performance on a variety of downstream speech tasks after fine-tuning on only a small amount of labeled data. With the rise of large-scale self-supervised speech models and their overwhelming advantages, research on compressing these models has become increasingly important to make them easier to train and use across domains.

Previous research has focused primarily on compressing the size of the model itself; another direction, shortening the signal representation sequences along the time axis, can also effectively reduce the computational load in speech processing but has been largely overlooked. This is therefore the main focus of this thesis: compressing the length of the signal representation sequences along the time axis to reduce the computational cost of self-supervised speech models.
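To make the computational argument concrete: self-supervised speech models of this kind are Transformer-based, so the per-layer cost of self-attention grows quadratically with the sequence length. The following is a standard back-of-the-envelope sketch (textbook Transformer analysis, not a formula from the thesis), where T is the sequence length, d the hidden dimension, and r the sequence compression ratio:

```latex
% Standard per-layer Transformer cost on a length-T sequence
% (T, d, r are illustrative symbols, not notation from the thesis):
%   self-attention: O(T^2 d),   feed-forward: O(T d^2)
% Compressing the sequence by a ratio r (T -> T/r) shrinks the
% attention term by a factor of r^2 and the feed-forward term by r.
\mathrm{Cost}(T) \;=\; O(T^{2} d) + O(T d^{2})
\;\longrightarrow\;
O\!\left(\tfrac{T^{2}}{r^{2}}\,d\right) + O\!\left(\tfrac{T}{r}\,d^{2}\right)
\qquad (T \to T/r)
```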

Since different downstream tasks have different properties, this work first investigates how sensitive individual downstream tasks are to the sampling rate of the input speech representations, i.e., the number of representations per unit time. It then studies two approaches to compressing the sequence length along the time axis: fixed-length subsampling and variable-length subsampling. We find that subsampling the signal representation sequences with an appropriate technique not only significantly speeds up pre-training and inference, but can also improve the overall performance of specific downstream tasks at a fixed sampling rate. Variable-length subsampling performs particularly well at relatively high sequence compression ratios, especially for tasks related to speech content, which are more sensitive to the representation sampling rate. Additional experiments show that, given approximate phone boundaries, subsampling based on these boundaries can reduce the average sampling rate to as low as 10 Hz while retaining, or even surpassing, the performance of the original model without sequence compression.
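As a rough illustration of the two subsampling schemes described above, here is a minimal sketch (the function names, the 50 Hz frame rate, and the toy segment boundaries are illustrative assumptions, not code from the thesis):

```python
import numpy as np

def fixed_length_subsample(frames: np.ndarray, stride: int) -> np.ndarray:
    """Fixed-length subsampling: keep one representation every `stride` frames.

    frames: (T, D) array of frame-level representations
    (e.g. a HuBERT-style 50 Hz output). Returns (ceil(T / stride), D).
    """
    return frames[::stride]

def boundary_subsample(frames: np.ndarray, boundaries) -> np.ndarray:
    """Variable-length subsampling: average the frames inside each segment.

    boundaries: (start, end) frame indices of approximate phone segments,
    e.g. from a forced aligner or an unsupervised segmenter.
    Returns one vector per segment, shape (len(boundaries), D).
    """
    return np.stack([frames[s:e].mean(axis=0) for s, e in boundaries])

# Toy usage: 100 frames of 768-dim representations (~2 s of speech at 50 Hz).
reps = np.random.randn(100, 768)
print(fixed_length_subsample(reps, stride=4).shape)  # (25, 768) -> 12.5 Hz
segments = [(0, 7), (7, 15), (15, 31), (31, 52), (52, 100)]
print(boundary_subsample(reps, segments).shape)      # (5, 768), variable rate
```

Averaging within each segment, rather than keeping a single frame per segment, is one natural choice for boundary-based variable-length subsampling; the mechanism actually used in the thesis may differ.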
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88249
DOI: 10.6342/NTU202301448
Full-Text Authorization: Access granted (publicly available worldwide)
Appears in Collections: Graduate Institute of Communication Engineering

Files in This Item:
File           Size    Format
ntu-111-2.pdf  3.9 MB  Adobe PDF


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
