Meta Learning in End-to-End Speech Recognition

Jui-Yang Hsu; 徐瑞陽

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54516

Title:	Meta Learning in End-to-End Speech Recognition (元學習於端對端語音辨識之探討)
Author:	Jui-Yang Hsu (徐瑞陽)
Advisor:	Hung-Yi Lee (李宏毅)
Keywords:	Meta Learning, Speech Recognition, Acoustic Modeling, Transfer Learning, Model-Agnostic Meta Learning, Low-Resource, Multitask Learning
Publication Year:	2020
Degree:	Master's
Abstract:	This thesis examines how well different transfer learning methods perform for automatic speech recognition in different scenarios when labeled data is limited. It focuses on two approaches: model-agnostic meta learning, a meta learning method that has achieved initial success in few-shot image recognition and reinforcement learning since 2017, and multitask learning, which has long been used in speech processing. In doing so, it introduces meta learning methods into speech processing. Three scenarios are examined: cross-language phoneme recognition, cross-accent end-to-end speech recognition, and cross-language end-to-end speech recognition. To probe the limits of meta learning in speech tasks, the thesis moves from acoustic modeling to end-to-end speech recognition, from a relatively simple deep neural network to the more complex Transformer architecture, and from cross-accent settings, where the data are highly similar, to the more challenging and dissimilar cross-language setting. To identify the conditions under which meta learning yields larger gains, the experiments vary the pretraining datasets, the validation sets, the number of fine-tuning steps, the amount of fine-tuning data, and the sampling strategy used during pretraining. The results show that meta learning transfers better than multitask learning in extremely low-resource cross-language phoneme recognition and in low-resource cross-accent end-to-end speech recognition, where the data are more similar. In cross-language end-to-end speech recognition, where the data differ more and the model cannot be trained on too little data, meta learning performs on par with multitask learning. These findings indicate which characteristics make a speech task well suited to meta learning and can serve as a reference for future research on applying meta learning in speech processing.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54516
DOI:	10.6342/NTU202100553
Full-Text Authorization:	Paid authorization
Appears in Department:	Department of Electrical Engineering

Files in This Item:

File	Size	Format
U0001-0502202100244400.pdf (not authorized for public access)	2.5 MB	Adobe PDF
