NTU Theses and Dissertations Repository › College of Electrical Engineering and Computer Science › Department of Electrical Engineering
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54516
Title: Meta Learning in End-to-End Speech Recognition (元學習於端對端語音辨識之探討)
Authors: Jui-Yang Hsu
徐瑞陽
Advisor: Hung-Yi Lee (李宏毅)
Keywords: Meta Learning, Speech Recognition, Acoustic Modeling, Transfer Learning, Model-Agnostic Meta Learning, Low-Resource, Multitask Learning
Publication Year: 2020
Degree: Master's (碩士)
Abstract: This thesis investigates how well different transfer learning methods perform for automatic speech recognition in different scenarios when labeled data is limited. The study centers on model-agnostic meta learning (MAML), a meta learning method that has seen initial success in few-shot image recognition and reinforcement learning since 2017, and on multitask learning, a long-established approach in speech processing. Three scenarios are examined: cross-language phoneme recognition, cross-accent end-to-end speech recognition, and cross-language end-to-end speech recognition. Moving from acoustic modeling to end-to-end speech recognition, from relatively simple deep neural networks to the more complex Transformer model, and from the more similar cross-accent setting to the cross-language setting, the thesis step by step extends the boundary of applying meta learning to speech-related tasks. By varying the pretraining datasets, validation datasets, number of fine-tuning iterations, amount of fine-tuning data, and the sampling strategy during pretraining, it attempts to identify the conditions under which meta learning brings larger performance gains. Experimental results show that in extremely low-resource cross-language phoneme recognition, and in low-resource but more similar cross-accent end-to-end speech recognition, meta learning transfers better than multitask learning; however, in cross-language end-to-end speech recognition, where the data differ more and too little corpus is insufficient for training, its performance is merely on par with multitask learning. These findings characterize the properties a speech task scenario should have for meta learning to be applicable, serving as a reference for follow-up research.
This thesis surveys various transfer learning methods under a low-resource setting. In addition to multitask learning, the popular implementation of transfer learning, we are the first to introduce meta learning methods into speech processing. This thesis uses cross-language phoneme recognition, cross-accent end-to-end speech recognition, and cross-language end-to-end speech recognition as testing scenarios. To explore the limits of applying meta learning to speech processing, we start from simple acoustic modeling and move to more complicated end-to-end speech recognition, from a simple multi-layer neural network to the more complicated Transformer architecture, and from the similar cross-accent setting to the more challenging and dissimilar cross-language setting. To find suitable transfer learning methods for a specific scenario, we control variables such as the pretraining datasets, validation sets, number of fine-tuning steps, amount of data used in fine-tuning, and the sampling strategies during pretraining. The experiments show that under a low-resource setting, meta learning methods outperform multitask learning methods on cross-language phoneme recognition and cross-accent end-to-end speech recognition. However, on more challenging tasks such as cross-language end-to-end speech recognition, there is no performance gap between the two methods. We believe these findings can help researchers explore more possibilities for applying meta learning methods in speech processing.
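The core contrast the abstract draws — meta-pretraining with an inner adaptation loop versus plain multitask pretraining on pooled data — can be illustrated with a minimal sketch. The code below is not the thesis's implementation: it uses a toy scalar regression model (y = w·x) in place of an acoustic model, first-order MAML (FOMAML) in place of full MAML, and uniform task sampling as the pretraining sampling strategy; all function names and hyperparameters are illustrative.

```python
import random

def loss(w, data):
    # mean squared error for the toy model y_hat = w * x
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def loss_grad(w, data):
    # gradient of the mean squared error with respect to w
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def maml_pretrain(tasks, steps=200, inner_lr=0.05, meta_lr=0.05):
    """First-order MAML sketch: each step adapts to one sampled task's
    support set, then meta-updates using the gradient evaluated at the
    adapted parameters on that task's query set."""
    w = 0.0
    for _ in range(steps):
        support, query = random.choice(tasks)     # uniform task sampling
        w_adapted = w - inner_lr * loss_grad(w, support)   # inner loop
        w -= meta_lr * loss_grad(w_adapted, query)         # meta update
    return w

def multitask_pretrain(tasks, steps=200, lr=0.05):
    """Multitask baseline sketch: plain SGD on pooled task data,
    with no inner adaptation loop."""
    w = 0.0
    for _ in range(steps):
        support, query = random.choice(tasks)
        w -= lr * loss_grad(w, support + query)
    return w
```

After pretraining, either initialization would be fine-tuned on the few labeled examples of the target task (target language or accent); MAML's objective directly optimizes for performance *after* that adaptation step, which is the property the thesis exploits in its low-resource scenarios.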
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54516
DOI: 10.6342/NTU202100553
Fulltext Rights: Paid authorization required (有償授權)
Appears in Collections: Department of Electrical Engineering (電機工程學系)

Files in This Item:
File: U0001-0502202100244400.pdf
Size: 2.5 MB
Format: Adobe PDF
Access: Restricted Access


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Contact Information
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
© NTU Library All Rights Reserved