NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91978
Title: 語音處理的解離表徵學習、多任務學習及通用模型
Disentangled Representation Learning, Multi-task Learning and Universal Modeling for Speech Processing
Authors: 陳奕禎
Yi-Chen Chen
Advisor: 李宏毅
Hung-yi Lee
Keyword: 語音處理, 多任務學習, 統一模型, 多模態, 解離表徵
speech processing, multi-task learning, universal model, multimodal, disentangled representation
Publication Year: 2023
Degree: Doctoral (博士)
Abstract: 由於深度學習的成功,愈來愈多基於強大深度模型的語音處理應用影響了我們的生活。然而,這些強大的模型總是針對特定任務而特化,而難以泛化用在其他的任務。因此,對於每個任務,我們都必須分別收集、設計、訓練以及調整所有的資料及模型架構。如果有一種通用模型能夠同時學習並進行多種不同的語音處理任務,那有些任務也許可以透過其他任務習得的技能而更加進步,而且透過不同任務得來的資料也可以被加以利用。但是,這種通用模型需要能從語音訊號中汲取不同種資訊(內容或語者),也能處理不同輸入或輸出模態(語音或文字)的多種任務。因此本論文中,我們針對這兩種問題提出方法。我們藉由對抗式訓練,解離並汲取語音訊號中不同種資訊。我們也提出方法能夠利用多任務訓練使單一模型處理多種任務。
Owing to the success of deep learning, more and more applications built on powerful deep models for speech processing tasks have influenced our lives. However, these powerful models are always task-specific and have limited capability to generalize to other tasks. Therefore, the data must be collected and the model architectures designed, trained, and tuned separately for each task. If a universal model could simultaneously learn and perform multiple speech processing tasks, some tasks might be improved by abilities learned from related tasks, and data from various tasks could be leveraged together. However, such a universal model must be able to extract different kinds of information from speech signals (content or speaker) and to handle tasks with different input and output modalities (speech or text). Hence, in this thesis, we propose approaches to address these two problems. We disentangle and extract different kinds of information from speech signals with adversarial training, and we propose approaches that handle various tasks with a single model through multi-task learning.
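The adversarial disentanglement described in the abstract is commonly realized with a gradient-reversal layer that discourages a content representation from carrying speaker information. The following is a minimal, illustrative PyTorch sketch of that general idea, not the actual architecture used in the thesis; the module names, feature dimensions, and the speaker-classifier adversary are assumptions chosen for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; flips (and scales) the gradient in the backward pass.
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

class DisentangledEncoder(nn.Module):
    # Toy encoder that splits a speech feature vector into "content" and "speaker" branches.
    # A speaker classifier is attached to the content branch through gradient reversal,
    # so minimizing its loss trains the classifier while pushing the content encoder
    # to discard speaker cues.
    def __init__(self, feat_dim=80, hidden=256, n_speakers=100, lamb=1.0):
        super().__init__()
        self.content_enc = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.speaker_enc = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.speaker_adv = nn.Linear(hidden, n_speakers)  # adversary on the content branch
        self.lamb = lamb

    def forward(self, feats):
        content = self.content_enc(feats)
        speaker = self.speaker_enc(feats)
        adv_logits = self.speaker_adv(GradReverse.apply(content, self.lamb))
        return content, speaker, adv_logits

if __name__ == "__main__":
    model = DisentangledEncoder()
    feats = torch.randn(8, 80)              # a batch of 8 feature vectors (assumed 80-dim)
    spk_ids = torch.randint(0, 100, (8,))   # speaker labels for the adversary
    content, speaker, adv_logits = model(feats)
    adv_loss = F.cross_entropy(adv_logits, spk_ids)  # would be added to downstream task losses
    adv_loss.backward()

In this sketch the speaker classifier learns to predict the speaker from the content branch, while the reversed gradient pushes the content encoder to remove speaker information; the losses of the actual downstream speech tasks would be added on top of the adversarial loss.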
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91978
DOI: 10.6342/NTU202302463
Fulltext Rights: Authorized for release (open access worldwide)
Appears in Collections: Graduate Institute of Communication Engineering (電信工程學研究所)

Files in This Item:
File           Size     Format
ntu-112-1.pdf  4.29 MB  Adobe PDF


