Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97312

| Title: | 應用自督導式學習模型於自動化華語發音評估系統 Automatic Mandarin Pronunciation Assessment Applying Self-supervised Learning Models |
| Author: | 謝文崴 Wen-Wei Hsieh |
| Advisor: | 葉丙成 Ping-Cheng Yeh |
| Co-advisor: | 劉德馨 Te-Hsin Liu |
| Keywords: | corpus, Mandarin, second language (L2), deep learning, computer-aided pronunciation training (CAPT) |
| Publication Year: | 2024 |
| Degree: | Master's |
| Abstract: | Computer-aided pronunciation training (CAPT) has advanced considerably in recent years and serves two main purposes: pronunciation scoring and instruction. Pronunciation quality depends on factors such as accuracy, intonation, and prosody, all of which are difficult to quantify. In addition, the literature on Mandarin pronunciation is scarce and lacks datasets, since most research targets English. To address these problems, this thesis makes two main contributions. First, it establishes a public, open-source, professionally annotated Mandarin speech dataset. The dataset contains recordings of French learners, scored by experts with backgrounds in Mandarin teaching and relevant internship experience. The scoring system is also released as open source, as a reference for researchers who want to build larger corpora. With this dataset, researchers can train models and further improve Mandarin pronunciation scoring. Second, the thesis applies self-supervised learning models to this dataset and achieves good accuracy despite the limited data. It compares different self-supervised learning models with traditional methods for feature extraction and combines them with a deep learning model that we designed for speech, providing a baseline for subsequent research. Overall, this study establishes an open-source, professional Mandarin speech dataset that enables further related research, releases the scoring system to encourage more people to contribute to building Mandarin corpora, and shows that self-supervised learning models perform well on this dataset, establishing a baseline for subsequent research. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97312 |
| DOI: | 10.6342/NTU202500780 |
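| Full-text Authorization: | Authorized (open access worldwide) |
| Full-text Release Date: | 2025-04-25 |
| Appears in Collections: | Graduate Institute of Communication Engineering |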
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf | 9.21 MB | Adobe PDF | View/Open |
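Unless their copyright terms are specifically indicated otherwise, all items in the system are protected by copyright, with all rights reserved.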
