Mandarin Mispronunciation Detection and Diagnosis Feedback Using Articulatory Attributes Based Multi-task Learning

Xuan-Bo Chen; 陳宣伯

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7292

Title:	Mandarin Mispronunciation Detection and Diagnosis Feedback Using Articulatory Attributes Based Multi-task Learning (利用多任務學習模型建立發音特徵來改善華語錯誤發音偵測與診斷之回饋)
Authors:	Xuan-Bo Chen 陳宣伯
Advisor:	張智星(Jyh-Shing Roger Jang)
Keyword:	computer assisted pronunciation training, mispronunciation detection, mispronunciation diagnosis, articulatory features (place and manner of articulation), multi-task learning, discriminative training, time-delay neural networks
Publication Year :	2019
Degree:	Master's
Abstract:	This thesis presents our research on computer-assisted pronunciation training (CAPT), focusing on mispronunciation detection and on articulation feedback tied to a model of the oral cavity. We propose incorporating speech attributes, namely place and manner of articulation, into the assessment models to improve mispronunciation detection and to return more precise diagnostic feedback to learners. We train a discriminative articulatory model based on time-delay neural networks (TDNNs) with a multi-task learning strategy to produce an articulatory score for a phoneme, and a TDNN-based acoustic model to produce a phonetic score. At test time, the system detects mispronunciations and returns articulation feedback based on both the phonetic and articulatory scores. Experiments on the MATBN Mandarin Chinese broadcast news corpus show that the proposed models outperform the Gaussian mixture model (GMM)-based and deep neural network (DNN)-based baselines in terms of equal error rate (EER) and diagnostic accuracy (DA). The approach should be applicable to other languages, although the current system focuses on Mandarin.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7292
DOI:	10.6342/NTU201901749
Fulltext Rights:	Authorized (open access worldwide)
Embargo Lift Date:	2024-07-25
Appears in Collections:	Department of Computer Science and Information Engineering
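
The abstract reports results in terms of equal error rate (EER). As a rough illustration of that metric only, the following minimal sketch computes EER from per-phoneme scores. It is not the thesis's implementation: the function name `equal_error_rate`, the convention that a higher score means "more likely pronounced correctly", and the sample scores are all assumptions made for this example.

```python
def equal_error_rate(scores, is_mispronounced):
    """Return (eer, threshold) for a score-based mispronunciation detector.

    Assumed convention (hypothetical, not taken from the thesis):
    a phoneme is flagged as mispronounced when its score is below the threshold.
    """
    wrong = [s for s, m in zip(scores, is_mispronounced) if m]
    right = [s for s, m in zip(scores, is_mispronounced) if not m]
    if not wrong or not right:
        raise ValueError("need at least one example of each class")

    # Candidate thresholds: every observed score, plus one above the maximum.
    candidates = sorted(set(scores))
    candidates.append(candidates[-1] + 1.0)

    best_gap, best_eer, best_thr = None, None, None
    for thr in candidates:
        # False acceptance: a mispronounced phoneme passes as correct.
        far = sum(s >= thr for s in wrong) / len(wrong)
        # False rejection: a correct phoneme is flagged as mispronounced.
        frr = sum(s < thr for s in right) / len(right)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


# Made-up scores: four correct phonemes followed by four mispronounced ones.
example_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
example_labels = [False, False, False, False, True, True, True, True]
eer, thr = equal_error_rate(example_scores, example_labels)
print(f"EER = {eer:.2f} at threshold {thr}")  # EER = 0.25 at threshold 0.6
```

With a finite set of scores the two error rates may never match exactly, so the sketch takes the threshold where they are closest and averages them there.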

Files in This Item:

File	Size	Format
ntu-108-1.pdf Until 2024-07-25	3.15 MB	Adobe PDF
