NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49897
Full metadata record
dc.contributor.advisor: 張智星 (Jyh-Shing Roger Jang)
dc.contributor.author: Chun-Hao Fan (en)
dc.contributor.author: 范君豪 (zh_TW)
dc.date.accessioned: 2021-06-15T12:25:52Z
dc.date.available: 2017-06-28
dc.date.copyright: 2016-08-24
dc.date.issued: 2016
dc.date.submitted: 2016-08-11
dc.identifier.citation:
[1] Kai-Fu Lee and Hsiao-Wuen Hon, “Speaker-Independent Phone Recognition Using Hidden Markov Models”, International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1989.
[2] Li Deng, Dong Yu, Jinyu Li, Jui-Ting Huang, Kaisheng Yao, Frank Seide, Michael Seltzer, Geoff Zweig, Xiaodong He, Jason Williams, Yifan Gong, and Alex Acero, “Recent Advances in Deep Learning for Speech Research at Microsoft”, International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.
[3] Xavier Glorot and Yoshua Bengio, “Understanding the Difficulty of Training Deep Feedforward Neural Networks”, International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[4] Stanford University, CS231n: Convolutional Neural Networks for Visual Recognition, Lecture 5 slides, pp. 52-64, available at http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf.
[5] Stanford University, CS231n: Convolutional Neural Networks for Visual Recognition, course notes on weight initialization, available at http://cs231n.github.io/neural-networks-2/.
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”, International Conference on Computer Vision (ICCV), 2015.
[7] Stack Exchange, “What are good initial weights in a neural network?”, available at http://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network.
[8] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, Journal of Machine Learning Research 15, pp. 1929-1958, 2014.
[9] Catia Cucchiarini, Helmer Strik, and Lou Boves, “Automatic Evaluation of Dutch Pronunciation by Using Speech Recognition Technology”, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1997.
[10] 李俊毅, “語音評分” (Speech Scoring), Master's thesis, National Tsing Hua University, 2002.
[11] 羅瑞麟, “以語音辨識與評分輔助口說英文學習” (Assisting Spoken English Learning with Speech Recognition and Scoring), Master's thesis, National Tsing Hua University, 2003.
[12] 陳宏瑞, “使用多重聲學模型以改良台語語音評分” (Using Multiple Acoustic Models to Improve Taiwanese Speech Scoring), Master's thesis, National Tsing Hua University, 2011.
[13] 劉承泰, “嵌入式語音命令系統的設計與改進” (Design and Improvement of an Embedded Voice Command System), Master's thesis, National Tsing Hua University, 2013.
[14] Diederik P. Kingma and Jimmy Lei Ba, “Adam: A Method for Stochastic Optimization”, International Conference on Learning Representations (ICLR), 2015.
[15] Hung-Yi Lee, “Machine Learning and Having It Deep and Structured”, available at http://speech.ee.ntu.edu.tw/~tlkagk/courses_MLSD15_2.html.
[16] Jyh-Shing Roger Jang, “Machine Learning Toolbox”, available at http://mirlab.org/jang/matlab/toolbox/machineLearning.
[17] Jyh-Shing Roger Jang, “ASR (Automatic Speech Recognition) Toolbox”, available at http://mirlab.org/jang/matlab/toolbox/ASR.
[18] Jyh-Shing Roger Jang, “Speech and Audio Processing (SAP) Toolbox”, available at http://mirlab.org/jang/matlab/toolbox/sap.
[19] Jyh-Shing Roger Jang, “Utility Toolbox”, available at http://mirlab.org/jang/matlab/toolbox/utility.
[20] TIMIT Speech Corpus introduction, available at https://catalog.ldc.upenn.edu/docs/LDC93S1/readme.txt.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49897
dc.description.abstract: Whether an utterance is pronounced correctly is an important part of communication and is closely tied to the intended meaning; different but similar pronunciations may convey different meanings. The correctness of pronunciation therefore holds an important place in language learning.
This thesis is divided into two parts: using neural network models to classify phonemes, and using the neural network models' classification results to score English utterances. We build an English speech assessment system based on neural network models, so as to serve computer-assisted language learning.
For the neural network and deep learning part, this thesis compares the effectiveness of MFCC features and filter-bank features in deep learning, and tests many combinations of neural network parameters. After finding the parameter combination best suited to the training set, we experiment further with large-dimension features. The best final result uses large-dimension MFCC features, with which the neural network model reaches a phoneme recognition rate of 73.33%.
For the speech assessment part, this thesis takes an HMM-GMM-based speech assessment system as the target of comparison and improvement, and proposes the max-gap and adaptive-k scoring methods, which use the neural network model's output to score speech. Test results show that the adaptive-k method performs better than the HMM-GMM-based system on short sentences, while long-sentence scoring still needs improvement; overall, the adaptive-k method still improves on the HMM-GMM-based system.
(zh_TW)
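The thesis runs its experiments with the MATLAB toolboxes cited as [16]-[19]; none of that code appears in this record. Purely as an illustration of the pipeline the abstract describes (MFCC frames widened with neighboring frames into large-dimension features, then classified by a feed-forward network regularized with dropout [8]), here is a minimal Python sketch. The context width, layer sizes, dropout rate, and the 39-phone TIMIT-style target set are illustrative assumptions, not values taken from the thesis.

    # Illustrative sketch only: the thesis itself uses MATLAB toolboxes [16]-[19].
    # Context width, layer sizes, dropout rate, and the 39-phone target set
    # are assumptions, not values reported in the thesis.
    import librosa
    import numpy as np
    import torch.nn as nn

    def mfcc_with_context(wav_path, n_mfcc=13, context=5):
        """13-dim MFCC frames stacked with +/- `context` neighbors: one way
        to form the 'large-dimension' features the abstract mentions."""
        y, sr = librosa.load(wav_path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, 13)
        padded = np.pad(mfcc, ((context, context), (0, 0)), mode="edge")
        return np.stack([padded[i:i + 2 * context + 1].ravel()
                         for i in range(len(mfcc))])               # (frames, 143)

    class PhonemeDNN(nn.Module):
        """Feed-forward phoneme classifier with dropout regularization [8]."""
        def __init__(self, in_dim=143, n_phones=39, hidden=512, p_drop=0.5):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
                nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
                nn.Linear(hidden, n_phones))   # logits; softmax lives in the loss

        def forward(self, x):
            return self.net(x)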
dc.description.abstract: Pronunciation plays an important role in communication: similar but different pronunciations may convey different meanings. Correct pronunciation is therefore a very important part of language learning.
The thesis is divided into two parts. The first part describes the use of deep neural networks (DNNs) to classify phonemes. The second part explains how the DNN output can be used to perform speech assessment. Building a DNN-based speech assessment system is the main goal of this thesis.
For the DNN part, we compare MFCC features with Mel filter-bank coefficients, and try a number of DNN configurations to find the best setting. Our main finding is that large-dimension features give better accuracy: in our experiments, the best phoneme recognition rate of the DNN models reaches 73.33% using large-dimension MFCC features.
For speech assessment, we propose two methods, max-gap and adaptive-k, that use the DNN's output to score speech. A conventional HMM-GMM-based speech assessment system serves as the baseline. Our experiments show that adaptive-k outperforms HMM-GMM on short-sentence assessment, while the two are comparable on long sentences; overall, adaptive-k is still better than HMM-GMM for speech assessment.
(en)
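On the scoring side, the table of contents points to a log-probability "basic scoring method" (Section 3.3.1) that max-gap (3.3.2) and adaptive-k (3.3.4) refine; the refinements are defined only in the thesis itself. The sketch below shows just the generic baseline idea, assuming frame-level softmax posteriors from a classifier like the one above and a per-frame forced alignment of the expected phonemes.

    # Sketch of a generic log-posterior pronunciation score (cf. Section 3.3.1).
    # The thesis's max-gap and adaptive-k methods refine this idea; their
    # definitions are in the thesis and are not reproduced here.
    import numpy as np

    def basic_pronunciation_score(frame_posteriors, aligned_phones):
        """frame_posteriors: (n_frames, n_phones) softmax outputs of the DNN.
        aligned_phones: per-frame indices of the expected phonemes, taken
        from a forced alignment. Returns the mean log posterior of the
        expected phonemes; values closer to 0 suggest more canonical speech."""
        eps = 1e-10   # guard against log(0) when the DNN rules a phone out
        rows = np.arange(len(aligned_phones))
        return float(np.log(frame_posteriors[rows, aligned_phones] + eps).mean())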
dc.description.provenance: Made available in DSpace on 2021-06-15T12:25:52Z (GMT). No. of bitstreams: 1; ntu-105-R03944018-1.pdf: 3734292 bytes, checksum: 3a208b040fe9f2f8886ab232a19ae161 (MD5). Previous issue date: 2016. (en)
dc.description.tableofcontents:
Thesis Committee Certification
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Research Overview
1.2.1 Phonemes
1.2.2 The English Sound System
1.3 Research Goals
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Related Work
2.1.1 Related Work on Deep Learning and Neural Network Model Training
2.1.2 Related Work on Speech Assessment
2.2 Overview of the HMM-GMM-Based Speech Assessment System
2.2.1 System Architecture
2.2.2 Scoring Method
Chapter 3 Methodology
3.1 Speech Assessment System
3.1.1 Problem Definition
3.1.2 System Architecture
3.1.3 Preprocessing
3.1.4 Scoring Criteria and Validation
3.2 Neural Networks
3.2.1 Handling Overfitting
3.2.2 Retraining Neural Network Models
3.2.3 Neural Network Architecture
3.3 Speech Scoring Methods
3.3.1 Log Probability and the Basic Scoring Method
3.3.2 The Max-Gap Scoring Method
3.3.3 Good Predictions and Bad Predictions
3.3.4 The Adaptive-k Scoring Method
Chapter 4 Experimental Results and Discussion
4.1 Neural Network Model Training and Experiments
4.1.1 Corpus
4.1.2 Sentence Selection
4.1.3 Speech Features
4.1.4 Training Parameters and Settings
4.1.5 Results of Neural Network Model Training
4.1.6 Results with Dropout
4.1.7 Results with Large-Dimension Features
4.2 Speech Assessment Experiments
4.2.1 Corpus and Experimental Parameters
4.2.2 Speech Assessment Results
4.2.3 Discussion and Error Analysis
4.2.4 Performance Improvements
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
dc.language.iso: zh-TW
dc.title: 使用深度學習以改善語音評分之方法與比較 (zh_TW)
dc.title: Improving Speech Assessment Using Deep Neural Networks (en)
dc.type: Thesis
dc.date.schoolyear: 104-2
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 李宏毅 (Hung-Yi Lee), 王新民 (Hsin-Min Wang)
dc.subject.keyword: 類神經網路 (neural networks), 語音評分 (speech assessment), 發音評分 (pronunciation scoring), 電腦輔助語言學習 (computer-assisted language learning), 口說發音輔助學習 (computer-assisted spoken pronunciation learning) (zh_TW)
dc.subject.keyword: neural network, speech assessment, pronunciation scoring, computer assisted language learning (CALL), computer assisted pronunciation training (CAPT) (en)
dc.relation.page: 58
dc.identifier.doi: 10.6342/NTU201602150
dc.rights.note: Paid authorization (有償授權)
dc.date.accepted: 2016-08-11
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) (zh_TW)
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) (zh_TW)
Appears in Collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in This Item:
File: ntu-105-1.pdf (Restricted Access), Size: 3.65 MB, Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
