Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/4893
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 李琳山(Lin-Shan Lee) | |
dc.contributor.author | Chuan-Hsun Wu | en |
dc.contributor.author | 吳全勳 | zh_TW |
dc.date.accessioned | 2021-05-14T17:49:49Z | - |
dc.date.available | 2015-09-17 | |
dc.date.available | 2021-05-14T17:49:49Z | - |
dc.date.copyright | 2015-09-17 | |
dc.date.issued | 2015 | |
dc.date.submitted | 2015-08-24 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/4893 | - |
dc.description.abstract | This thesis proposes a dialogue game framework for Computer-Assisted Language Learning (CALL) that takes Articulatory Features (AFs) into account. The system uses an automatic pronunciation evaluator and dialogue scripts for a restaurant scenario, models the system as a continuous-state Markov Decision Process (MDP), and trains the dialogue-management policy with Reinforcement Learning (RL). In addition, a learner simulation model, trained on a corpus of real learners annotated with Pronunciation Error Patterns by Mandarin teachers, generates simulated learners for training the system model. Little prior work has considered combining articulatory features with CALL, and this thesis proposes that new idea. The main motivation is that in previous work there are always some low-frequency pronunciation units; when a learner pronounces them poorly, the system must spend relatively many practice turns before the learner actually gets to practice those rare units. To mitigate this, the thesis adopts a key hypothesis: when a pronunciation unit occurs very rarely, practicing other units that share a high proportion of its articulatory features can be regarded as a kind of virtual practice that still yields improvement. This hypothesis underlies the whole thesis, although we never had the chance to verify it experimentally. The thesis therefore incorporates articulatory-feature settings and uses virtual practice counts to remedy the above shortcoming of the earlier system. Concretely, the thesis builds a Mandarin-learning dialogue-tree game that considers articulatory features and adaptively selects practice sentences for learners with different pronunciation profiles. When a sentence lacks some pronunciation unit, other units sharing a high proportion of the same articulatory features serve as substitute virtual practice; different articulatory features can further be given different weights, so the system focuses on units the learner pronounces poorly or has practiced too little, or on the combinations of articulatory features occurring in high proportion within those units. Experiments and analysis show that the proposed approach is effective and feasible, provided the hypothesis above holds. | zh_TW |
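The continuous-state MDP trained with reinforcement learning that the abstract describes can be sketched as batch fitted Q-iteration with a linear function approximator, one of the training styles the thesis's background chapter covers. Everything concrete below (the feature dimension, the per-action feature map, the random toy transitions) is an invented illustration, not the thesis's actual parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each state is a feature vector summarizing the
# learner's per-unit pronunciation scores; each action picks the next
# practice sentence. Transitions would come from simulated-learner episodes.
N_FEATURES, N_ACTIONS, GAMMA = 8, 4, 0.9

def featurize(state, action):
    """One weight slot per action: place the state features in the action's slot."""
    phi = np.zeros(N_FEATURES * N_ACTIONS)
    phi[action * N_FEATURES:(action + 1) * N_FEATURES] = state
    return phi

def fitted_q_iteration(transitions, n_iters=20):
    """Batch RL: repeatedly regress Bellman targets onto linear features."""
    w = np.zeros(N_FEATURES * N_ACTIONS)
    X = np.array([featurize(s, a) for s, a, r, s2 in transitions])
    for _ in range(n_iters):
        # Bellman targets under the current Q estimate
        y = np.array([
            r + GAMMA * max(featurize(s2, b) @ w for b in range(N_ACTIONS))
            for s, a, r, s2 in transitions
        ])
        # One least-squares regression = one "fitting" step
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Toy (state, action, reward, next_state) tuples standing in for a simulator
transitions = [(rng.random(N_FEATURES), int(rng.integers(N_ACTIONS)),
                float(rng.random()), rng.random(N_FEATURES)) for _ in range(200)]
w = fitted_q_iteration(transitions)
# Greedy sentence selection for some state
best = max(range(N_ACTIONS), key=lambda a: featurize(transitions[0][0], a) @ w)
```

With a real simulated-learner environment, the transitions would come from dialogue episodes rather than random draws, and the reward would reflect pronunciation improvement.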
dc.description.abstract | In this thesis we propose a new dialogue game framework considering Articulatory Features (AFs) for personalized Computer-Assisted Language Learning (CALL). We use an automatic pronunciation evaluator and a set of dialogue scripts for restaurant scenarios, with the policy for selecting practice sentences trained by Reinforcement Learning (RL) on a continuous-state Markov Decision Process (MDP) serving as the system model. We utilize a corpus of real learner data, including pronunciation Error Patterns (EPs) annotated by Mandarin teachers, to train a learner simulation model that produces a large number of simulated learners for MDP training.
This thesis proposes the new concept of considering Articulatory Features (AFs) in a dialogue game for Computer-Assisted Language Learning (CALL). In previous work, the learner had to go through longer dialogue paths (more dialogue turns) to practice rare and ill-pronounced pronunciation units. The new approach here rests on an important hypothesis: practicing other pronunciation units that share a high proportion of the AFs of a considered rare unit, taken as "pseudo practice", can somehow improve the pronunciation of that rare unit. We further set different weights for different AFs within different pronunciation units, so that the system concentrates on those rare or ill-pronounced units. Experimental results verify the feasibility of the proposed framework, given the hypothesis above. | en |
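The "pseudo practice" hypothesis can be made concrete with a small sketch: credit earned by practicing one unit is propagated to a rare unit in proportion to the weighted fraction of AFs they share. The unit names, AF sets, and weights below are invented for illustration only and do not reproduce the thesis's actual Mandarin AF inventory or weight settings:

```python
# Hypothetical articulatory-feature (AF) inventory; in the thesis, each
# Mandarin pronunciation unit maps to a set of AFs (names here invented).
AF_SETS = {
    "b":  {"labial", "stop", "voiceless"},
    "p":  {"labial", "stop", "voiceless", "aspirated"},
    "zh": {"retroflex", "affricate", "voiceless"},
}
# Illustrative per-feature weights (higher = the feature matters more)
AF_WEIGHTS = {"labial": 1.0, "stop": 1.0, "voiceless": 0.5,
              "aspirated": 1.0, "retroflex": 1.5, "affricate": 1.0}

def pseudo_practice(practiced, target):
    """Credit one practice of `practiced` toward rare unit `target`,
    in proportion to the weighted fraction of `target`'s AFs covered."""
    shared = AF_SETS[practiced] & AF_SETS[target]
    total = sum(AF_WEIGHTS[f] for f in AF_SETS[target])
    return sum(AF_WEIGHTS[f] for f in shared) / total

# Practicing "p" fully covers the AFs of "b" in this toy inventory,
# so it counts as a full pseudo practice of "b"; "zh" overlaps "b"
# only on "voiceless", so it earns much less credit.
credit_pb = pseudo_practice("p", "b")
credit_zhb = pseudo_practice("zh", "b")
```

A sentence-selection policy can then accumulate these fractional counts alongside real practice counts, which is what lets the system avoid long dialogue detours just to reach a rare unit.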
dc.description.provenance | Made available in DSpace on 2021-05-14T17:49:49Z (GMT). No. of bitstreams: 1 ntu-104-R02922002-1.pdf: 8701276 bytes, checksum: fb747b6f9bb73253a421abb496223768 (MD5) Previous issue date: 2015 | en |
dc.description.tableofcontents | Thesis Committee Certification
Chinese Abstract
English Abstract
1. Introduction
  1.1 Motivation
  1.2 Related Work
  1.3 Research Directions and Contributions
  1.4 Chapter Organization
2. Background
  2.1 Phonemes and Phones
  2.2 The International Phonetic Alphabet
  2.3 Introduction to Mandarin Phonetics
    2.3.1 Initials and Finals
    2.3.2 Tones
  2.4 Articulatory Feature Classification
    2.4.1 Binary Features
    2.4.2 Multi-valued Features
    2.4.3 Articulatory Trajectories
  2.5 Pronunciation Error Patterns
  2.6 Reinforcement Learning
    2.6.1 The Markov Decision Process (MDP) Model
    2.6.2 The Continuous-State MDP Model
3. Experimental Corpora
  3.1 Tree-Structured Dialogue Script Set
  3.2 Corpus of Real Mandarin Learners
  3.3 Error Annotation by Mandarin Teachers and Its Conversion
4. Design of the Dialogue Game Framework Considering Articulatory Features
  4.1 The Previous System
    4.1.1 System Architecture
    4.1.2 System Principles
    4.1.3 Results of the Previous Work
  4.2 The System in This Thesis
    4.2.1 System Principles
    4.2.2 Simulated Learners
    4.2.3 Learner Model Built with Gaussian Mixture Models
    4.2.4 Training and Testing
    4.2.5 Simulation Phase
  4.3 The Continuous-State MDP Model
    4.3.1 Model Parameters
    4.3.2 Model Training Algorithm
  4.4 Virtual Practice
    4.4.1 Computing Virtual Practice Counts
    4.4.2 Incorporating Weighted Articulatory Features
    4.4.3 Weight Settings
5. Experimental Results and Analysis
  5.1 Experiments and Analysis without Weighted Articulatory Features
  5.2 Experiments and Analysis with Weighted Articulatory Features
6. Conclusion and Future Work
  6.1 Summary
  6.2 Future Research Directions
    6.2.1 Partially Observable MDP
    6.2.2 Deep Q-Network
References
Appendix | |
dc.language.iso | zh-TW | |
dc.title | 考慮發聲特徵用於個人化電腦輔助發音訓練之對話遊戲 | zh_TW |
dc.title | Dialogue Game Considering Articulatory Features for Personalized Computer-Aided Pronunciation Training | en |
dc.type | Thesis | |
dc.date.schoolyear | 103-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 王小川,李宏毅,簡仁宗,陳信宏,鄭秋豫 | |
dc.subject.keyword | 發聲特徵,電腦輔助語言學習,對話系統, | zh_TW |
dc.subject.keyword | Articulatory Feature,Computer-Assisted Language Learning,Dialogue System, | en |
dc.relation.page | 71 | |
dc.rights.note | Authorized (open access worldwide) | |
dc.date.accepted | 2015-08-24 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
Appears in Collections: | Computer Science and Information Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-104-1.pdf | 8.5 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.