NTU Theses and Dissertations Repository › 電機資訊學院 (College of Electrical Engineering and Computer Science) › 電信工程學研究所 (Graduate Institute of Communication Engineering)
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/59573
Full Metadata Record
DC Field    Value    Language
dc.contributor.advisor    李琳山 (Lin-Shan Lee)
dc.contributor.author    Yen-Ju Lu    en
dc.contributor.author    呂彥儒    zh_TW
dc.date.accessioned    2021-06-16T09:28:28Z    -
dc.date.available    2017-06-12
dc.date.copyright    2017-06-12
dc.date.issued    2017
dc.date.submitted    2017-04-10
dc.identifier.uri    http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/59573    -
dc.description.abstract    In recent years, speech recognition has become quite mature, but it still relies on large amounts of manually labeled corpora for model training. Given that abundant corpora remain unusable because no one has labeled them, this thesis attempts an unsupervised approach: it uses nine different sets of Automatically Discovered Acoustic Patterns, learned by machine, to improve speech recognition performance, and adds more unlabeled corpora to model training so as to reduce the total amount of labeled corpora required.
We first perform semi-supervised training with a two-phase deep neural network. The first-phase DNN is trained without supervision, using a large amount of unlabeled corpora with their automatically discovered acoustic patterns to extract bottleneck features. These bottleneck features are then concatenated with the acoustic feature vectors and, together with a smaller labeled corpus, used as input for the second-phase supervised DNN. This improves speech recognition and maintains comparable recognition results when the labeled corpus is reduced.
Second, through multi-target DNN training, we take the triphone labels as the primary training target and the nine different sets of automatically discovered acoustic patterns as secondary targets, to assist the supervised training based on triphone labels. Experimental results show improvements over purely supervised training on labeled corpora alone, demonstrating that automatically discovered acoustic patterns also help when labeled corpora are available. Finally, this thesis combines the two approaches into a DNN with both bottleneck features and multi-target training, and finds that the two complement each other, achieving the best experimental results.
zh_TW
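The two-phase pipeline described in the abstract can be sketched in a few lines of numpy. This is a minimal illustration only: all dimensions, layer counts, and weights below are hypothetical stand-ins (random matrices in place of a first-phase network actually trained on acoustic-pattern targets), not the thesis's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Hypothetical dimensions: 39-dim acoustic features (e.g. MFCCs with
# deltas), a 64-unit hidden layer, and a 40-dim bottleneck layer.
D_ACOUSTIC, D_HIDDEN, D_BOTTLENECK = 39, 64, 40

# Phase 1 (unsupervised): a DNN trained to predict acoustic-pattern
# labels on unlabeled corpora; only its forward pass up to the
# bottleneck layer is sketched here, with random weights standing in
# for trained ones.
W1 = rng.standard_normal((D_ACOUSTIC, D_HIDDEN)) * 0.1
W2 = rng.standard_normal((D_HIDDEN, D_BOTTLENECK)) * 0.1

def extract_bottleneck(frames):
    """Return bottleneck-layer activations for a batch of frames."""
    return relu(relu(frames @ W1) @ W2)

# Phase 2 input: concatenate the bottleneck features with the original
# acoustic feature vectors, frame by frame, before supervised training.
frames = rng.standard_normal((100, D_ACOUSTIC))  # 100 speech frames
bottleneck = extract_bottleneck(frames)
phase2_input = np.concatenate([frames, bottleneck], axis=1)
print(phase2_input.shape)  # (100, 79)
```

The second-phase supervised DNN would then be trained on `phase2_input` with the (smaller) set of triphone labels.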
dc.description.abstract    In recent years, speech recognition has advanced considerably, but it still depends on huge amounts of labeled speech corpora. In fact, there are many unlabeled corpora that cannot be used. This thesis applies unsupervised learning to enhance speech recognition with nine different sets of Automatically Discovered Acoustic Patterns, attempting to improve results through bottleneck features, semi-supervised learning, and multi-target learning.
First, we use a two-phase deep neural network for semi-supervised learning. The first-phase DNN is unsupervised: it uses a huge amount of unlabeled corpora and their Automatically Discovered Acoustic Patterns to extract bottleneck features. These bottleneck features are then concatenated with the acoustic feature vectors. In phase two, a smaller labeled corpus is used to train the supervised DNN, which improves speech recognition while sustaining similar results with less labeled data.
In addition, we train a multi-target DNN that uses the triphone labels as the main target and the nine different sets of Automatically Discovered Acoustic Patterns as secondary targets to improve the supervised learning. Experimental results show that this method outperforms training on labeled corpora alone, proving that Automatically Discovered Acoustic Patterns can help speech recognition even when labeled corpora are available. Finally, the thesis combines the two methods above into a DNN with both bottleneck features and multi-target learning; we find that the two complement each other, achieving the best results.
en
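The multi-target training described above, with one primary head and nine secondary heads over a shared hidden layer, can be sketched as a weighted multi-task loss. Again, this is only an illustrative numpy sketch under assumed sizes: the class counts, the 0.1 auxiliary weight, and the single shared layer are hypothetical choices, not the thesis's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-probability of the correct class.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

# Assumed sizes: a shared hidden layer feeding a primary head over
# triphone states and one secondary head per acoustic-pattern set.
D_IN, D_SHARED, N_TRIPHONE, N_PATTERN = 39, 64, 200, 50

W_shared = rng.standard_normal((D_IN, D_SHARED)) * 0.1
W_main = rng.standard_normal((D_SHARED, N_TRIPHONE)) * 0.1
# Nine secondary heads, one per acoustic-pattern granularity.
W_aux = [rng.standard_normal((D_SHARED, N_PATTERN)) * 0.1 for _ in range(9)]

def multi_target_loss(x, y_main, y_aux, aux_weight=0.1):
    """Primary triphone loss plus a weighted sum of nine secondary losses."""
    h = np.maximum(0.0, x @ W_shared)               # shared hidden layer
    loss = cross_entropy(softmax(h @ W_main), y_main)
    for W, y in zip(W_aux, y_aux):
        loss += aux_weight * cross_entropy(softmax(h @ W), y)
    return loss

x = rng.standard_normal((32, D_IN))                  # a batch of 32 frames
y_main = rng.integers(0, N_TRIPHONE, 32)             # triphone labels
y_aux = [rng.integers(0, N_PATTERN, 32) for _ in range(9)]  # pattern labels
loss = multi_target_loss(x, y_main, y_aux)
print(loss)
```

Gradients of this combined loss flow through the shared layer from all ten heads, which is how the secondary acoustic-pattern targets can regularize the primary triphone training.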
dc.description.provenance    Made available in DSpace on 2021-06-16T09:28:28Z (GMT). No. of bitstreams: 1
ntu-106-R03942063-1.pdf: 7141836 bytes, checksum: a38f3b0ceb34cc1baa54640aa6dc64e4 (MD5)
Previous issue date: 2017
en
dc.description.tableofcontents    Thesis Committee Certification  i
Acknowledgements  ii
Abstract (Chinese)  iii
Chapter 1  Introduction  1
1.1  Motivation  1
1.2  Research Directions  2
1.3  Thesis Organization  2
Chapter 2  Background  4
2.1  Introduction  4
2.2  Hidden Markov Models and the Viterbi Algorithm  4
2.2.1  Hidden Markov Models  4
2.2.2  Model Training  6
2.2.3  The Viterbi Algorithm  8
2.3  Deep Neural Networks  10
2.3.1  Introduction  10
2.3.2  Training Methods  13
2.4  Acoustic Patterns Automatically Discovered by Unsupervised Methods  17
2.4.1  Architecture of Automatic Acoustic Patterns  18
2.4.2  Training Automatic Acoustic Patterns  18
2.4.3  The Acoustic Pattern Space  21
Chapter 3  Experimental Corpus and Baseline Experiments  24
3.1  The TIMIT Corpus  24
3.2  Baseline Experiments  24
3.2.1  Introduction  24
3.2.2  Feature Extraction  25
3.2.3  Hidden Markov Model Application  26
3.2.4  Deep Neural Network Application  27
3.2.5  Lexicon  28
3.2.6  Language Model  29
3.2.7  Decoding  30
3.3  Baseline Results  31
Chapter 4  Deep Neural Network Bottleneck Features  33
4.1  Deep Neural Networks with Multiple Input Features  33
4.2  Bottleneck Features from Unsupervised Learning  34
4.2.1  Bottleneck Feature Extraction  34
4.2.2  Unsupervised Acoustic Patterns  34
4.3  Bottleneck Feature Learning with Shared Hidden Layers  36
4.3.1  Shared Bottleneck Feature Learning Architecture  36
4.3.2  Experimental Results  36
4.4  Bottleneck Feature Learning without Shared Hidden Layers  38
4.4.1  Non-shared Bottleneck Feature Learning Architecture  38
4.4.2  Experimental Results  38
4.5  Comparison and Overall Analysis  39
4.6  Chapter Summary  41
Chapter 5  Semi-supervised Learning  43
5.1  Semi-supervised Learning  43
5.2  Feature Vector Extraction from Unlabeled Corpora  43
5.3  Semi-supervised Learning with Large Data  45
5.3.1  Large-Data Learning  45
5.3.2  Experimental Results and Analysis  45
5.4  Semi-supervised Learning with Small Data  47
5.4.1  Small-Data Learning  47
5.4.2  Experimental Results and Analysis  48
5.5  Chapter Summary  50
Chapter 6  Multi-target Deep Neural Networks  51
6.1  Related Work  51
6.2  Multi-target Deep Neural Networks with Unsupervised Acoustic Patterns  52
6.2.1  Multi-target Networks without Parameter Sharing  54
6.2.2  Multi-target Networks with Parameter Sharing  55
6.2.3  Comparison and Analysis  57
6.3  Multi-target Networks with Large Data  58
6.3.1  Introduction to Large-Data Multi-target Learning  58
6.3.2  Large-Data Multi-target Learning Architecture  59
6.3.3  Results and Analysis  59
6.4  Chapter Summary  60
Chapter 7  Combining Multi-target Deep Neural Networks with Bottleneck Features  61
7.1  Chapter Introduction  61
7.2  Combining Bottleneck Features with Multi-target Deep Neural Networks  61
7.2.1  Combining Bottleneck Features and Multiple Targets  61
7.2.2  Experimental Results and Analysis  62
7.3  Overall Comparison and Analysis  63
7.3.1  Comparison of Large-Data Experimental Results  63
7.3.2  Comparison of Main Experimental Results  63
7.4  Chapter Summary  66
Chapter 8  Conclusion and Future Work  67
8.1  Conclusions and Contributions  67
8.2  Future Work  67
References  69
dc.language.iso    zh-TW
dc.subject    半監督 (semi-supervised)    zh_TW
dc.subject    非監督 (unsupervised)    zh_TW
dc.subject    語音辨識 (speech recognition)    zh_TW
dc.subject    機器學習 (machine learning)    zh_TW
dc.subject    speech recognition    en
dc.subject    unsupervised    en
dc.subject    semi-supervised    en
dc.subject    machine learning    en
dc.title    深層非監督式學習以提升語音辨識之效能 (Enhancing Speech Recognition by Deep Unsupervised Learning)    zh_TW
dc.title    Enhancing Speech Recognition by Deep Unsupervised Learning    en
dc.type    Thesis
dc.date.schoolyear    105-2
dc.description.degree    碩士 (Master)
dc.contributor.oralexamcommittee    李宏毅 (Hung-yi Lee), 謝宏昀 (Hung-Yun Hsieh)
dc.subject.keyword    機器學習, 語音辨識, 非監督, 半監督 (machine learning, speech recognition, unsupervised, semi-supervised)    zh_TW
dc.subject.keyword    machine learning, speech recognition, unsupervised, semi-supervised    en
dc.relation.page    74
dc.identifier.doi    10.6342/NTU201700741
dc.rights.note    有償授權 (Paid authorization)
dc.date.accepted    2017-04-11
dc.contributor.author-college    電機資訊學院 (College of Electrical Engineering and Computer Science)    zh_TW
dc.contributor.author-dept    電信工程學研究所 (Graduate Institute of Communication Engineering)    zh_TW
Appears in Collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in This Item:
File    Size    Format
ntu-106-1.pdf    6.97 MB    Adobe PDF    Restricted Access


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated in their license terms.
