Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/59536
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 李宏毅(Hung-Yi Lee) | |
dc.contributor.author | Chia-Hsing Hsu | en |
dc.contributor.author | 許家興 | zh_TW |
dc.date.accessioned | 2021-06-16T09:27:03Z | - |
dc.date.available | 2017-06-12 | |
dc.date.copyright | 2017-06-12 | |
dc.date.issued | 2017 | |
dc.date.submitted | 2017-05-23 | |
dc.identifier.citation | [1] Ciprian Chelba, Timothy J. Hazen, and Murat Saraclar, “Retrieval and browsing of spoken content,” IEEE Signal Processing Magazine, vol. 25, no. 3, 2008.
[2] Lin-shan Lee and Berlin Chen, “Spoken document understanding and organization,” IEEE Signal Processing Magazine, vol. 22, no. 5, pp. 42–60, 2005.
[3] Murat Saraclar and Richard Sproat, “Lattice-based search for spoken utterance retrieval,” in HLT-NAACL, 2004.
[4] Cyril Allauzen, Mehryar Mohri, and Murat Saraclar, “General indexation of weighted automata: application to spoken utterance retrieval,” in Proceedings of the Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval at HLT-NAACL, Association for Computational Linguistics, Stroudsburg, PA, USA, 2004, pp. 33–40.
[5] Oya Aran, Ismail Ari, Lale Akarun, Erinc Dikici, Siddika Parlak, Murat Saraclar, Pavel Campr, and Marek Hruz, “Speech and sliding text aided sign retrieval from hearing impaired sign news videos,” Journal on Multimodal User Interfaces, vol. 2, no. 2, pp. 117–131, 2008.
[6] S. Parlak and M. Saraclar, “Spoken term detection for Turkish broadcast news,” in ICASSP, 2008, pp. 5244–5247.
[7] E. Arisoy, D. Can, S. Parlak, H. Sak, and M. Saraclar, “Turkish broadcast news transcription and retrieval,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 5, pp. 874–883, 2009.
[8] S. Parlak and M. Saraclar, “Performance analysis and improvement of Turkish broadcast news retrieval,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 3, pp. 731–741, 2012.
[9] D. Can and M. Saraclar, “Lattice indexing for spoken term detection,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 8, pp. 2338–2347, 2011.
[10] Chao Liu, Dong Wang, and Javier Tejedor, “N-gram FST indexing for spoken term detection,” in INTERSPEECH, ISCA, 2012.
[11] T. Hori, I. L. Hetherington, T. J. Hazen, and J. R. Glass, “Open vocabulary spoken utterance retrieval using confusion networks,” in ICASSP, 2007, vol. 4, pp. IV-73–IV-76.
[12] D. Can, E. Cooper, A. Sethy, C. White, B. Ramabhadran, and M. Saraclar, “Effect of pronunciations on OOV queries in spoken term detection,” in ICASSP, 2009, pp. 3957–3960.
[13] C. Parada, A. Sethy, and B. Ramabhadran, “Balancing false alarms and hits in spoken term detection,” in ICASSP, 2010, pp. 5286–5289.
[14] Carolina Parada, Abhinav Sethy, Mark Dredze, and Frederick Jelinek, “A spoken term detection framework for recovering out-of-vocabulary words using the web,” in INTERSPEECH, 2010, pp. 1269–1272.
[15] Julien Fayolle, Murat Saraclar, Fabienne Moreau, Christian Raymond, and Guillaume Gravier, “Lexical-phonetic automata for spoken utterance indexing and retrieval,” in INTERSPEECH, ISCA, 2012.
[16] Po-Chih Lin, “Hybrid word/sub-word based spoken term detection with text/spoken queries using weighted finite state transducers,” Master’s thesis, 2013.
[17] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Cognitive Modeling, 1998.
[18] Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender, “Learning to rank using gradient descent,” in Proceedings of the 22nd International Conference on Machine Learning, ACM, 2005, pp. 89–96.
[19] Ning Qian, “On the momentum term in gradient descent learning algorithms,” Neural Networks, vol. 12, no. 1, pp. 145–151, 1999.
[20] Rivarol Vergin, Douglas O’Shaughnessy, and Azarshid Farhat, “Generalized mel frequency cepstral coefficients for large vocabulary speaker-independent continuous-speech recognition,” IEEE Transactions on Speech and Audio Processing, vol. 7, no. 5, pp. 525–532, 1999.
[21] Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur, “Librispeech: an ASR corpus based on public domain audio books,” in ICASSP, IEEE, 2015, pp. 5206–5210.
[22] Douglas B. Paul and Janet M. Baker, “The design for the Wall Street Journal-based CSR corpus,” in Proceedings of the Workshop on Speech and Natural Language, Association for Computational Linguistics, 1992, pp. 357–362.
[23] Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Černocký, and Sanjeev Khudanpur, “Recurrent neural network based language model,” in INTERSPEECH, 2010, vol. 2, p. 3.
[24] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.
[25] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig, “Linguistic regularities in continuous space word representations,” in HLT-NAACL, 2013, vol. 13, pp. 746–751.
[26] Paul J. Werbos, “Backpropagation through time: what it does and how to do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.
[27] Haşim Sak, Andrew Senior, and Françoise Beaufays, “Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition,” arXiv preprint arXiv:1402.1128, 2014.
[28] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[29] Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender, “Learning to rank using gradient descent,” in Proceedings of the 22nd International Conference on Machine Learning, ACM, 2005, pp. 89–96.
[30] Christopher J. C. Burges, Robert Ragno, and Quoc Viet Le, “Learning to rank with nonsmooth cost functions,” in NIPS, 2006, vol. 6, pp. 193–200.
[31] Ming-Feng Tsai, Tie-Yan Liu, Tao Qin, Hsin-Hsi Chen, and Wei-Ying Ma, “FRank: a ranking method with fidelity loss,” in Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2007, pp. 383–390. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/59536 | - |
dc.description.abstract | In recent years, as technology has advanced, people have been recording and storing ever greater volumes of valuable audio and video files. The key technology in spoken content retrieval is spoken term detection, whose goal is to find, within spoken documents, exact matches of the query term entered by the user. To make it easier for people to find the material they want, speech recognition is applied first, and a full-text indexing and retrieval mechanism is then built automatically over the spoken content.
This thesis focuses on improving retrieval performance under a fixed speech recognition system, by rescoring the retrieved results with the knowledge of deep neural networks and by using pairwise learning to give the system knowledge of ranking. | zh_TW |
dc.description.provenance | Made available in DSpace on 2021-06-16T09:27:03Z (GMT). No. of bitstreams: 1 ntu-106-R03942100-1.pdf: 9742747 bytes, checksum: 209efe5bf39832bb5c53a3f53c169570 (MD5) Previous issue date: 2017 | en |
dc.description.tableofcontents | Committee Approval Certificate
Acknowledgements
Chinese Abstract
1. Introduction
1.1 Background
1.2 Motivation and Direction
1.3 Chapter Organization
2. Background
2.1 Information Retrieval and Spoken Content Retrieval
2.1.1 Information Retrieval
2.1.2 Spoken Content Retrieval
2.1.3 Lattices, One-Best and N-Best Sequences
2.1.4 Retrieval Evaluation Metrics
2.2 Spoken Content Retrieval with Weighted Finite-State Transducers
2.2.1 Overview and System Architecture
2.3 Neural Networks
2.3.1 Overview
2.3.2 How Neural Networks Operate
2.3.3 Training Neural Networks
2.4 Chapter Summary
3. Spoken Term Detection Based on Neural Networks
3.1 Overview
3.1.1 Motivation for Improvement
3.2 System Architecture
3.3 Neural-Network Retrieval Model
3.3.1 Acoustic Feature Extraction and Query-Term Feature Vector Representation
3.4 Experimental Results and Analysis
3.4.1 Experimental Setup
3.4.2 Baseline Experiments
3.4.3 Results and Analysis
3.5 Chapter Summary
4. Spoken Term Detection Based on Recurrent and Convolutional Neural Networks
4.1 Overview
4.2 Recurrent Neural Networks
4.3 Backpropagation Through Time
4.4 Long Short-Term Memory Networks
4.5 Query-Term Feature Vector Representation with Recurrent Neural Networks
4.6 Convolutional Neural Networks
4.7 Spoken Term Detection Based on Recurrent and Convolutional Neural Networks
4.8 Experiments and Analysis
4.8.1 Experimental Setup
4.8.2 Baseline Experiments
4.8.3 Results and Analysis
4.9 Chapter Summary
5. Spoken Term Detection with Pairwise Learning
5.1 Overview
5.2 Pairwise Learning
5.3 Training the Retrieval Model with Pairwise Learning
5.4 Experiments and Analysis
5.4.1 Experimental Setup
5.4.2 Results and Analysis
5.5 Chapter Summary
6. Conclusion and Future Work
6.1 Conclusion and Future Work
6.1.1 Conclusion
6.1.2 Future Work
References | |
dc.language.iso | zh-TW | |
dc.title | 利用深度學習強化口述語彙偵測系統 | zh_TW |
dc.title | Enhanced Spoken Term Detection by Deep learning | en |
dc.type | Thesis | |
dc.date.schoolyear | 105-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 曹昱(Yu Tsao),賴穎暉(Ying-Hui Lai),陳縕儂(Yun-Nung Chen) | |
dc.subject.keyword | 關鍵詞檢索, 深度學習 | zh_TW |
dc.subject.keyword | spoken term detection, deep learning | en |
dc.relation.page | 59 | |
dc.identifier.doi | 10.6342/NTU201700831 | |
dc.rights.note | Authorized for a fee | |
dc.date.accepted | 2017-05-24 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Communication Engineering | zh_TW |
Appears in Collections: | Graduate Institute of Communication Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-106-1.pdf (currently not authorized for public access) | 9.51 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.