Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67254

Full metadata record (DC field: value [language])
dc.contributor.advisor: 李宏毅
dc.contributor.author: Chia-Wei Ao [en]
dc.contributor.author: 敖家維 [zh_TW]
dc.date.accessioned: 2021-06-17T01:25:11Z
dc.date.available: 2017-08-10
dc.date.copyright: 2017-08-10
dc.date.issued: 2017
dc.date.submitted: 2017-08-08
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67254
dc.description.abstract: This thesis focuses on spoken term detection in spoken digital content. With the rapid growth of the Internet in recent years, multimedia containing spoken information, such as online courses, movies, dramas, and meeting recordings, has increased steadily, so retrieval of spoken content has received growing attention. The key component of spoken content retrieval is spoken term detection: locating the portions of a spoken document where the query term occurs. In this thesis the query is a speech signal rather than text. Conventional approaches first convert the spoken query into text with a speech recognition system. This thesis instead bypasses speech recognition and uses neural networks to learn acoustic features from the training corpus, so that spoken term detection can be performed directly on the speech signal, avoiding the problem of recognition errors degrading the retrieval system.

This thesis adopts an attention mechanism, which lets the model focus on a particular region of the spoken document and avoid the influence of irrelevant noise. A review mechanism lets the model attend to different parts of the spoken document depending on its previous inputs, so the model can attend to the document multiple times and locate the query term more precisely. Audio word2vec is also explored: the spoken document is encoded into a vector that captures relations between words, and spoken term detection is then performed on these document vectors. [zh_TW]
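The attention step described in the abstract can be illustrated with a minimal sketch: a spoken query is encoded as one vector, each region of the spoken document as another, and attention weights over the regions yield a document summary that a classifier could score for query occurrence. Everything here is an illustrative assumption (dot-product attention, the function names, and the toy data); it is not the thesis's actual implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of energies.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query_vec, doc_frames):
    """Dot-product attention of a query vector over document frames.

    query_vec:  encoded spoken query (list of floats)
    doc_frames: per-frame encoding vectors of the spoken document
    Returns (weights, context): attention weights over the frames and
    their weighted sum, serving as the attended document representation.
    """
    energies = [dot(query_vec, f) for f in doc_frames]
    weights = softmax(energies)
    context = [sum(w * f[i] for w, f in zip(weights, doc_frames))
               for i in range(len(query_vec))]
    return weights, context

# Toy example: the first frame is most similar to the query,
# so it should receive the largest attention weight.
q = [1.0, 0.0]
frames = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
w, c = attend(q, frames)
```

A review (multi-pass) variant would feed the context back in and recompute the weights, letting the model attend to different regions on each pass.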
dc.description.provenance: Made available in DSpace on 2021-06-17T01:25:11Z (GMT). No. of bitstreams: 1; ntu-106-R04942094-1.pdf: 8215388 bytes, checksum: dc2670669ce9e56bf5ec64a87d259c8f (MD5); Previous issue date: 2017 [en]
dc.description.tableofcontents:
Acknowledgements
Chinese Abstract
1. Introduction
  1.1 Motivation
  1.2 Research Direction
  1.3 Thesis Organization
2. Background
  2.1 Information Retrieval and Spoken Content Retrieval
    2.1.1 Information Retrieval
    2.1.2 Spoken Content Retrieval
    2.1.3 Segmental Dynamic Time Warping (Segmental DTW)
    2.1.4 Evaluation Metrics for Information Retrieval
  2.2 Deep Neural Networks (DNN)
    2.2.1 Overview
    2.2.2 How They Work
    2.2.3 Training Neural Networks
    2.2.4 Difficulties of Neural Networks
  2.3 Recurrent Neural Networks (RNN)
    2.3.1 Overview
    2.3.2 How They Work
    2.3.3 Backpropagation Through Time
    2.3.4 Long Short-Term Memory Networks
  2.4 Chapter Summary
3. Query-by-Example Spoken Term Detection Based on Recurrent Neural Networks
  3.1 Overview
  3.2 Feature Representations Using Recurrent Neural Networks
    3.2.1 Acoustic Feature Extraction
    3.2.2 Sequence-to-Sequence Model
  3.3 System Architecture
    3.3.1 System Overview
    3.3.2 Recurrent Neural Network Model
    3.3.3 Training
  3.4 Experimental Results and Analysis
    3.4.1 Experimental Setup
    3.4.2 Baseline Experiments
    3.4.3 Results and Analysis
  3.5 Chapter Summary
4. Query-by-Example Spoken Term Detection Based on Attention-Based Neural Networks
  4.1 Overview
  4.2 Attention Mechanism
  4.3 Model Architecture
    4.3.1 System Architecture Overview
    4.3.2 Vector Representation of Spoken Queries
    4.3.3 Attention Mechanism and Spoken Document Representation
    4.3.4 Review Mechanism
    4.3.5 Classifier
  4.4 Unsupervised Training
  4.5 Experiments and Analysis
    4.5.1 Baselines and Experimental Setup
    4.5.2 Results and Comparison
    4.5.3 Combination with Baselines
    4.5.4 Analysis of the Attention Mechanism
    4.5.5 Unsupervised Training Results
  4.6 Chapter Summary
5. Query-by-Example Spoken Term Detection Based on Audio Word2Vec
  5.1 Overview
  5.2 Audio Word2Vec
  5.3 Model Architecture
    5.3.1 Model Overview
    5.3.2 Inference Mechanism
    5.3.3 Training
  5.4 Experiments and Analysis
    5.4.1 Experimental Setup and Baselines
    5.4.2 Results and Analysis
  5.5 Chapter Summary
6. Query-by-Example Spoken Term Detection Based on Attention and Audio Word2Vec
  6.1 Overview
  6.2 Model Architecture
    6.2.1 Model Overview
    6.2.2 Training
  6.3 Experiments and Analysis
    6.3.1 Experimental Setup and Baselines
    6.3.2 Results and Analysis
  6.4 Chapter Summary
7. Conclusion and Future Work
  7.1 Conclusion
  7.2 Future Work
References
dc.language.iso: zh-TW
dc.subject: 依例查詢 [zh_TW]
dc.subject: 專注式模型 [zh_TW]
dc.subject: Attention-based Model [en]
dc.subject: Query-by-example [en]
dc.title: 基於專注式類神經網路之依例查詢口述語彙偵測 [zh_TW]
dc.title: Query-by-example Spoken Term Detection based on Attention-based Neural Network [en]
dc.type: Thesis
dc.date.schoolyear: 105-2
dc.description.degree: Master (碩士)
dc.contributor.oralexamcommittee: 李琳山, 鄭秋豫, 王小川, 陳信宏
dc.subject.keyword: 專注式模型, 依例查詢 [zh_TW]
dc.subject.keyword: Attention-based Model, Query-by-example [en]
dc.relation.page: 75
dc.identifier.doi: 10.6342/NTU201702646
dc.rights.note: 有償授權 (authorized access for a fee)
dc.date.accepted: 2017-08-08
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering) [zh_TW]
Appears in Collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in This Item:
File: ntu-106-1.pdf | Size: 8.02 MB | Format: Adobe PDF | Access: not authorized for public access


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
