Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50719

Full metadata record
| DC 欄位 | 值 | 語言 |
|---|---|---|
| dc.contributor.advisor | 李琳山 | |
| dc.contributor.author | Yen-Chen Wu | en |
| dc.contributor.author | 吳彥諶 | zh_TW |
| dc.date.accessioned | 2021-06-15T12:54:26Z | - |
| dc.date.available | 2019-07-26 | |
| dc.date.copyright | 2016-07-26 | |
| dc.date.issued | 2016 | |
| dc.date.submitted | 2016-07-18 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50719 | - |
| dc.description.abstract | This thesis focuses on interactive retrieval of spoken content. With the recent rapid growth of multimedia content such as online courses, audio/video programs, and meeting recordings, the retrieval of spoken content has attracted much attention. The thesis targets interactive retrieval: because spoken or multimedia documents are difficult to display on screen, browsing them is time-consuming, and poor speech recognition accuracy can further make the retrieval results unsatisfactory. Having the system interact with the user, so that it learns more about the information the user is looking for, is therefore an effective way to alleviate this problem.
In this thesis, we not only follow previous work in modeling interactive retrieval as a Markov Decision Process (MDP) and learning the best system policy with reinforcement learning algorithms, but also apply deep reinforcement learning to the problem, taking the technique a significant step forward. Experiments show that the proposed approach substantially improves the retrieval process and helps users find the desired information more effectively. | zh_TW |
| dc.description.abstract | Interactive retrieval is important for spoken content. When looking for text documents, a user can easily scan through a search engine results page and select from it, whereas no such privilege exists when searching for spoken content. Moreover, it is hard for users to find the desired spoken content when the search results are noisy, which often happens because of the imperfect speech recognition component in spoken content retrieval. One way to counter these difficulties is human-machine interaction, in which the machine takes different actions to request additional information from the user in order to obtain better retrieval results. The most suitable action depends on the situation, so previous work used hand-crafted states estimated from the current search results to determine the actions; however, such hand-crafted states are not necessarily the best indicators for choosing actions. In this thesis, we apply deep Q-learning to interactive retrieval of spoken content. Deep Q-learning sidesteps the estimation of hand-crafted states and can determine the action directly from the retrieval results without any human knowledge. It achieved discernible improvements over the hand-crafted states. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-15T12:54:26Z (GMT). No. of bitstreams: 1 ntu-105-R03942044-1.pdf: 4686277 bytes, checksum: 0381461eeeebcf5aefbea07b8cea2b88 (MD5) Previous issue date: 2016 | en |
| dc.description.tableofcontents | Acknowledgements
Chinese Abstract
English Abstract
Chapter 1  Introduction
  1.1  Background
  1.2  Motivation and Objectives
  1.3  Related Work and Main Contributions
  1.4  Chapter Organization
Chapter 2  Background Knowledge
  2.1  Automatic Speech Recognition
    2.1.1  Acoustic Models
    2.1.2  Language Models
    2.1.3  Lexicon
    2.1.4  Lattices and N-best Lists
    2.1.5  Out-of-Vocabulary (OOV) Words
  2.2  Information Retrieval
    2.2.1  Spoken Document Retrieval
    2.2.2  Spoken Document Indexing
    2.2.3  Vector Space Model (VSM)
    2.2.4  Language Modeling Retrieval Model (LM)
    2.2.5  Retrieval Evaluation Metrics
  2.3  Interactive Systems
    2.3.1  Relevance Feedback
    2.3.2  Dialogue Manager: Markov Decision Process (MDP)
    2.3.3  Artificial Neural Networks
  2.4  Chapter Summary
Chapter 3  Interactive Spoken Content Retrieval System
  3.1  Introduction
    3.1.1  Previous Work and Motivation for Improvement
    3.1.2  System Components
  3.2  Relevance Feedback for Spoken Content Retrieval
    3.2.1  Relevance Feedback in the Vector Space Model
    3.2.2  Relevance Feedback in the Language Modeling Approach
  3.3  Chapter Summary
Chapter 4  Deep Reinforcement Learning and the Dialogue Manager
  4.1  Introduction
  4.2  Modeling Interactive Retrieval as a Markov Decision Process
    4.2.1  States
    4.2.2  Supplement: State Features
    4.2.3  Actions
    4.2.4  Reward and Return
  4.3  Reinforcement Learning Algorithms
    4.3.1  Value Iteration
    4.3.2  Fitted Value Iteration
    4.3.3  Deep Q-Network (DQN)
  4.4  System Evaluation
    4.4.1  Experimental Setup
    4.4.2  Simulated Users
    4.4.3  Experimental Results
  4.5  Conclusions on the Interactive Retrieval System
Chapter 5  Simulated Users
  5.1  Introduction
  5.2  User Survey
    5.2.1  Questionnaire Design and Content
    5.2.2  Survey Results
  5.3  Experimental Analysis
    5.3.1  Experiments on Interaction Costs
    5.3.2  Experiments on User Behavior
  5.4  Conclusions
Chapter 6  Conclusions and Future Work
  6.1  Dialogue Manager
  6.2  Simulated Users
  6.3  System Architecture
References | |
| dc.language.iso | zh-TW | |
| dc.subject | 互動式資訊檢索 | zh_TW |
| dc.subject | 深層強化學習 | zh_TW |
| dc.subject | 語音數位內容檢索 | zh_TW |
| dc.subject | Interactive Information Retrieval | en |
| dc.subject | Spoken Content Retrieval | en |
| dc.subject | Deep Reinforcement Learning | en |
| dc.title | 使用深層強化學習之互動式語音數位內容檢索 | zh_TW |
| dc.title | Interactive Spoken Content Retrieval with Deep Reinforcement Learning | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 104-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 李宏毅,陳信宏,王小川,鄭秋豫,簡仁宗 | |
| dc.subject.keyword | 深層強化學習,互動式資訊檢索,語音數位內容檢索 | zh_TW |
| dc.subject.keyword | Deep Reinforcement Learning, Interactive Information Retrieval, Spoken Content Retrieval | en |
| dc.relation.page | 80 | |
| dc.identifier.doi | 10.6342/NTU201600891 | |
| dc.rights.note | 有償授權 | |
| dc.date.accepted | 2016-07-18 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 電信工程學研究所 | zh_TW |
| Appears in Collections: | 電信工程學研究所 | |
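
The English abstract above describes a dialogue manager that, instead of relying on hand-crafted states, uses a deep Q-network to choose interactive actions directly from the current retrieval results. The sketch below is a minimal illustration of that idea only, not the author's implementation: the action names, feature dimension, network size, and update rule are hypothetical placeholders.

```python
# Minimal deep-Q-style dialogue manager sketch for interactive retrieval.
# Illustrative only: actions, feature sizes, and hyperparameters are hypothetical.
import numpy as np

ACTIONS = ["return_results", "ask_for_key_term", "ask_for_example_document"]

class TinyQNetwork:
    """One-hidden-layer Q-network: retrieval-result features -> one Q-value per action."""
    def __init__(self, n_features, n_actions, n_hidden=32, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, (n_hidden, n_actions))
        self.b2 = np.zeros(n_actions)
        self.lr = lr

    def forward(self, x):
        h = np.maximum(0.0, x @ self.w1 + self.b1)   # ReLU hidden layer
        return h, h @ self.w2 + self.b2              # hidden activations, Q-values

    def q_values(self, x):
        return self.forward(x)[1]

    def update(self, x, action, target):
        """One SGD step pushing Q(x, action) toward a temporal-difference target."""
        h, q = self.forward(x)
        err = q[action] - target                     # TD error for the action taken
        grad_q = np.zeros_like(q)
        grad_q[action] = err
        grad_w2 = np.outer(h, grad_q)                # backprop through output layer
        grad_h = (self.w2 @ grad_q) * (h > 0)        # ReLU mask on the hidden layer
        grad_w1 = np.outer(x, grad_h)                # backprop through input layer
        self.w2 -= self.lr * grad_w2
        self.b2 -= self.lr * grad_q
        self.w1 -= self.lr * grad_w1
        self.b1 -= self.lr * grad_h

def choose_action(net, features, epsilon=0.1):
    """Epsilon-greedy selection over the interactive actions."""
    if np.random.random() < epsilon:
        return np.random.randint(len(ACTIONS))       # explore
    return int(np.argmax(net.q_values(features)))    # exploit
```

In use, `choose_action` would be called once per interaction turn on a feature vector extracted from the current ranked list, and `update` would be called with a target such as `reward + gamma * max(net.q_values(next_features))` once the user's response and the re-retrieved results are observed.
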
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-105-1.pdf (restricted access) | 4.58 MB | Adobe PDF |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
