Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64347
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 李琳山(Lin-shan Lee) | |
dc.contributor.author | Hung-Yi Lee | en |
dc.contributor.author | 李宏毅 | zh_TW |
dc.date.accessioned | 2021-06-16T17:42:01Z | - |
dc.date.available | 2012-08-17 | |
dc.date.copyright | 2012-08-17 | |
dc.date.issued | 2012 | |
dc.date.submitted | 2012-08-14 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64347 | - |
dc.description.abstract | Typical spoken content retrieval proceeds in two stages. A speech recognition engine first transcribes the spoken content in the archive into text and stores the result; at retrieval time, text information retrieval techniques are applied directly to these recognition outputs. If the recognition engine could transcribe speech into text accurately, this architecture would naturally yield good results; when recognition accuracy is poor, however, it inevitably causes a substantial drop in retrieval performance. The core idea of this thesis is to break through the performance limitation caused by spoken content retrieval's complete reliance on recognition output, which will be a very important future direction for this field.
This thesis first proposes a new technique that re-estimates the acoustic model parameters of the recognition system based on user relevance feedback. Unlike conventional acoustic model training, the training objective here is improved retrieval performance, and the fact that retrieval systems are evaluated on ranked results is taken into account during model training. The thesis further proposes using acoustic feature vectors directly as machine learning features, an idea successfully realized under the pseudo-relevance feedback framework. Next, to compensate for the information lost during recognition, the thesis proposes improving spoken content retrieval with acoustic feature similarity, an idea applicable to both pseudo-relevance feedback and graph-based re-ranking. Finally, although current research in spoken content retrieval still concentrates on improving spoken term detection, this thesis further considers semantic retrieval, whose goal is to find semantically relevant spoken documents rather than only the documents containing the query terms. A method is proposed that uses acoustic feature similarity to improve the accuracy of term frequency estimation, which further improves the language-modeling retrieval approach, document expansion, and query expansion techniques for semantic retrieval. | zh_TW |
dc.description.abstract | Multimedia content over the Internet is very attractive, and the spoken part of such content very often carries the core information. Spoken content retrieval will therefore be very important in helping users retrieve and browse efficiently across the huge quantities of multimedia content in the future. There are usually two stages in typical spoken content retrieval approaches. In the first stage, the audio content is recognized into text symbols by an Automatic Speech Recognition (ASR) system based on a set of acoustic models and language models. In the second stage, after the user enters a query, the retrieval engine searches through the recognition output and returns to the user a list of relevant spoken documents or segments. If the spoken content could be transcribed into text with very high accuracy, the problem would naturally reduce to text information retrieval. However, the inevitably high recognition error rates for spontaneous speech, under a wide variety of acoustic conditions and linguistic contexts, make this impossible in practice. In this thesis, the standard two-stage architecture above is broken down, and the two stages of recognition and retrieval are considered jointly as a whole. A set of approaches that go beyond retrieving over recognition output is developed here. This idea is very helpful for spoken content retrieval, and may become one of the main future directions in this area.
To consider recognition and retrieval as a whole, it is first proposed to adjust the acoustic model parameters by borrowing techniques from discriminative training, but based on user relevance feedback. Retrieval-oriented acoustic model re-estimation differs from conventional acoustic model training for speech recognition in at least two ways: 1. the training information includes only whether a spoken segment is relevant to a query or not; it does not include the transcription of any utterance. 2. the goal is to improve retrieval performance rather than recognition accuracy. A set of objective functions for retrieval-oriented acoustic model re-estimation is therefore proposed that takes the properties of retrieval into consideration. Some previous work in spoken content retrieval has exploited the discriminative capability of machine learning methods, deriving features from the recognition output. Different from that work, here acoustic vectors such as MFCCs are taken as the features for discriminating relevant from irrelevant segments, and they are successfully applied in the scenario of Pseudo-Relevance Feedback (PRF). The recognition process can be viewed as "quantization", in which acoustic vector sequences are quantized into word symbols. Because different vector sequences may be quantized into the same symbol, much of the information in the spoken content is lost during recognition. This thesis therefore uses information taken directly from the acoustic vector space to compensate for the recognition output, realized either by PRF or by a graph-based re-ranking approach that considers the similarity structure among all the segments retrieved. These approaches are successfully applied not only to word-based but also to subword-based retrieval systems, and they improve the results for Out-of-Vocabulary (OOV) queries as well.
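The PRF idea above, using acoustic vectors as machine-learning features to separate relevant from irrelevant segments, can be sketched roughly as follows. This is a minimal illustration only, not the implementation evaluated in the thesis: the toy 2-D feature vectors, the first-pass score values, and the use of scikit-learn's `SVC` are all assumptions made for the sketch.

```python
import numpy as np
from sklearn.svm import SVC

def pseudo_relevance_feedback(first_pass_scores, acoustic_features, n_pos=3, n_neg=3):
    """Take the top-scored segments as pseudo-relevant and the bottom-scored
    ones as pseudo-irrelevant, train an SVM on their acoustic feature
    vectors, and re-score every segment with the SVM decision value."""
    order = np.argsort(first_pass_scores)[::-1]      # segment indices, best first
    pos, neg = order[:n_pos], order[-n_neg:]         # pseudo labels from the ranked list
    X = np.vstack([acoustic_features[pos], acoustic_features[neg]])
    y = np.array([1] * n_pos + [0] * n_neg)
    clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
    return clf.decision_function(acoustic_features)  # new ranking scores

# toy data: 8 segments drawn from two well-separated "acoustic" clusters,
# with first-pass scores that happen to rank the first cluster on top
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 1.0, (4, 2)), rng.normal(5.0, 1.0, (4, 2))])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.15, 0.1])
new_scores = pseudo_relevance_feedback(scores, feats)
print(new_scores.round(2))
```

In the thesis, the positive and negative examples come from the first-pass ranked list of an actual retrieval engine, and the features are derived from the MFCC sequences of the hypothesized regions rather than fixed-length toy vectors.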
This thesis mainly considers the task of Spoken Term Detection (STD), whose goal is simply to return the spoken segments that contain the query terms. Although most work in spoken content retrieval today continues to focus on STD, a more general task is also considered here: retrieving the spoken documents semantically related to a query, whether or not the query terms actually appear in them. Taking ASR transcriptions as text, techniques developed for text-based information retrieval, such as latent semantic analysis and query expansion, can be directly applied to this task; however, the inevitable recognition errors in the transcriptions degrade the performance of these techniques. To make semantic retrieval of spoken documents more robust, the expected term frequencies derived from lattices are enhanced by acoustic similarity with a graph-based approach. The enhanced term frequencies improve the performance of the language-modelling retrieval approach, document expansion techniques based on latent semantic analysis, and query expansion methods that consider both words and latent topic information. | en
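The two steps described above, deriving expected term frequencies from lattices and enhancing them over an acoustic-similarity graph, can be sketched as follows. This is a simplified sketch under assumed data structures: each lattice is reduced to a flat list of (word, posterior) arcs, and the propagation rule is a generic score-interpolation random walk; the exact formulation and similarity measure used in the thesis differ.

```python
import numpy as np

def expected_term_frequency(lattice_arcs, term):
    """E[tf] of a term in one lattice: the sum of posterior probabilities of
    all arcs hypothesizing that term, instead of a 0/1 count from the 1-best."""
    return sum(p for word, p in lattice_arcs if word == term)

def propagate_on_similarity_graph(tf, similarity, alpha=0.3, iters=20):
    """Smooth expected term frequencies over an acoustic-similarity graph:
    each node interpolates its own tf with the similarity-weighted average
    of its neighbours' tfs (an iterative random-walk style propagation)."""
    S = similarity / similarity.sum(axis=1, keepdims=True)  # row-stochastic
    scores = tf.astype(float).copy()
    for _ in range(iters):
        scores = (1 - alpha) * tf + alpha * S @ scores
    return scores

# toy example: three segments; acoustic similarity says 0 and 1 sound alike,
# so segment 1's under-estimated tf for "retrieval" gets boosted by segment 0
lattices = [
    [("retrieval", 0.8), ("revival", 0.2)],
    [("revival", 0.6), ("retrieval", 0.4)],
    [("weather", 0.9)],
]
tf = np.array([expected_term_frequency(l, "retrieval") for l in lattices])
sim = np.array([[1.0, 0.9, 0.1],
                [0.9, 1.0, 0.1],
                [0.1, 0.1, 1.0]])
print(propagate_on_similarity_graph(tf, sim).round(3))
```

The enhanced frequencies can then be plugged into a language-modelling retrieval score or into document/query expansion in place of the raw lattice-derived counts.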
dc.description.provenance | Made available in DSpace on 2021-06-16T17:42:01Z (GMT). No. of bitstreams: 1 ntu-101-D99942018-1.pdf: 3072335 bytes, checksum: 3c2ffce418e2ca8546f6cf982ca3bd60 (MD5) Previous issue date: 2012 | en |
dc.description.tableofcontents | I Introduction and Background Review ... 1
1 Introduction ... 2
1.1 Spoken Content Retrieval ... 2
1.1.1 Spoken Term Detection ... 5
1.1.2 Semantic Retrieval of Spoken Content ... 6
1.2 Organization of the Thesis ... 7
2 Spoken Content Retrieval ... 9
2.1 Basic Idea ... 9
2.2 Lattices ... 10
2.3 Out-of-Vocabulary Problem and Subword-based Indexing ... 13
2.4 Query-by-Example ... 15
2.5 Evaluation Metrics ... 16
2.6 Optimizing Evaluation Performance ... 18
2.7 Machine Learning Methods ... 19
2.8 Benchmark Data Sets ... 19
2.9 Spoken Content Retrieval in the Real World ... 21
3 Relevance Feedback ... 25
3.1 User Relevance Feedback ... 26
3.1.1 Short-term Context User Relevance Feedback ... 27
3.1.2 Long-term Context User Relevance Feedback ... 28
3.2 Pseudo-Relevance Feedback ... 29
II Improved Spoken Content Retrieval ... 31
4 Retrieval Oriented Acoustic Model Re-estimation by Relevance Feedback ... 32
4.1 Introduction ... 33
4.2 Scenario ... 35
4.3 Acoustic Model Re-estimation in Short-term Context User Relevance Feedback ... 36
4.3.1 Objective Function ... 36
4.3.2 Optimization ... 40
4.4 Acoustic Model Re-estimation in Long-term Context User Relevance Feedback ... 42
4.5 Experiments for Lecture Courses ... 43
4.5.1 Experimental Setup ... 43
4.5.2 Experimental Results ... 45
4.6 Experiments for Broadcast News ... 52
4.6.1 Experimental Setup ... 52
4.6.2 Experimental Results ... 53
4.7 Summary ... 55
5 Machine Learning Methods with Pseudo-relevance Feedback ... 56
5.1 Introduction ... 56
5.2 Support Vector Machines for Pseudo-relevance Feedback ... 57
5.3 Feature Representations based on Acoustic Information ... 60
5.4 Enhanced Pseudo-relevance Feedback ... 63
5.4.1 Example Selection and Reliability Estimation based on Acoustic Similarity ... 63
5.4.2 Modified SVM ... 65
5.5 Experiments for Lecture Courses ... 67
5.5.1 Experimental Setup ... 67
5.5.2 Features based on Acoustic Information ... 68
5.5.3 Enhanced PRF ... 72
5.6 Experiments for Broadcast News ... 77
5.7 Experiences ... 78
5.8 Summary ... 80
6 Example-based Approaches ... 85
6.1 Introduction ... 85
6.2 Example-based Approach ... 89
6.2.1 Complete Formulation for the First-Pass Retrieval ... 90
6.2.2 Acoustic Vector Similarity ... 92
6.2.3 Example-based Pseudo-relevance Feedback ... 95
6.2.4 Graph-based Re-ranking ... 95
6.3 Experiments for IV Queries on Lecture Courses ... 97
6.3.1 Experimental Setup ... 98
6.3.2 Example-based Pseudo-relevance Feedback ... 99
6.3.3 Graph-based Re-ranking ... 101
6.3.4 Experimental Results based on Subword Lattices ... 103
6.4 Experiments for OOV Queries on Lecture Courses ... 104
6.4.1 Experimental Setup ... 107
6.4.2 Experimental Results ... 108
6.5 Experiments for Broadcast News ... 109
6.5.1 Experimental Setup ... 109
6.5.2 Experimental Results ... 110
6.6 Summary ... 112
7 Semantic Retrieval for Spoken Content with Acoustic Similarity Graph ... 115
7.1 Introduction ... 115
7.2 Language Modelling for Spoken Content Retrieval ... 116
7.2.1 Lattice-derived Document Model ... 117
7.2.2 Acoustic Similarity Enhanced Document Model ... 119
7.3 Document Expansion with Probabilistic Latent Semantic Analysis ... 122
7.4 Query Expansion with Query-regularized Mixture Model ... 124
7.4.1 Word-based Query Expansion ... 125
7.4.2 Topic-enhanced Query Expansion ... 127
7.5 Experimental Setup ... 128
7.6 Experimental Results ... 129
7.6.1 Basic Language Modelling Retrieval Approach ... 129
7.6.2 Document Expansion ... 131
7.6.3 Query Expansion ... 133
7.7 Summary ... 134
8 Conclusion and Future Work ... 137
8.1 Conclusion ... 137
8.2 Future Work ... 138
Bibliography ... 141 | |
dc.language.iso | en | |
dc.title | 語音數位內容檢索 ─ 相關回饋、圖論及語意 | zh_TW |
dc.title | Spoken Content Retrieval - Relevance Feedback, Graphs and Semantics | en |
dc.type | Thesis | |
dc.date.schoolyear | 100-2 | |
dc.description.degree | Doctor of Philosophy (Ph.D.) |
dc.contributor.oralexamcommittee | 徐宏民,陳信希,貝蘇章,陳銘憲,雷少民 | |
dc.subject.keyword | Spoken Content Retrieval, Relevance Feedback, Graph Theory, Semantic Retrieval | zh_TW |
dc.subject.keyword | Spoken Content Retrieval, Relevance Feedback, Random Walk, Semantic Retrieval | en |
dc.relation.page | 162 | |
dc.rights.note | Paid authorization |
dc.date.accepted | 2012-08-14 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Communication Engineering | zh_TW |
Appears in Collections: | Graduate Institute of Communication Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-101-1.pdf (access currently restricted) | 3 MB | Adobe PDF |
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.