Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61566

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 李琳山 (Lin-shan Lee) | |
| dc.contributor.author | Po-Chih Lin | en |
| dc.contributor.author | 林博智 | zh_TW |
| dc.date.accessioned | 2021-06-16T13:06:02Z | - |
| dc.date.available | 2014-08-06 | |
| dc.date.copyright | 2013-08-06 | |
| dc.date.issued | 2013 | |
| dc.date.submitted | 2013-08-02 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61566 | - |
| dc.description.abstract | 加權有限狀態轉換器由於完備的理論以及高效率的演算,已廣泛地被應用在語音處理相關的研究中,例如大字彙連續語音辨識以及語音資訊檢索。本論文著重在語音資訊檢索中的口述語彙偵測,並使用加權有限狀態轉換器建立基於詞、字、音節、以及混合式的索引。
我們以上述架構,分別在文字查詢問句與語音查詢問句兩種情境下,於中文廣播新聞語料上進行實驗。在文字查詢問句情境下,我們發現僅需 20% 的運算時間,就可以得到比基準方法更佳的檢索效能。在語音查詢問句情境下,如果僅用單一辨識單位,辭典內查詢詞適合用基於詞的辨識結果,而辭典外查詢詞適合用基於音節的辨識結果;但若將詞與次詞單位混合使用,則不論辭典內或辭典外的查詢詞,效能都可以超越任一基於單一辨識單位的結果。這證明詞與次詞單位之辨識結果在檢索程序中具有明顯的加成性。另一方面,實驗也證明,加權有限狀態轉換器的架構使即時時間倍率和時間複雜度均獲得大幅改善。 | zh_TW |
| dc.description.abstract | With well-developed theory and highly efficient algorithms, weighted finite state transducers have been widely used in speech signal processing tasks, including large vocabulary continuous speech recognition and speech information retrieval. In this thesis, we focus on spoken term detection, a sub-task of speech information retrieval, and use weighted finite state transducers to construct word-based, character-based, syllable-based, and hybrid indices.
We evaluated this framework on a Chinese broadcast news corpus in two scenarios: text queries and spoken queries. For text queries, we achieved better retrieval performance than the baseline with only 20% of its computation time. For spoken queries, when recognition results for only a single unit were used, the word-based index was better for in-vocabulary (IV) queries while the syllable-based index was better for out-of-vocabulary (OOV) queries. However, the hybrid index integrating results for different units outperformed every index based on a single unit, for both IV and OOV queries. This verified the clear complementarity between recognition results based on words and sub-word units for this task. It was also shown that the weighted finite state transducer framework dramatically improved both the real-time factor and the time complexity. (An illustrative sketch of the hybrid indexing idea appears after the metadata table below.) | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T13:06:02Z (GMT). No. of bitstreams: 1 ntu-102-R00921030-1.pdf: 2463997 bytes, checksum: e21fee0d2493e6973b1cb526298e0216 (MD5) Previous issue date: 2013 | en |
| dc.description.tableofcontents | 口試委員會審定書
中文摘要
英文摘要
一、導論
1.1 研究背景
1.2 研究動機
1.3 研究貢獻
1.4 章節安排
二、背景知識
2.1 資訊檢索與語音資訊檢索
2.1.1 資訊檢索
2.1.2 語音資訊檢索
2.1.3 資訊檢索評估機制
2.2 自動機理論 (Automata Theory)
2.2.1 有限狀態機 (Finite State Machines)
2.2.2 有限狀態轉換器 (Finite State Transducers)
2.2.3 空轉移 (Epsilon Transition)
2.2.4 確定性與非確定性 (Determinism and Nondeterminism)
2.3 本章總結
三、加權有限狀態轉換器的理論以及在語音處理上的應用
3.1 理論與定義
3.1.1 半環 (Semiring)
3.1.2 加權有限狀態轉換器 (Weighted Finite State Transducers)
3.1.3 確定化運算 (Determinization)
3.1.4 最小化運算 (Minimization)
3.1.5 組合運算 (Composition)
3.1.6 聯集運算 (Union)
3.2 在大字彙連續語音辨識上的應用
3.2.1 簡介與相關研究
3.2.2 聲學模型
3.2.3 發音辭典
3.2.4 語言模型
3.2.5 語音辨識
3.3 在語音資訊檢索上的應用
3.3.1 簡介與相關研究
3.3.2 建立語音文件索引
3.3.3 搜尋語音文件
3.4 本章總結
四、使用加權有限狀態轉換器建立索引的語音資訊檢索
4.1 系統架構
4.2 建立索引
4.2.1 傳統的索引方法
4.2.2 使用加權有限狀態轉換器的索引方法
4.3 使用索引轉換器的語音資訊檢索
4.3.1 產生查詢問句轉換器
4.3.2 組合索引轉換器進行檢索
4.4 實驗與分析
4.4.1 實驗語料與測試環境
4.4.2 基準實驗
4.4.3 實驗結果與分析
4.5 本章總結
五、使用加權有限狀態轉換器建立索引的依例查詢語音搜尋
5.1 系統架構
5.2 使用索引轉換器的依例查詢語音搜尋
5.2.1 以詞為單位
5.2.2 以次詞為單位
5.2.3 混合詞與次詞單位
5.3 實驗與分析
5.3.1 實驗語料與測試環境
5.3.2 基準實驗
5.3.3 實驗結果與分析
5.4 本章總結
六、結論與展望
6.1 總結
6.2 未來展望
參考文獻 | |
| dc.language.iso | zh-TW | |
| dc.subject | 口述語彙偵測 | zh_TW |
| dc.subject | 加權有限狀態轉換器 | zh_TW |
| dc.subject | 語音資訊檢索 | zh_TW |
| dc.subject | 依例查詢 | zh_TW |
| dc.subject | Spoken Term Detection | en |
| dc.subject | Query by Example | en |
| dc.subject | Speech Information Retrieval | en |
| dc.subject | Weighted Finite State Transducers | en |
| dc.title | 使用加權有限狀態轉換器的基於混合詞與次詞以文字及語音指令偵測口語詞彙 | zh_TW |
| dc.title | Hybrid Word/Sub-word Based Spoken Term Detection with Text/Spoken Queries Using Weighted Finite State Transducers | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 101-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 王小川,鄭秋豫,陳信宏,簡仁宗 | |
| dc.subject.keyword | 加權有限狀態轉換器,語音資訊檢索,口述語彙偵測,依例查詢 | zh_TW |
| dc.subject.keyword | Weighted Finite State Transducers, Speech Information Retrieval, Spoken Term Detection, Query by Example | en |
| dc.relation.page | 70 | |
| dc.rights.note | 有償授權 | |
| dc.date.accepted | 2013-08-02 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 電機工程學研究所 | zh_TW |
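The abstract above describes building indices from word-level and sub-word-level recognition results and merging them into a hybrid index, so that word hits serve in-vocabulary queries while syllable hits recover out-of-vocabulary ones. As a rough illustration of that idea only (not the WFST implementation used in the thesis), the minimal Python sketch below indexes toy word- and syllable-level transcripts with plain dictionaries and answers a query from the union of both; all utterance IDs, transcripts, and romanized syllables are invented for the example.

```python
# Minimal, hypothetical sketch of the hybrid word/sub-word indexing idea
# from the abstract. Plain dictionaries stand in for the thesis's WFST
# index; transcripts, utterance IDs, and syllable spellings are invented.
from collections import defaultdict

def build_index(transcripts):
    """Map every contiguous subsequence of units to the utterances containing it."""
    index = defaultdict(set)
    for utt_id, units in transcripts.items():
        for i in range(len(units)):
            for j in range(i + 1, len(units) + 1):
                index[tuple(units[i:j])].add(utt_id)
    return index

# Toy recognition results for two utterances, once as words and once as
# (hypothetical, romanized) syllables.
word_transcripts = {
    "utt1": ["語音", "資訊", "檢索"],
    "utt2": ["語音", "辨識"],
}
syllable_transcripts = {
    "utt1": ["yu", "yin", "zi", "xun", "jian", "suo"],
    "utt2": ["yu", "yin", "bian", "shi"],
}

word_index = build_index(word_transcripts)
syllable_index = build_index(syllable_transcripts)

def hybrid_search(word_query, syllable_query):
    """Union of hits from both indices: word hits serve in-vocabulary
    queries, syllable hits recover out-of-vocabulary ones."""
    word_hits = word_index.get(tuple(word_query), set())
    syllable_hits = syllable_index.get(tuple(syllable_query), set())
    return word_hits | syllable_hits

# In-vocabulary query: found directly in the word index.
print(hybrid_search(["語音", "資訊"], ["yu", "yin", "zi", "xun"]))  # {'utt1'}
# Out-of-vocabulary case: "語音辨識" as a single word is absent from the
# word index, but its syllable sequence still matches utt2.
print(hybrid_search(["語音辨識"], ["yu", "yin", "bian", "shi"]))  # {'utt2'}
```

In the thesis itself, the indices are weighted finite state transducers and retrieval is performed by composing a query transducer with the index transducer; the dictionary lookups here only mirror the hybrid word/sub-word idea, not the WFST machinery that yields the reported gains in real-time factor and time complexity.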
| Appears in Collections: | 電機工程學系 |
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-102-1.pdf (Restricted Access) | 2.41 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
