Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65735
Full metadata record
DC field | Value | Language
dc.contributor.advisor | 李琳山 (Lin-Shan Lee) |
dc.contributor.author | Tsung-Wei Tu | en
dc.contributor.author | 涂宗瑋 | zh_TW
dc.date.accessioned | 2021-06-17T00:02:27Z | -
dc.date.available | 2012-07-19 |
dc.date.copyright | 2012-07-19 |
dc.date.issued | 2012 |
dc.date.submitted | 2012-07-14 |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65735 | -
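dc.description.abstract | Spoken information retrieval is becoming increasingly important in the era of information explosion, yet its performance still lags far behind that of text information retrieval. This thesis proposes a pseudo-relevance feedback framework to improve spoken information retrieval: after the first-pass retrieval results are obtained, some of the retrieved documents are directly assumed to be training data, and a support vector machine (SVM) is trained to re-evaluate whether each document is relevant to the query. Pseudo-relevance feedback is unsupervised, so retrieval performance improves without the user having to provide any additional information. We propose two kinds of feature vectors to represent each spoken document, one capturing context features and the other acoustic features. Experiments confirm that these two types of features are complementary.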
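The re-ranking loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the top-ranked first-pass documents are pseudo-labeled relevant and the bottom-ranked ones irrelevant; each document is represented by a hypothetical bag-of-words context vector (words around the query hit plus a bias feature), with acoustic features omitted; the linear SVM is trained with the Pegasos stochastic subgradient method, swapped in here only to avoid external libraries; and the SVM decision value, squashed by a logistic function, is linearly interpolated with the first-pass score using a hypothetical weight `alpha`. All names (`context_features`, `train_linear_svm`, `prf_rerank`, `n_pos`, `n_neg`, `alpha`) are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]
# Hypothetical record: (doc_id, first-pass relevance score, feature vector)
Ranked = List[Tuple[str, float, Vector]]


def context_features(left: List[str], right: List[str]) -> Vector:
    """Hypothetical context feature: counts of words around the query hit, plus a bias term."""
    vec: Vector = {w: float(c) for w, c in Counter(left + right).items()}
    vec["__bias__"] = 1.0
    return vec


def dot(w: Vector, x: Vector) -> float:
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(data: List[Tuple[Vector, int]], lam: float = 0.1, epochs: int = 50) -> Vector:
    """Linear SVM via Pegasos: subgradient steps on (lam/2)||w||^2 + mean hinge loss.

    Examples are visited in a fixed order so the result is reproducible.
    """
    w: Vector = {}
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # margin is checked with the pre-update weights
            for k in w:  # gradient of the regularizer shrinks every weight
                w[k] *= 1.0 - eta * lam
            if violated:  # the hinge-loss subgradient is nonzero only inside the margin
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def logistic(z: float) -> float:
    """Numerically stable logistic function mapping an SVM decision value into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prf_rerank(ranked: Ranked, n_pos: int = 2, n_neg: int = 2,
               alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Pseudo-relevance feedback re-ranking with a linear SVM (illustrative sketch).

    Assumes first-pass scores lie in [0, 1] so they are comparable to the logistic output.
    """
    ranked = sorted(ranked, key=lambda r: r[1], reverse=True)
    # Pseudo labels: top n_pos assumed relevant, bottom n_neg assumed irrelevant.
    train = ([(x, +1) for _, _, x in ranked[:n_pos]]
             + [(x, -1) for _, _, x in ranked[len(ranked) - n_neg:]])
    w = train_linear_svm(train)
    rescored = [(doc_id, alpha * score + (1.0 - alpha) * logistic(dot(w, x)))
                for doc_id, score, x in ranked]
    return sorted(rescored, key=lambda r: r[1], reverse=True)


# Toy first-pass results for one query (all values are made up for illustration).
docs: Ranked = [
    ("d1", 0.90, context_features(["weather"], ["city"])),
    ("d2", 0.80, context_features(["weather"], ["rain"])),
    ("d3", 0.50, context_features(["weather"], ["city"])),
    ("d4", 0.55, context_features(["stock"], ["price"])),
    ("d5", 0.30, context_features(["stock"], ["market"])),
    ("d6", 0.20, context_features(["price"], ["market"])),
]
reranked = prf_rerank(docs)
print([doc_id for doc_id, _ in reranked])
```

In this toy run, d3 shares its context words with the pseudo-relevant documents while d4 shares them with the pseudo-irrelevant ones, so d3 should move above d4 even though its first-pass score is lower. Interpolating with the first-pass score, rather than ranking by the SVM alone, keeps the original evidence from dominating less when the pseudo labels are noisy.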
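Re-ranking with the proposed method raises mean average precision (MAP) from 45.47% to 54.63% (a relative improvement of 19.55%) in a condition with 50.27% recognition accuracy, and from 80.48% to 82.43% (a relative improvement of 2.42%) in a condition with 84.08% recognition accuracy. Re-ranking in this way therefore improves retrieval performance under recognition conditions ranging from low to high accuracy. In addition, for out-of-vocabulary (OOV) queries, MAP rises from 42.75% to 45.88%, a relative improvement of 7.32%, so the proposed re-ranking benefits OOV queries as well.
| zh_TW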
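The abstract reports results as mean average precision (MAP) and relative improvement. Below is a minimal sketch of both computations, assuming standard non-interpolated average precision (precision accumulated at the rank of each relevant document, divided by the total number of relevant documents) and relative improvement defined as (after - before) / before; the function names are illustrative. Recomputing the abstract's figures this way reproduces 2.42% and 7.32%. For the 50.27% condition, however, 45.47% to 54.63% gives about 20.15%, whereas the stated 19.55% corresponds to a final MAP of about 54.36%, so one of those two figures appears to contain a transposed digit.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated AP: precision@k summed at each rank k holding a relevant document,
    divided by the number of relevant documents (unretrieved ones contribute zero)."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average of per-query AP over (ranking, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def relative_improvement(before: float, after: float) -> float:
    """Relative improvement in percent, e.g. MAP before and after re-ranking."""
    return (after - before) / before * 100.0


# Two toy queries: AP = (1/1 + 2/3) / 2 and AP = (1/2) / 1.
toy_map = mean_average_precision([(["a", "b", "c", "d"], {"a", "c"}), (["x", "y"], {"y"})])

# MAP pairs taken from the abstract (plus the 54.36% variant discussed above).
for before, after in [(45.47, 54.63), (45.47, 54.36), (80.48, 82.43), (42.75, 45.88)]:
    print(f"{before:.2f}% -> {after:.2f}%: {relative_improvement(before, after):.2f}% relative")
```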
dc.description.provenance | Made available in DSpace on 2021-06-17T00:02:27Z (GMT). No. of bitstreams: 1
ntu-101-R99922062-1.pdf: 2196559 bytes, checksum: 495962c720e257e4980202db1f63e6e8 (MD5)
Previous issue date: 2012 | en
dc.description.tableofcontents | CONTENTS
Oral Examination Committee Certification
Acknowledgements
Abstract (Chinese)
Chapter 1 Introduction
1.1 Research Background
1.2 Research Direction
1.3 Contributions
1.4 Thesis Organization
Chapter 2 Background
2.1 Information Retrieval
2.2 Spoken Information Retrieval
2.3 Relevance Feedback
2.4 Re-ranking
2.5 Evaluation of Information Retrieval
2.6 Chapter Summary
Chapter 3 Conventional Spoken Information Retrieval
3.1 Speech Recognition System
3.1.1 Acoustic Feature Extraction
3.1.2 Acoustic Model Training
3.1.3 Language Model Training
3.1.4 Recognition Output
3.2 Retrieval System
3.2.1 Word-Based Lattices
3.2.2 Syllable-Based Lattices
3.3 Chapter Summary
Chapter 4 Speech Corpus and Experimental Setup
4.1 Retrieval Test Corpus
4.2 Acoustic Models
4.3 Language Model and Lexicon
4.4 Recognition Results
4.5 Baseline Experiments
4.5.1 In-Vocabulary Queries
4.5.2 Out-of-Vocabulary Queries
4.6 Chapter Summary
Chapter 5 Re-estimating Retrieval Scores with Pseudo-Relevance Feedback
5.1 System Architecture
5.1.1 Pseudo-Relevance Feedback
5.1.2 Feature Vector Extraction
5.1.3 Model Training with Support Vector Machines (SVMs)
5.1.4 SVM Confidence Scores
5.1.5 Integration with the Original Relevance Scores
5.2 Context Features
5.2.1 Surrounding Words
5.2.2 Whole Utterance
5.2.3 Separate Preceding and Following Words
5.2.4 Combining Preceding/Following Words with the Whole Utterance
5.2.5 Experiments and Analysis
5.3 Acoustic Features
5.3.1 Word-Based
5.3.2 Phone-Based
5.3.3 HMM-Based
5.3.4 Experiments and Analysis
5.4 Combining Context and Acoustic Features
5.5 Chapter Summary
Chapter 6 Out-of-Vocabulary Queries
6.1 Locating Hypothesized Matching Regions by Forced Alignment
6.2 Context Features
6.3 Acoustic Features
6.4 Experiments and Analysis
6.5 Chapter Summary
Chapter 7 Conclusions and Future Work
7.1 Conclusions
7.2 Future Work
References
List of Figures
2.1 Basic architecture of an information retrieval system
2.2 Basic architecture of a spoken information retrieval system
2.3 Basic architecture of relevance feedback
2.4 Basic architecture of re-ranking
2.5 Relationship among precision, recall, and average precision
3.1 Mel-frequency cepstral coefficient (MFCC) extraction process
3.2 Hidden Markov model
3.3 Illustration of a lattice
3.4 Syllable-based lattice
5.1 Architecture of the pseudo-relevance feedback system
5.2 SVM for binary classification
5.3 Surrounding-word feature vector
5.4 Whole-utterance feature vector
5.5 Separate preceding-word and following-word feature vector
5.6 MAP with surrounding-word features under various recognition conditions and numbers of feedback documents
5.7 MAP of the speaker-independent model with various context features and numbers of feedback documents
5.8 MAP of the basic speaker-adapted model with various context features and numbers of feedback documents
5.9 MAP of the advanced speaker-adapted model with various context features and numbers of feedback documents
5.10 MAP of the speaker-dependent model with various context features and numbers of feedback documents
5.11 Word-based acoustic features
5.12 Phone-based acoustic features
5.13 HMM-state-based acoustic features
5.14 MAP of the speaker-independent model with various acoustic features and numbers of feedback documents
5.15 MAP of the basic speaker-adapted model with various acoustic features and numbers of feedback documents
5.16 MAP of the advanced speaker-adapted model with various acoustic features and numbers of feedback documents
5.17 MAP of the speaker-dependent model with various acoustic features and numbers of feedback documents
6.1 Forced alignment
6.2 Phone-based forced alignment
6.3 Extracting context features from hypothesized matching regions
6.4 Extracting HMM-state-based acoustic features from hypothesized matching regions
List of Tables
4.1 Recognition accuracy of different acoustic models
4.2 Examples of in-vocabulary queries
4.3 MAP of in-vocabulary queries with different acoustic models
4.4 Examples of out-of-vocabulary queries
4.5 MAP of out-of-vocabulary queries with different syllable units
5.1 MAP comparison across features and recognition conditions
6.1 MAP comparison of out-of-vocabulary queries across features and syllable units
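dc.language.iso | zh-TW |
dc.subject | 語音資訊檢索 (spoken information retrieval) | zh_TW
dc.subject | 支撐向量機 (support vector machine) | zh_TW
dc.subject | SVM | en
dc.subject | Retrieval | en
dc.title | 使用上下文及聲學特徵之支撐向量機之語音資訊檢索 | zh_TW
dc.title | Speech Information Retrieval Using Support Vector Machines with Context and Acoustic Features | en
dc.type | Thesis |
dc.date.schoolyear | 100-2 (ROC academic year 100, second semester) |
dc.description.degree | 碩士 (Master's) |
dc.contributor.oralexamcommittee | 王小川, 鄭秋豫, 陳信宏, 簡仁宗 |
dc.subject.keyword | 語音資訊檢索 (spoken information retrieval), 支撐向量機 (support vector machine) | zh_TW
dc.subject.keyword | Retrieval, SVM | en
dc.relation.page | 66 |
dc.rights.note | 有償授權 (licensed for a fee) |
dc.date.accepted | 2012-07-16 |
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | zh_TW
dc.contributor.author-dept | 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) | zh_TW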
Appears in collections: Department of Computer Science and Information Engineering

Files in this item:
File | Size | Format
ntu-101-1.pdf (not authorized for public access) | 2.15 MB | Adobe PDF
Unless otherwise stated, all items in this system are protected by copyright, with all rights reserved.
