  1. NTU Theses and Dissertations Repository
  2. College of Electrical Engineering and Computer Science
  3. Department of Electrical Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16104

Full metadata record
DC Field: Value (Language)
dc.contributor.advisor: 鄭士康 (Shyh-Kang Jeng)
dc.contributor.author: Yu-Chin Shih (en)
dc.contributor.author: 施羽芩 (zh_TW)
dc.date.accessioned: 2021-06-07T18:01:10Z
dc.date.copyright: 2012-08-10
dc.date.issued: 2012
dc.date.submitted: 2012-08-06
dc.identifier.citation:
[1] M. P. Lewis, Ethnologue: Languages of the World, 16th ed. Dallas, Tex.: SIL International, 2009.
[2] M. A. Zissman and K. M. Berkling, 'Automatic language identification,' Speech Communication, vol. 35, pp. 115-124, 2001.
[3] B. Ma, C. Guan, H. Li, and C.-H. Lee, 'Multilingual speech recognition with language identification,' Proc. ICSLP, pp. 505-508, 2002.
[4] V. W. Zue and J. R. Glass, 'Conversational interfaces: advances and challenges,' Proc. IEEE, vol. 88, pp. 1166-1180, 2000.
[5] A. Waibel, P. Geutner, L. M. Tomokiyo, T. Schultz, and M. Woszczyna, 'Multilinguality in speech and spoken language systems,' Proc. IEEE, vol. 88, pp. 1297-1313, 2000.
[6] P. Dai, U. Iurgel, and G. Rigoll, 'A novel feature combination approach for spoken document classification with support vector machines,' Proc. Multimedia Information Retrieval Workshop, 2003.
[7] E. Ambikairajah, H. Li, L. Wang, B. Yin, and V. Sethu, 'Language identification: a tutorial,' IEEE Circuits and Systems Magazine, vol. 11, pp. 82-108, 2011.
[8] H. Li, B. Ma, and C.-H. Lee, 'A vector space modeling approach to spoken language identification,' IEEE Trans. Audio, Speech, and Language Process., vol. 15, pp. 271-284, 2007.
[9] J. B. Allen, 'How do humans process and recognize speech?,' IEEE Trans. Speech and Audio Proc., vol. 2, 1994.
[10] Y. Muthusamy, K. Berkling, T. Arai, R. Cole, and E. Barnard, 'A comparison of approaches to automatic language identification using telephone speech,' Proc. Eurospeech, vol. 2, pp. 1307-1310, 1993.
[11] T. J. Hazen and V. W. Zue, 'Segment-based automatic language identification,' J. Acoust. Soc. Amer., vol. 101, pp. 2323-2331, 1997.
[12] M. A. Zissman and E. Singer, 'Automatic language identification of telephone speech messages using phoneme recognition and N-gram modeling,' Proc. ICASSP, vol. 1, pp. 305-308, 1994.
[13] M. Penagarikano, A. Varona, L. J. Rodriguez-Fuentes, and G. Bordel, 'Improved modeling of cross-decoder phone co-occurrences in SVM-based phonotactic language recognition,' IEEE Trans. Audio, Speech, and Language Process., vol. 19, pp. 2348-2363, 2011.
[14] J. L. Gauvain, A. Messaoudi, and H. Schwenk, 'Language recognition using phone lattices,' Proc. ICSLP, pp. 1215-1218, 2004.
[15] J. Hamm and D. D. Lee, 'Grassmann discriminant analysis: a unifying view on subspace-based learning,' Proc. ICML, pp. 376-383, 2008.
[16] I. J. Good, 'The population frequencies of species and the estimation of population parameters,' Biometrika, vol. 40, pp. 237-264, 1953.
[17] L. F. Lamel and J.-L. Gauvain, 'Cross-lingual experiments with phone recognition,' Proc. ICASSP, vol. 2, pp. 507-510, 1993.
[18] M. A. Zissman, 'Comparison of four approaches to automatic language identification of telephone speech,' IEEE Trans. Speech and Audio Proc., vol. 4, pp. 31-44, 1996.
[19] T. J. Hazen and V. W. Zue, 'Automatic language identification using a segment-based approach,' Proc. Eurospeech, vol. 2, pp. 1303-1306, 1993.
[20] T. J. Hazen and V. W. Zue, 'Recent improvements in an approach to segment-based automatic language identification,' Proc. ICASSP, vol. 4, pp. 1883-1886, 1994.
[21] F. S. Richardson and W. M. Campbell, 'Language recognition with discriminative keyword selection,' Proc. ICASSP, pp. 4145-4148, 2008.
[22] T. Mikolov, O. Plchot, O. Glembek, P. Matejka, L. Burget, and J. Cernocky, 'PCA-based feature extraction for phonotactic language recognition,' Proc. Odyssey, pp. 251-255, 2010.
[23] M. Soufifar, M. Kockmann, L. Burget, O. Plchot, O. Glembek, and T. Svendsen, 'iVector approach to phonotactic language recognition,' Proc. Interspeech, pp. 2913-2916, 2011.
[24] E. Oja and T. Kohonen, 'The subspace learning algorithm as a formalism for pattern recognition and neural networks,' Proc. IEEE Int. Conf. on Neural Networks, vol. 1, pp. 277-284, 1988.
[25] A. Bjoerck and G. H. Golub, 'Numerical methods for computing angles between linear subspaces,' Math. Computation, vol. 27, pp. 579-594, 1973.
[26] I. J. Good, 'Some applications of the singular decomposition of a matrix,' Technometrics, vol. 11, pp. 823-831, 1969.
[27] S.-W. Kim and R. P. W. Duin, 'An empirical comparison of kernel-based and dissimilarity-based feature spaces,' Proc. SSPR&SPR, pp. 559-568, 2010.
[28] J. Hamm, 'Subspace-Based Learning with Grassmann Kernels,' Ph.D. Dissertation, 2008.
[29] B. W. Silverman, Density Estimation for Statistics and Data Analysis. London: Chapman and Hall, 1986.
[30] S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. A. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK book (for HTK version 3.4), 2006.
[31] C.-H. Lee, F. K. Soong, and B.-H. Juang, 'A segment model based approach to speech recognition,' Proc. ICASSP, vol. 1, pp. 501-541, 1988.
[32] B. Hayes and C. Wilson, 'A maximum entropy model of phonotactics and phonotactic learning,' Linguistic Inquiry, vol. 39, pp. 379-440, 2008.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16104
dc.description.abstract (zh_TW): This thesis proposes a novel subspace-based method for phonotactic automatic language recognition. The approach consists of two main parts: a feature representation for speech signals and subspace-based learning algorithms. The former exploits the contextual relationships and constraints among phones in a speech signal: through decoding by an automatic speech recognizer, likelihood computation for each phone in the decoded phone sequence, and feature concatenation, phone frames rich in phonetic information are extracted. The extracted phone frames are assumed to lie in a low-dimensional feature subspace in which the structure of each utterance is almost completely preserved, so each utterance can further be represented as a fixed-dimensional subspace. The latter measures the similarity or distance between two utterances (subspaces) with non-Euclidean metrics, performs feature processing with distance-based or kernel-based discriminant analysis, and finally classifies with a back-end classifier such as k-nearest neighbors. Experiments on the OGI-TS and NIST LRE 2005 corpora show that the proposed method outperforms vector space modeling based methods in equal error rate.
dc.description.abstract (en): This thesis presents a novel subspace-based approach to phonotactic language recognition. The framework is divided into two parts: speech feature representation and subspace-based learning algorithms. First, the phonetic information and the contextual relationships possessed by spoken utterances are retrieved more abundantly through likelihood computation and feature concatenation during decoding by an automatic speech recognizer. The extracted phone frames are assumed to reside in a lower-dimensional eigen-subspace in which the structure of the data is approximately captured, so each utterance can be further represented by a fixed-dimensional linear subspace. Second, to measure the similarity between two utterances, suitable non-Euclidean metrics are explored and applied to linear discriminant analysis in two mechanisms, distance-based and kernel-based learning, followed by a back-end classifier such as the k-nearest neighbor (KNN) classifier. The results of experiments on the OGI-TS and NIST LRE 2005 databases demonstrate that the proposed framework outperforms the well-known vector space modeling based method in equal error rate (EER).
dc.description.provenance: Made available in DSpace on 2021-06-07T18:01:10Z (GMT). No. of bitstreams: 1. ntu-101-R99921045-1.pdf: 2581385 bytes, checksum: 52835548c34d92667ee399bcebc11d47 (MD5). Previous issue date: 2012 (en)
dc.description.tableofcontents:
口試委員會審定書 (Oral Defense Committee Certification) i
誌謝 (Acknowledgements) iii
摘要 (Chinese Abstract) v
Abstract vii
Contents ix
List of Figures xiii
List of Tables xv
Chapter 1 Introduction 1
1.1 Motivations 1
1.2 Background Knowledge 3
1.2.1 Verification and Identification 4
1.2.2 Evaluation Measures 5
1.2.3 Statistical Spoken Language Recognition Systems 8
1.2.4 Five Levels of Information 10
1.2.5 Why Phonotactic Information? 12
1.3 Literature Survey and Contributions 13
1.4 Organization of the Thesis 16
Chapter 2 Previous Work 17
2.1 Language Modeling Based Methods 17
2.1.1 Parallel Phone Recognition (PPR) 18
2.1.2 Phone Recognition Followed by Language Modeling (PRLM) 19
2.1.3 Parallel PRLM (PPRLM) 21
2.2 Vector Space Modeling (VSM) Based Methods 23
2.3 Cross-Decoder Phone Co-occurrences 25
2.4 iVectors 25
Chapter 3 Proposed Method 27
3.1 Data Representation in Subspace 27
3.1.1 Phonotactic Feature Extraction 27
3.1.2 Frame Concatenation 29
3.1.3 Subspace Generation 30
3.2 Subspace Learning 33
3.2.1 The Grassmann Metric and Kernel 33
3.2.2 Dissimilarity-based Learning Scheme 37
3.2.3 Kernel-based Learning Scheme 38
Chapter 4 Experiments 41
4.1 Experimental Setup 41
4.1.1 Corpora 41
4.1.2 Feature Extraction 44
4.1.3 Universal Phone Set (UPS) 45
4.1.4 Universal Phone Recognizer (UPR) 47
4.2 Experiments on ASLR 48
4.2.1 The Baseline 48
4.2.2 The Subspace-based Method: Eigenphones 51
4.2.3 The Subspace-based Method: Contextual Windows 54
4.2.4 Comparisons 55
Chapter 5 Conclusions 59
Reference 61
The Author’s Publication 65
dc.language.iso: en
dc.subject: 基於子空間學習法 (subspace-based learning) (zh_TW)
dc.subject: 語言辨識 (language recognition) (zh_TW)
dc.subject: language recognition (en)
dc.subject: subspace-based learning (en)
dc.title: 基於子空間之口說語言辨識 (Subspace-based Spoken Language Recognition) (zh_TW)
dc.title: Subspace-based Spoken Language Recognition (en)
dc.type: Thesis
dc.date.schoolyear: 100-2
dc.description.degree: 碩士 (Master)
dc.contributor.coadvisor: 王新民 (Hsin-Min Wang)
dc.contributor.oralexamcommittee: 蔡偉和 (Wei-Ho Tsai)
dc.subject.keyword: 語言辨識, 基於子空間學習法 (language recognition, subspace-based learning) (zh_TW)
dc.subject.keyword: language recognition, subspace-based learning (en)
dc.relation.page: 65
dc.rights.note: 未授權 (not authorized for public access)
dc.date.accepted: 2012-08-06
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) (zh_TW)
dc.contributor.author-dept: 電機工程學研究所 (Graduate Institute of Electrical Engineering) (zh_TW)
Appears in Collections: Department of Electrical Engineering

Files in This Item:
File: ntu-101-1.pdf (restricted access)
Size: 2.52 MB
Format: Adobe PDF