Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/4884
Title: An Integrated Framework for Recognizing Highly Imbalanced Bilingual Code-switched Lectures with Cross-language Acoustic Modeling and Frame-level Language Identification
Original title: 使用跨語言聲學模型及音框層級語言識別來辨識高度不平衡雙語混合課程之整合性架構
Author: Ching-Feng Yeh (葉青峰)
Advisor: Lin-shan Lee (李琳山)
Keywords: Speech Recognition, Code-switching, Acoustic Modeling, Language Identification
Year of publication: 2015
Degree: Doctoral
Abstract: This thesis considers the recognition of a widely observed type of bilingual code-switched speech: the speaker speaks primarily in the host language (usually the speaker's native language), but inserts a few words or phrases in the guest language (usually the speaker's second language) into many of the utterances. In this case, not only are the languages switched back and forth within an utterance, which makes language identification difficult, but much less data is available for the guest language, which results in poor recognition accuracy for the guest-language parts. We propose an integrated framework for recognizing such highly imbalanced code-switched speech. The framework includes unit merging on three levels of acoustic modeling (triphone models, HMM states, and Gaussians) for cross-lingual data sharing; unit recovery, which reconstructs the identities of the units of the two languages after merging; unit occupancy ranking, which offers more flexible data sharing between units both across languages and within a language, based on the accumulated occupancy of the HMM states; and estimation of frame-level language posteriors using Blurred Posteriorgram Features (BPFs) for use in decoding. In addition, we evaluate two ways of extending these HMM-based approaches to state-of-the-art deep neural networks (DNNs): using bottleneck features in HMM/GMM systems, and modeling context-dependent HMM states. We present a complete set of experimental results comparing all of these approaches under unified conditions on a corpus recorded in a real-world scenario, and show that the proposed framework substantially improves recognition accuracy for code-switched speech.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/4884
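Full-text authorization: Granted (open access worldwide)
Appears in collection: Graduate Institute of Communication Engineering (電信工程學研究所)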

Files in this item:
File            Size      Format
ntu-104-1.pdf   10.53 MB  Adobe PDF
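
The Gaussian-level unit merging mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not state the distance measure, the merging rule, or any threshold. The sketch assumes diagonal-covariance Gaussians, a symmetric KL divergence as the distance, an occupancy-weighted moment-matching merge, and a hypothetical `threshold` parameter. The function names and the dictionary layout are likewise illustrative only.

```python
def sym_kl(m1, v1, m2, v2):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    total = 0.0
    for a, va, b, vb in zip(m1, v1, m2, v2):
        d2 = (a - b) ** 2
        total += 0.5 * (va / vb + vb / va - 2.0) + 0.5 * d2 * (1.0 / va + 1.0 / vb)
    return total


def merge(g1, g2):
    """Moment-matched merge of two Gaussians, weighted by their occupancy."""
    n = g1["occ"] + g2["occ"]
    w1, w2 = g1["occ"] / n, g2["occ"] / n
    mean = [w1 * a + w2 * b for a, b in zip(g1["mean"], g2["mean"])]
    var = [
        w1 * (va + a * a) + w2 * (vb + b * b) - m * m
        for a, va, b, vb, m in zip(g1["mean"], g1["var"], g2["mean"], g2["var"], mean)
    ]
    return {"mean": mean, "var": var, "occ": n}


def merge_cross_language(host, guest, threshold):
    """Merge each guest Gaussian with its nearest host Gaussian if close enough.

    Returns (shared, kept): merged cross-language Gaussians, and guest
    Gaussians that stayed language-specific. `threshold` is an assumed
    tuning parameter, not a value from the thesis.
    """
    shared, kept = [], []
    for g in guest:
        dist = lambda h: sym_kl(h["mean"], h["var"], g["mean"], g["var"])
        nearest = min(host, key=dist)
        if dist(nearest) <= threshold:
            shared.append(merge(nearest, g))
        else:
            kept.append(g)
    return shared, kept


# Toy one-dimensional example: the host language has far more data (occupancy).
host = [
    {"mean": [0.0], "var": [1.0], "occ": 900.0},
    {"mean": [5.0], "var": [1.0], "occ": 800.0},
]
guest = [
    {"mean": [0.2], "var": [1.0], "occ": 100.0},
    {"mean": [10.0], "var": [1.0], "occ": 50.0},
]
shared, kept = merge_cross_language(host, guest, threshold=0.5)
```

In this toy run the first guest Gaussian lies close to a host Gaussian and is merged with it, so the data-poor guest unit borrows statistics from the data-rich host unit, while the second guest Gaussian has no close host counterpart and stays language-specific.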
