Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65632
Title: 使用二維特徵音框權重法及調變頻譜正規化之強健型語音辨識
Robust Speech Recognition with Two-dimensional Frame-and-feature Weighting and Modulation Spectrum Normalization
Authors: Yang Chang
張暘
Advisor: Lin-shan Lee (李琳山)
Keyword: speech processing (語音處理), robustness (強健化), speech recognition
Publication Year: 2012
Degree: Master's (碩士)
Abstract: This thesis first surveys speech robustness algorithms, including Cepstral Mean Subtraction (CMS), Cepstral Mean and Variance Normalization (CMVN), Histogram Equalization (HEQ), and Higher Order Cepstral Moment Normalization (HOCMN). It also introduces Aurora 2 and Aurora 4, the internationally recognized standard corpora for robust speech recognition, and reports baseline experimental results on both.
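As an illustration of the two simplest normalization methods named above, here is a minimal sketch of per-utterance CMS and CMVN. It assumes the features arrive as a list of per-frame cepstral vectors; the function names `cms` and `cmvn` and the `eps` guard are illustrative choices, not taken from the thesis.

```python
import math

def cms(frames):
    """Cepstral mean subtraction: remove the per-utterance mean of each dimension."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]

def cmvn(frames, eps=1e-10):
    """Cepstral mean and variance normalization: zero mean, unit variance per dimension."""
    centered = cms(frames)
    n, dim = len(centered), len(centered[0])
    # Population standard deviation per dimension; eps (an assumed guard)
    # avoids division by zero for a constant dimension.
    stds = [math.sqrt(sum(f[d] ** 2 for f in centered) / n) + eps for d in range(dim)]
    return [[f[d] / stds[d] for d in range(dim)] for f in centered]
```

Both functions operate on one utterance at a time, which is the usual setting for these methods; HEQ and HOCMN extend the same idea to the full distribution and to higher-order moments and are not sketched here.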
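Entropy-based feature weighting and confusion-based feature weighting account for the differing discriminative power of individual feature parameters: during recognition each parameter receives its own weight, emphasizing the parameters that discriminate better. A confusion matrix is additionally used to take into account the errors likely to occur between phonemes. Besides being applied directly to Mel-frequency cepstral features, this approach can be combined with many existing robustness techniques.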
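The abstract does not give the weighting formula, so the following is only a sketch of one plausible way to turn phoneme confusion into feature weights. It assumes one confusion matrix per feature (rows are true phoneme classes, columns are recognized classes), measures the count-weighted average row entropy, and gives lower-entropy (less confusing) features larger weights. The names `confusion_entropy` and `feature_weights` and the normalization to a mean of 1 are hypothetical.

```python
import math

def confusion_entropy(confusion):
    """Average row entropy (nats) of a confusion matrix, rows weighted by their counts."""
    total = sum(sum(row) for row in confusion)
    entropy = 0.0
    for row in confusion:
        row_sum = sum(row)
        if row_sum == 0:
            continue  # class never observed; contributes nothing
        row_h = -sum((c / row_sum) * math.log(c / row_sum) for c in row if c > 0)
        entropy += (row_sum / total) * row_h
    return entropy

def feature_weights(confusions):
    """Map per-feature confusion matrices to weights averaging 1 (assumed scheme)."""
    max_h = math.log(len(confusions[0]))  # entropy upper bound for K classes
    raw = [max(0.0, max_h - confusion_entropy(c)) for c in confusions]
    mean_raw = sum(raw) / len(raw)
    if mean_raw == 0.0:
        return [1.0] * len(raw)  # no feature is informative: fall back to uniform
    return [r / mean_raw for r in raw]
```

A feature whose confusion matrix is diagonal has zero entropy and receives the largest weight, while a feature that confuses all classes uniformly receives weight 0 under this scheme.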
SVM-based frame weighting uses a Support Vector Machine (SVM), a machine learning classifier, which relies on each frame's energy distribution and harmonicity estimation to divide the test data into reliable frames and unreliable frames. Recognition then leans more heavily on the reliable frames: the SVM score is used to give reliable frames larger weights and unreliable frames smaller weights.
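The two ingredients of frame weighting can be sketched as follows. This is a rough illustration under stated assumptions, not the thesis's procedure: harmonicity is approximated here by the peak normalized autocorrelation over a range of candidate pitch lags, and a signed classifier score is mapped to a weight in (0, 1) with a logistic function. The names `harmonicity` and `frame_weight` and the `slope` parameter are hypothetical, and the SVM itself is not shown.

```python
import math

def harmonicity(samples, min_lag, max_lag):
    """Peak normalized autocorrelation over candidate lags (assumed harmonicity proxy)."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        if n <= 0:
            break
        num = sum(samples[i] * samples[i + lag] for i in range(n))
        e1 = sum(samples[i] ** 2 for i in range(n))
        e2 = sum(samples[i + lag] ** 2 for i in range(n))
        if e1 > 0 and e2 > 0:
            best = max(best, num / math.sqrt(e1 * e2))
    return best

def frame_weight(svm_score, slope=1.0):
    """Map a signed classifier score to (0, 1); positive scores mean 'reliable'."""
    return 1.0 / (1.0 + math.exp(-slope * svm_score))
```

A strongly periodic (voiced) frame yields a value near 1 from `harmonicity`, while a noise-like frame yields a much smaller value; such a value would be one input feature to the classifier whose score is then passed to `frame_weight`.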
Finally, confusion-based feature weighting and SVM-based frame weighting are combined into two-dimensional frame-and-feature weighted Viterbi decoding, which assigns different weights to different feature parameters and to different frames. This combination retains the advantages of both methods and yields further improvement.
In this paper we propose a new approach of two-dimensional frame-and-feature weighted Viterbi decoding, performed at the recognizer back-end, for robust speech recognition. The frame weighting is based on a Support Vector Machine (SVM) classifier that considers the energy distribution and cross-correlation spectrum of the frame. The basic idea is that voiced frames with higher harmonicity are in general more reliable than other frames in noisy speech and should therefore be weighted higher. The feature weighting is based on an entropy measure that considers confusion between phoneme classes. The basic idea is that the scores obtained with more discriminating features, which cause less confusion between phonemes, should be weighted higher. These two weighting schemes on the two dimensions, frames and features, are then integrated in Viterbi decoding. Significant improvements were achieved in extensive experiments performed with the Aurora 4 testing environment for all types of noise and all SNR values.
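The integration of the two weightings into Viterbi decoding can be sketched as below. The abstract does not state the exact scoring rule, so this assumes a common form: each state's emission score is a feature-weighted sum of per-dimension log-likelihoods, scaled by the frame weight, with single diagonal-covariance Gaussians per state. All names (`weighted_viterbi`, `frame_w`, `feat_w`, and so on) are illustrative.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def weighted_viterbi(obs, frame_w, feat_w, log_init, log_trans, means, variances):
    """Best state path with frame- and feature-weighted emission scores (assumed form)."""
    n_states = len(log_init)

    def emit(t, s):
        # Feature weights scale each dimension; the frame weight scales the whole frame.
        return frame_w[t] * sum(
            feat_w[d] * gaussian_logpdf(obs[t][d], means[s][d], variances[s][d])
            for d in range(len(feat_w))
        )

    delta = [log_init[s] + emit(0, s) for s in range(n_states)]
    back = []
    for t in range(1, len(obs)):
        new_delta, ptr = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda p: delta[p] + log_trans[p][s])
            ptr.append(prev)
            new_delta.append(delta[prev] + log_trans[prev][s] + emit(t, s))
        delta = new_delta
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

With all weights set to 1 this reduces to ordinary Viterbi decoding. Lowering the weight of a frame shrinks its influence relative to the transition scores, so an isolated corrupted frame is less able to force a state change.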
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65632
Fulltext Rights: Paid license (有償授權)
Appears in Collections: Graduate Institute of Communication Engineering (電信工程學研究所)

Files in This Item:
File: ntu-101-1.pdf (Restricted Access)
Size: 9.88 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
