Improving Mandarin LVCSR Using Place and Manner Based Multi-task Learning

Yueh-Ting Lee; 李岳庭

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7291

Title:	Improving Mandarin LVCSR Using Place and Manner Based Multi-task Learning (使用基於發音方式與位置的多任務學習來改進華語大詞彙語音辨識)
Author:	Yueh-Ting Lee 李岳庭
Advisor:	張智星
Keywords:	multi-task learning, articulatory attributes, time-delay neural networks, LVCSR
Year of publication:	2019
Degree:	Master's
Abstract:	In large vocabulary continuous speech recognition (LVCSR), replacing GMM-HMM with DNN-HMM as the acoustic model has markedly improved recognition performance. This thesis uses a multi-task learning neural network (MTL-DNN): in addition to the main senone classification task, the model is trained simultaneously on subtasks that classify articulatory attributes such as place and manner of articulation, minimizing the cross-entropy losses of all outputs jointly. Compared with previous studies, we propose three improvements. First, the articulatory attribute labels are divided into four blocks whose attributes are mutually exclusive within each block; these blocks replace the conventional multi-label output and serve as the subtask output layers of the MTL-TDNN model. Second, time-delay neural networks (TDNN) replace fully connected multilayer perceptrons, because a TDNN can efficiently model long temporal contexts. Third, the subtask output layers are attached to lower hidden layers, so the subtasks share only the first few layers with the main task. Experiments were performed on the Mandarin Chinese broadcast news corpus (MATBN), using a small dataset (MATBN-20) and a large dataset (MATBN-200), and evaluated by character error rate (CER). Compared with the conventional single-task TDNN model, the best model achieves relative CER reductions of 3.33% and 1% on MATBN-20 and MATBN-200, respectively.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7291
DOI:	10.6342/NTU201901599
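Full-text license:	Authorized (open access worldwide)
Full-text release date:	2024-07-25
Appears in department:	Department of Computer Science and Information Engineering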
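
The training objective described in the abstract, a senone cross-entropy loss combined with cross-entropy losses over mutually exclusive attribute blocks, can be illustrated with a minimal sketch. This is not the thesis implementation: the subtask weight, the block sizes, and all logit values below are hypothetical, and the abstract does not say how the losses are weighted. Each block gets its own softmax because its attributes are mutually exclusive, which is what distinguishes this setup from a multi-label output.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of a softmax over one group of mutually exclusive classes."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target_index]


def multitask_loss(senone_logits, senone_target,
                   block_logits, block_targets, subtask_weight=0.1):
    """Main senone loss plus a weighted sum of per-block attribute losses.

    subtask_weight is an assumed hyperparameter, not a value from the thesis.
    """
    main = softmax_cross_entropy(senone_logits, senone_target)
    aux = sum(softmax_cross_entropy(logits, target)
              for logits, target in zip(block_logits, block_targets))
    return main + subtask_weight * aux


# Hypothetical single-frame example: 3 senones and four attribute blocks
# of made-up sizes (real systems have thousands of senones).
senone_logits = [2.0, 0.5, -1.0]
block_logits = [[1.0, 0.0], [0.2, 0.3, 0.1], [0.0, 2.0], [1.5, -0.5, 0.0]]
block_targets = [0, 1, 1, 0]

loss = multitask_loss(senone_logits, 0, block_logits, block_targets)
print(f"combined loss for one frame: {loss:.4f}")
```

In a real network the subtask logits would be computed from a lower hidden layer than the senone logits, as the abstract describes; the sketch only shows how the per-frame losses combine.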

Files in this item:

File	Size	Format
ntu-108-1.pdf	4.5 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
