Contextual Biasing for End-to-End Chinese Automatic Speech Recognition

張開; Kai Zhang

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87722

Title:	Contextual Biasing for End-to-End Chinese ASR (用於端到端中文自動語音辨識的語境偏移)
Author:	Kai Zhang (張開)
Advisor:	Jyh-Shing Roger Jang (張智星)
Keywords:	Automatic Speech Recognition, Contextual Bias, Context Biasing, End-to-end Speech Recognition, Self-supervised Trained Model, CATSLU, Intent Recognition, Intent Classification, Hot Word
Publication year:	2023
Degree:	Master's
Abstract:	Compared with traditional methods, end-to-end speech recognition is more robust and can improve recognition accuracy across a variety of conditions. However, because end-to-end models lack an independent language model, they cannot recognize vocabulary outside the training data, which degrades recognition of some proper nouns. Adapting to different settings therefore requires biasing the recognizer toward a specific domain. Based on the CATSLU dataset, this study constructs two Chinese contextual biasing tasks, one targeting proper nouns and the other targeting mixed-domain sentences. It explores four contextual biasing methods, each applied at a different stage of the speech recognition pipeline: before recognition (preprocessing), in the model, during decoding, and in postprocessing. The experimental results show that all of the biasing methods improve, to some extent, the recognition performance of the speech recognition model in the specific domain.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87722
DOI:	10.6342/NTU202300681
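Full-text authorization:	Authorized (open to the public worldwide)
Appears in department:	Graduate Institute of Networking and Multimedia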

Files in this document:

File	Size	Format
ntu-111-2.pdf	3.58 MB	Adobe PDF

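Documents in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.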
