Improving End-to-end Taiwanese-to-Chinese Speech Translation by Semi-supervised Learning

林育駿; Yu-Chun Lin

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89881

Title:	通過半監督學習改進端到端台語至中文語音翻譯 Improving End-to-end Taiwanese-to-Chinese Speech Translation by Semi-supervised Learning
Authors:	林育駿 Yu-Chun Lin
Advisor:	張智星 Jyh-Shing Roger Jang
Keyword:	Automatic speech recognition, Self-supervised learning, End-to-end speech recognition, Machine translation
Publication Year :	2023
Degree:	Master's
Abstract:	Taiwanese speech recognition faces two main challenges: (1) the lack of large, publicly available Taiwanese speech corpora, and (2) the absence of a unified writing system for Taiwanese. The former leaves speech recognition tasks short of training data, while the latter makes output formats inconsistent and hard to read. This study treats Taiwanese speech recognition combined with Chinese translation as a single task and builds a Taiwanese speech translation model on an architecture that couples a pretrained speech model with an end-to-end deep learning model. Starting from a small amount of Taiwanese speech paired with Chinese text, we collect a large amount of Taiwanese speech data from the internet for semi-supervised learning and design a data cleaning algorithm, improving both the speech translation system and the Taiwanese corpus. The study explores four directions for improvement: end-to-end speech translation models, pretrained speech model features, iterative training methods, and corpus cleaning. Experimental results confirm that each of these methods improves Taiwanese-to-Chinese speech translation performance.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89881
DOI:	10.6342/NTU202301825
Fulltext Rights:	Authorization granted (open access worldwide)
Embargo Lift Date:	2028-08-08
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-111-2.pdf (restricted until 2028-08-08)	2.8 MB	Adobe PDF
