An Analysis of Duplex Sequence-to-Sequence Learning for Speech Chain (分析基於雙工序列到序列模型之語言鏈)

陳柏文; Bo-Wen Chen

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83100

Title:	分析基於雙工序列到序列模型之語言鏈 An Analysis of Duplex Sequence-to-Sequence Learning for Speech Chain
Other Titles:	An Analysis of Duplex Sequence-to-Sequence Learning for Speech Chain
Authors:	陳柏文 Bo-Wen Chen
Advisor:	李宏毅 Hung-Yi Lee
Keyword:	Speech chain, Reversibility, Duplexity
Publication Year:	2022
Degree:	Master's
Abstract:	This thesis explores how to build a duplex speech chain model from reversible neural network layers, so that the bidirectional supervision signals available in parallel datasets can be fully exploited. Existing methods that use bidirectional supervision signals fall mainly into two categories: general multi-task learning and cycle consistency. Although both use bidirectional supervision signals, each has its own shortcomings. To give the model duplexity and apply it to the bidirectional speech chain task, which consists of speech synthesis and speech recognition, we propose several reversible modules and operations that also address the length mismatch between text and speech. The proposed model is the first duplex sequence-to-sequence model that handles both speech synthesis and speech recognition, and this work is the first to apply reversible neural networks to speech tasks. We analyze experimentally how using, or not using, bidirectional supervision signals affects the performance of the duplex model.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83100
DOI:	10.6342/NTU202210030
Fulltext Rights:	Authorized (campus access only)
Embargo Lift Date:	2027-11-02
Appears in Collections:	Graduate Institute of Communication Engineering (電信工程學研究所)

Files in This Item:

File	Access	Size	Format
U0001-1051221107542094.pdf	Restricted Access	4.13 MB	Adobe PDF
