Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74924
Title: | 自監督式語音表徵分解之研究 A Study of Self-supervised Speech Representation Decomposition |
Authors: | Yao-Wen Mao 茅耀文 |
Advisor: | 李琳山(Lin-shan Lee) |
Keyword: | speech, self-supervised, representation |
Publication Year: | 2019 |
Degree: | Master's |
Abstract: | This thesis explores how to separate the global and local information in a speech signal into different representations, using only speech without human annotation. It is commonly understood that, for speech uttered by a single speaker, speaker characteristics are time-invariant, whereas speech content varies over time and is independent of the speaker characteristics. Separating these two types of information into representations that are easier to manipulate can benefit a variety of speech-related applications.
This thesis first re-examines the definition of independence between properties and identifies the assumptions needed for speaker characteristics and speech content to be independent. Based on these assumptions, and with an autoencoder as the basic architecture, it discusses how the representations must be constrained in order to control their properties and decompose them into global and local parts. In the experiments, speaker identification and speech recognition serve as the main evaluation methods; we systematically observe the effects of the different methods and compare their advantages and disadvantages in different respects. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74924 |
DOI: | 10.6342/NTU201904140 |
Fulltext Rights: | Paid license |
Appears in Collections: | Department of Electrical Engineering |
Files in This Item:
File | Size | Format | Access |
---|---|---|---|
ntu-108-1.pdf | 670.96 kB | Adobe PDF | Restricted Access |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.