Acoustic-Phonetic Likelihood Models for Analysis of Accompanied Singing Audio

Yu-Ren Chien; 簡御仁

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/4020

Title:	Acoustic-Phonetic Likelihood Models for Analysis of Accompanied Singing Audio (original title: 源自聲學語音學、用於有伴奏歌唱音訊分析之概似模型)
Authors:	Yu-Ren Chien 簡御仁
Advisor:	鄭士康,王新民
Keyword:	melody extraction, lyrics alignment, singing voice, acoustic phonetics, F0 modification, glottal pulse shape, formant frequency
Publication Year:	2016
Degree:	Doctoral
Abstract:	This dissertation addresses melody and lyrics analysis of accompanied singing recordings. Central to my approach are likelihood models that combine acoustic-phonetic knowledge with real-world data. These models are built on a timbral fitness score and a voicing fitness score, evaluated for each fundamental frequency (F0) candidate or vowel/voicing candidate. Timbral fitness measures how closely the timbre exhibited by the partial amplitudes at a given F0 value resembles a reference timbre, which is defined by a small set of vocal timbre examples. Measuring timbral fitness for a specific F0 requires modifying the F0 of every timbre example; the proposed modification uses an acoustic-phonetic model to preserve the glottal pulse shape and formant frequencies of the unmodified example. In the voicing part of the likelihood models, sinusoids are detected, tracked, and pruned so that vocal loudness can be estimated with minimal interference from the accompaniment. The final F0 or syllable estimate is determined jointly by the likelihood model and a prior sequential model. All numerical parameters involved in my approach were optimized on several development sets from different sources before the system was evaluated on multiple test sets that do not overlap with these development sets. Controlled experiments show that using the timbral fitness score accounts for a 13% difference in overall melodic accuracy and a 7% difference in average normalized lyrics alignment error.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/4020
Fulltext Rights:	Authorization granted (open access worldwide)
Appears in Collections:	Graduate Institute of Communication Engineering

Files in This Item:

File	Size	Format
ntu-105-1.pdf	1.45 MB	Adobe PDF
