A Streamlined Encoder/Decoder Architecture for Melody Extraction

Tsung-Han Hsieh; 謝宗翰

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66127

Title:	A streamlined encoder/decoder architecture for melody extraction (專為旋律提取設計的流線型編碼器/解碼器架構)
Authors:	Tsung-Han Hsieh 謝宗翰
Advisor:	李琳山(Lin-shan Lee)
Keyword:	melody extraction, encoder/decoder
Publication Year:	2019
Degree:	Master's
Abstract:	Melody extraction from polyphonic musical audio is an important task in music signal processing. In this thesis, we propose a novel streamlined encoder/decoder network designed for this task. We make two technical contributions. First, drawing inspiration from a state-of-the-art model for semantic pixel-wise segmentation, we pass the pooling indices from the pooling layers to the corresponding un-pooling layers to localize the melody in frequency. This achieves results close to the state of the art with far fewer convolutional layers and simpler convolution modules. Second, we propose a way to use the bottleneck layer of the network to estimate whether a melody is present in each time frame, which makes it possible to obtain the final melody estimate with a simple argmax function instead of ad hoc thresholding. Our experiments on both vocal melody extraction and general melody extraction validate the effectiveness of the proposed model.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66127
DOI:	10.6342/NTU202000419
Fulltext Rights:	Paid license
Appears in Collections:	Data Science Degree Program (資料科學學位學程)
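
The two ideas in the abstract can be illustrated with a toy sketch for a single time frame. This is not the thesis's implementation: the function names, the pool size of 2, and the way a non-melody score is appended to the frequency scores before the argmax are all assumptions made for illustration, and a real model would operate on learned feature maps rather than plain lists.

```python
def max_pool_with_indices(column, size=2):
    """Max-pool a 1-D list of frequency-bin values.

    Returns the pooled values and the original position of each maximum.
    """
    pooled, indices = [], []
    for start in range(0, len(column) - size + 1, size):
        window = column[start:start + size]
        offset = max(range(size), key=lambda k: window[k])
        pooled.append(window[offset])
        indices.append(start + offset)
    return pooled, indices


def max_unpool(pooled, indices, length):
    """Put each pooled value back at its recorded position; zeros elsewhere."""
    restored = [0.0] * length
    for value, idx in zip(pooled, indices):
        restored[idx] = value
    return restored


def decode_frame(salience, no_melody_score):
    """Pick the melody bin with a plain argmax (hypothetical helper).

    Assumption: one extra "no melody" score competes with the frequency
    bins, so no separate threshold is needed. Returns None if it wins.
    """
    scores = list(salience) + [no_melody_score]
    best = max(range(len(scores)), key=lambda k: scores[k])
    return None if best == len(salience) else best


# One frame with six frequency bins (made-up values).
frame = [0.1, 0.9, 0.2, 0.3, 0.05, 0.0]
pooled, indices = max_pool_with_indices(frame, size=2)
restored = max_unpool(pooled, indices, len(frame))

voiced_bin = decode_frame(restored, no_melody_score=0.5)
unvoiced_bin = decode_frame(restored, no_melody_score=0.95)
```

Because the un-pooling step reuses the recorded positions, the peak returns to its original frequency bin instead of being smeared across the pooling window. In the second call the non-melody score exceeds every bin, so the frame is treated as having no melody.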

Files in This Item:

File	Size	Format
ntu-108-1.pdf Restricted Access	2.07 MB	Adobe PDF
