Real-Time Accompaniment Extraction with Multi-Dilated Convolution MMDenseNet

李學翰; Hsueh-Han Lee

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89938

Title:	Real-Time Accompaniment Extraction with Multi-Dilated Convolution MMDenseNet
Authors:	李學翰 Hsueh-Han Lee
Advisor:	張智星 Roger Jang
Keyword:	Music Source Separation, Singing Voice Separation, MMDenseNet, Multi-dilated Convolution, Spectral Mask Estimation, Real-time Separation
Publication Year :	2023
Degree:	Master's
Abstract:	Music source separation (MSS) is an important research task in music information retrieval (MIR) that aims to recover the individual source signals from a mixed music signal. Its subtask, singing voice separation (SVS), recovers only two tracks: vocals and accompaniment. Although several studies have proposed architectures that separate well, their heavy computational cost and long processing time make them unsuitable for real-time separation systems and limit their use on edge devices. This thesis therefore focuses on extracting the accompaniment track in real time with limited resources. It builds on MMDenseNet, a lightweight MSS model. Mask estimation, multi-dilated convolution, and increased model complexity first improve separation performance; a shorter model input and context aggregation then reduce latency. The resulting model separates in real time while sustaining separation performance.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89938
DOI:	10.6342/NTU202301173
Fulltext Rights:	Authorized (open access worldwide)
Appears in Collections:	Graduate Institute of Networking and Multimedia

Files in This Item:

File	Size	Format
ntu-111-2.pdf	5.85 MB	Adobe PDF
