Real-Time Accompaniment Extraction with Multi-Dilated Convolution MMDenseNet

李學翰; Hsueh-Han Lee

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89938

Title:	Real-Time Accompaniment Extraction with Multi-Dilated Convolution MMDenseNet (使用多重擴張卷積 MMDenseNet 於即時歌曲伴奏分離)
Author:	Hsueh-Han Lee (李學翰)
Advisor:	Roger Jang (張智星)
Keywords:	Music Source Separation, Singing Voice Separation, MMDenseNet, Multi-Dilated Convolution, Spectral Mask Estimation, Real-Time Separation
Publication year:	2023
Degree:	Master's
Abstract:	Music source separation (MSS) is an important research topic in music information retrieval (MIR). Its goal is to recover the individual source signals from a music signal in which several sources have been mixed. Singing voice separation (SVS), a subtask of MSS, aims to split the mixture into just two tracks: vocals and accompaniment. Although many studies have proposed architectures that achieve good separation quality, they require substantial computing resources and processing time, which makes them unsuitable for real-time separation systems and for edge devices. This thesis therefore focuses on extracting the accompaniment track in real time with limited resources. It adopts MMDenseNet, a lightweight MSS architecture, and first improves its separation quality through mask estimation, multi-dilated convolution, and increased model complexity. It then reduces latency by shortening the model's input duration and applying context aggregation. The resulting model runs in real time with low latency while maintaining good separation quality.
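The abstract names multi-dilated convolution as one of the changes that improves separation. As a rough illustration only, the sketch below assumes a D3Net-style formulation in which each group of input channels (for example, each skip connection inside a dense block) is convolved with its own dilation factor and the per-group results are summed into one output. The function names, the 1-D simplification (the actual model operates on 2-D spectrograms), and the kernel and dilation values are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], w: Sequence[float], d: int) -> List[float]:
    """'Same'-padded 1-D dilated cross-correlation, as used in deep-learning conv layers.

    Output t is sum_k w[k] * x[t + (k - center) * d], with zeros outside the signal.
    """
    if len(w) % 2 == 0:
        raise ValueError("kernel length must be odd for symmetric 'same' padding")
    if d < 1:
        raise ValueError("dilation must be a positive integer")
    center = len(w) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            idx = t + (k - center) * d  # kernel taps are spaced d samples apart
            if 0 <= idx < len(x):      # implicit zero padding outside the signal
                acc += wk * x[idx]
        out.append(acc)
    return out


def multi_dilated_conv1d(
    groups: Sequence[Sequence[float]],
    kernels: Sequence[Sequence[float]],
    dilations: Sequence[int],
) -> List[float]:
    """Hypothetical sketch: convolve channel group i with dilations[i], then sum all groups."""
    if not (len(groups) == len(kernels) == len(dilations)):
        raise ValueError("need exactly one kernel and one dilation per channel group")
    length = len(groups[0])
    out = [0.0] * length
    for x, w, d in zip(groups, kernels, dilations):
        if len(x) != length:
            raise ValueError("all channel groups must have the same length")
        for t, value in enumerate(dilated_conv1d(x, w, d)):
            out[t] += value
    return out


if __name__ == "__main__":
    # Three groups, each an impulse at position 4, with dilations 1, 2 and 4.
    impulse = [0.0] * 9
    impulse[4] = 1.0
    print(multi_dilated_conv1d([impulse] * 3, [[1.0, 1.0, 1.0]] * 3, [1, 2, 4]))
```

With an impulse input, the group using dilation 1 reaches the neighbouring positions, while the groups using dilations 2 and 4 reach positions 2 and 4 steps away. A single layer therefore mixes several receptive-field sizes, while each group still uses only one small kernel.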
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89938
DOI:	10.6342/NTU202301173
Full-text license:	Authorized (open access worldwide)
Appears in department:	Graduate Institute of Networking and Multimedia

Files in this item:

File	Size	Format
ntu-111-2.pdf	5.85 MB	Adobe PDF


Unless their copyright terms are otherwise specified, all items in this system are protected by copyright, with all rights reserved.
