Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89938
Title: | 使用多重擴張卷積 MMDenseNet 於即時歌曲伴奏分離 (Real-Time Accompaniment Extraction with Multi-Dilated Convolution MMDenseNet) |
Author: | 李學翰 Hsueh-Han Lee |
Advisor: | 張智星 Roger Jang |
Keywords: | Music Source Separation, Singing Voice Separation, MMDenseNet, Multi-dilated Convolution, Spectral Mask Estimation, Real-time Separation |
Year of publication: | 2023 |
Degree: | Master's |
Abstract: | Music source separation (MSS) is an important research task in music information retrieval (MIR); it aims to recover the individual source tracks from a music signal in which several sources have been mixed. Its subtask, singing voice separation (SVS), recovers just two tracks: vocals and accompaniment. Although several studies have proposed architectures that separate well, they require so much computation and processing time that they are unsuitable for real-time separation systems, particularly on edge devices. This thesis therefore studies how to extract the accompaniment track in real time with limited resources. It builds on MMDenseNet, a lightweight MSS model. Separation performance is first improved through mask estimation, multi-dilated convolution, and increased model complexity; latency is then reduced by shortening the model input and applying context aggregation. The result is a model that separates well at low latency. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89938 |
DOI: | 10.6342/NTU202301173 |
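Full-text license: | Authorized (open access worldwide) |
Appears in collections: | Graduate Institute of Networking and Multimedia |
Files in this item:
File | Size | Format | |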
---|---|---|---|
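ntu-111-2.pdf | 5.85 MB | Adobe PDF | View/Open |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.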