  1. NTU Theses and Dissertations Repository
  2. College of Electrical Engineering and Computer Science
  3. Graduate Institute of Networking and Multimedia
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89938
Title: Real-Time Accompaniment Extraction with Multi-Dilated Convolution MMDenseNet (使用多重擴張卷積 MMDenseNet 於即時歌曲伴奏分離)
Authors: Hsueh-Han Lee (李學翰)
Advisor: Roger Jang (張智星)
Keywords: Music Source Separation, Singing Voice Separation, MMDenseNet, Multi-dilated Convolution, Spectral Mask Estimation, Real-time Separation
Publication Year: 2023
Degree: Master's
Abstract: Music source separation (MSS) is an important research task in music information retrieval (MIR). It aims to recover the individual source signals from a music signal formed by mixing several sources. Its subtask, singing voice separation (SVS), recovers only two tracks: vocals and accompaniment. Although many studies have proposed architectures that separate well, they require substantial computing resources and processing time, which makes them unsuitable for real-time separation systems and limits their use on edge devices. This thesis therefore studies how to extract the accompaniment track in real time with limited resources. It builds on MMDenseNet, a lightweight MSS model. Separation performance is first improved through mask estimation, multi-dilated convolution, and increased model complexity. Latency is then reduced by shortening the model input duration and applying context aggregation. The resulting model achieves low latency while maintaining good separation performance.
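As a rough illustration of the mask estimation idea named in the abstract, the sketch below applies a soft time-frequency mask to a mixture magnitude spectrogram to obtain an accompaniment estimate. It is not the thesis's model: the function names `ratio_mask` and `apply_mask` are hypothetical, the spectrograms are random toy data, and the mask is an oracle computed from known sources, whereas a real system would have a network predict it. The sketch also assumes that source magnitudes add up to the mixture magnitude, which is only an approximation for real audio.

```python
import numpy as np

def ratio_mask(target_mag, other_mag, eps=1e-8):
    """Oracle soft mask: share of each time-frequency bin owned by the target."""
    return target_mag / (target_mag + other_mag + eps)

def apply_mask(mask, mixture_mag):
    """Clip the mask to [0, 1] and apply it element-wise to the mixture."""
    return np.clip(mask, 0.0, 1.0) * mixture_mag

# Toy magnitude spectrograms (frequency bins x frames); values are made up.
rng = np.random.default_rng(0)
accompaniment = rng.random((4, 6)) + 0.1
vocals = rng.random((4, 6)) + 0.1

# Assumption: magnitudes are additive, a common simplification.
mixture = accompaniment + vocals

mask = ratio_mask(accompaniment, vocals)
estimate = apply_mask(mask, mixture)
```

With an oracle mask and additive magnitudes, `estimate` matches `accompaniment` up to the small `eps` term. In practice the mask is predicted from the mixture alone, so the estimate is only approximate.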
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89938
DOI: 10.6342/NTU202301173
Fulltext Rights: Authorized (open access worldwide)
Appears in Collections: Graduate Institute of Networking and Multimedia

Files in This Item:
File            Size      Format
ntu-111-2.pdf   5.85 MB   Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
