  1. NTU Theses and Dissertations Repository
  2. College of Electrical Engineering and Computer Science
  3. Department of Computer Science and Information Engineering
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89896
Title: 使用機器學習進行無伴奏合唱的歌聲分離
Machine Learning for Source Separation of A Cappella Music
Authors: Quinn Myles McGarry
Advisor: 張智星
Jyh-Shing Roger Jang
Keyword: Machine learning, Audio source separation, Music source separation, Speech separation, Music information retrieval, Singing separation, A cappella separation, U-Net, TasNet
Publication Year: 2023
Degree: Master's
Abstract:
In recent years, many studies have addressed the problem of speech separation, which attempts to separate audio of multiple people speaking simultaneously into the audio of each individual speaker. However, audio source separation of multiple simultaneous singers is still not well explored and remains a challenge. This is mainly because singing voices tend to "blend" together much more than speaking voices, and multiple vocal lines often sing the same words, and potentially the same frequencies, in unison. To deal with these issues, we propose a new U-Net based model specifically for a cappella singing separation of two singers and compare it to three state-of-the-art speech separation models.
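The general masking approach that U-Net separators follow can be illustrated with a toy sketch. This is not the thesis's model: a real network predicts soft masks from the mixture spectrogram, whereas here two pure tones stand in for singers and hand-built oracle frequency masks stand in for the network's output. All signals and frequencies below are illustrative assumptions.

```python
import numpy as np

# Two "singers" as pure tones at different pitches (toy stand-ins for vocals).
sr = 8000
t = np.arange(sr) / sr
singer_a = np.sin(2 * np.pi * 220 * t)  # low voice
singer_b = np.sin(2 * np.pi * 880 * t)  # high voice
mix = singer_a + singer_b

# Frequency-domain masking: a learned separator (e.g. a U-Net) would
# *predict* these masks from the mixture; here we use oracle masks that
# simply split the spectrum at 500 Hz.
spec = np.fft.rfft(mix)
freqs = np.fft.rfftfreq(mix.size, d=1 / sr)
mask_low = (freqs < 500).astype(float)

# Apply each mask and invert back to the time domain.
est_a = np.fft.irfft(spec * mask_low, n=mix.size)
est_b = np.fft.irfft(spec * (1 - mask_low), n=mix.size)
```

Because the two toy sources occupy disjoint frequency bins, the oracle masks recover them almost exactly; real singing voices overlap heavily in frequency, which is precisely why the unison case described above is hard.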

Our experimental results vary widely. The U-Net based network excels at separating music taken from choir datasets, with a maximum mean SDR of 9.76 dB, but performs poorly when separating random combinations of singers. The best speech separation network separates random combinations of singers quite well, with a maximum mean SDR of 7.64 dB after fine-tuning, but is incapable of separating samples in which the singers sing the same lyrics simultaneously. This singing separation score is also much lower than the same model's mean SDR of 9.04 dB for speech separation.
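The SDR figures above can be illustrated with a small sketch. This computes the scale-invariant SDR popularized by TasNet-style models; it omits the distortion filter of the full BSS-Eval SDR that theses typically report, so it is an illustration of the metric, not the exact evaluation code used here.

```python
import numpy as np

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB. A simplified sketch: BSS-Eval's SDR
    additionally allows a short distortion filter, which this omits."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to isolate the target part.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10(np.sum(target**2) / np.sum(noise**2))

# Toy check: a lightly corrupted copy of a sine scores a moderate SDR,
# and less corruption yields a higher score.
rng = np.random.default_rng(0)
s = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = s + 0.1 * rng.standard_normal(s.shape)
score = si_sdr(noisy, s)
```

Higher is better: a mean SDR of 9.76 dB means the estimated vocal track's energy is roughly ten times that of its residual distortion, averaged over the test set.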

These results are quite nuanced and show that singing separation is a different and, overall, more difficult task than speech separation. However, they also show that both a U-Net based network and one based on contemporary speech separation networks may well be capable of performing it.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89896
DOI: 10.6342/NTU202303444
Fulltext Rights: Authorized (open access worldwide)
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File: ntu-112-1.pdf | Size: 1.93 MB | Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
