DSpace

The DSpace institutional repository is dedicated to preserving digital materials of all kinds (e.g., text, images, PDF) and making them easy to access.

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79698
Full metadata record (DC field: value [language])

dc.contributor.advisor: 張智星 (Jyh-Shing Jang)
dc.contributor.author: Yu-Li Wang [en]
dc.contributor.author: 王俞禮 [zh_TW]
dc.date.accessioned: 2022-11-23T09:07:57Z
dc.date.available: 2021-09-02
dc.date.available: 2022-11-23T09:07:57Z
dc.date.copyright: 2021-09-02
dc.date.issued: 2021
dc.date.submitted: 2021-08-24
dc.identifier.citation:
F. Bao and W. H. Abdulla. Noise masking method based on an effective ratio mask estimation in gammatone channels. APSIPA Transactions on Signal and Information Processing, 7:3–4, 2018.
R. M. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam, and J. P. Bello. MedleyDB: A multitrack dataset for annotation-intensive MIR research. In ISMIR, volume 14, pages 155–160, 2014.
S. Boll. Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech, and Signal Processing, 27(2):113–120, 1979.
S. Braun and I. Tashev. A consolidated view of loss functions for supervised deep learning-based speech enhancement. arXiv preprint arXiv:2009.12286, 2020.
T.-S. Chan, T.-C. Yeh, Z.-C. Fan, H.-W. Chen, L. Su, Y.-H. Yang, and R. Jang. Vocal activity informed singing voice separation with the iKala dataset. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 718–722. IEEE, 2015.
H.-S. Choi, J.-H. Kim, J. Huh, A. Kim, J.-W. Ha, and K. Lee. Phase-aware speech enhancement with deep complex U-Net. In International Conference on Learning Representations, 2018.
F. Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1251–1258, 2017.
A. Défossez, N. Usunier, L. Bottou, and F. Bach. Music source separation in the waveform domain. arXiv preprint arXiv:1911.13254, 2019.
M. Dukhan, Y. Wu, and H. Lu. QNNPACK: Open source library for optimized mobile deep learning, 2018.
F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with LSTM. 1999.
R. Giri, U. Isik, and A. Krishnaswamy. Attention Wave-U-Net for speech enhancement. In 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 249–253. IEEE, 2019.
E. Gusó. On Loss Functions for Music Source Separation. PhD thesis, Zenodo, Aug. 2020.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
R. Hennequin, A. Khlif, F. Voituret, and M. Moussallam. Spleeter: A fast and efficient music source separation tool with pre-trained models. Journal of Open Source Software, 5(50):2154, 2020.
A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
A. Jansson, E. Humphrey, N. Montecchio, R. Bittner, A. Kumar, and T. Weyde. Singing voice separation with deep U-Net convolutional networks. 2017.
S. Jetley, N. A. Lord, N. Lee, and P. H. Torr. Learn to pay attention. arXiv preprint arXiv:1804.02391, 2018.
M. Khened, V. A. Kollerathu, and G. Krishnamurthi. Fully convolutional multi-scale residual DenseNets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers. Medical Image Analysis, 51:21–45, 2019.
D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
S. B. Kotsiantis, I. Zaharakis, and P. Pintelas. Supervised machine learning: A review of classification techniques. Emerging Artificial Intelligence Applications in Computer Engineering, 160(1):3–24, 2007.
H. Liu, L. Xie, J. Wu, and G. Yang. Channel-wise subband input for better voice and accompaniment separation on high resolution music. arXiv preprint arXiv:2008.05216, 2020.
Y. Liu, B. Thoshkahna, A. Milani, and T. Kristjansson. Voice and accompaniment separation in music using self-attention convolutional neural network. arXiv preprint arXiv:2003.08954, 2020.
A. Liutkus and F.-R. Stöter. sigsep/norbert: First official Norbert release, 2019.
A. Liutkus, F.-R. Stöter, Z. Rafii, D. Kitamura, B. Rivet, N. Ito, N. Ono, and J. Fontecave. The 2016 signal separation evaluation campaign. In P. Tichavský, M. Babaie-Zadeh, O. J. Michel, and N. Thirion-Moreau, editors, Latent Variable Analysis and Signal Separation: 12th International Conference, LVA/ICA 2015, Liberec, Czech Republic, August 25-28, 2015, Proceedings, pages 323–332, Cham, 2017. Springer International Publishing.
A. Liutkus, F.-R. Stöter, Z. Rafii, D. Kitamura, B. Rivet, N. Ito, N. Ono, and J. Fontecave. The 2016 signal separation evaluation campaign. In International Conference on Latent Variable Analysis and Signal Separation, pages 323–332. Springer, 2017.
E. Manilow, G. Wichern, P. Seetharaman, and J. Le Roux. Cutting music source separation some Slakh: A dataset to study the impact of training data quality and quantity. In 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 45–49. IEEE, 2019.
H. Nakajima, Y. Takahashi, K. Kondo, and Y. Hisaminato. Monaural source enhancement maximizing source-to-distortion ratio via automatic differentiation. arXiv preprint arXiv:1806.05791, 2018.
O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz, et al. Attention U-Net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999, 2018.
L. Prétet, R. Hennequin, J. Royo-Letelier, and A. Vaglio. Singing voice separation: A study on training data. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 506–510. IEEE, 2019.
Z. Rafii, A. Liutkus, F.-R. Stöter, S. I. Mimilakis, and R. Bittner. MUSDB18: A corpus for music separation, 2017.
Z. Rafii, A. Liutkus, F.-R. Stöter, S. I. Mimilakis, and R. Bittner. The MUSDB18 corpus for music separation, Dec. 2017.
Z. Rafii and B. Pardo. Repeating pattern extraction technique (REPET): A simple method for music/voice separation. IEEE Transactions on Audio, Speech, and Language Processing, 21(1):73–84, 2012.
B. P. Rohman, K. Paramayudha, and A. Y. Hercuadi. A novel scheme of speech enhancement using power spectral subtraction multi-layer perceptron network. Telkomnika, 14(1):181, 2016.
O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
H. R. Roth, L. Lu, N. Lay, A. P. Harrison, A. Farag, A. Sohn, and R. M. Summers. Spatial aggregation of holistically-nested convolutional neural networks for automated pancreas localization and segmentation. Medical Image Analysis, 45:94–107, 2018.
D. Samuel, A. Ganeshan, and J. Naradowsky. Meta-learning extractors for music source separation. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 816–820. IEEE, 2020.
M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520, 2018.
P. Shaw, J. Uszkoreit, and A. Vaswani. Self-attention with relative position representations. arXiv preprint arXiv:1803.02155, 2018.
D. Stoller, S. Ewert, and S. Dixon. Adversarial semi-supervised audio source separation applied to singing voice extraction. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2391–2395. IEEE, 2018.
D. Stoller, S. Ewert, and S. Dixon. Wave-U-Net: A multi-scale neural network for end-to-end audio source separation. arXiv preprint arXiv:1806.03185, 2018.
F.-R. Stöter and A. Liutkus. museval, 2018.
F.-R. Stöter, A. Liutkus, and N. Ito. The 2018 signal separation evaluation campaign. In International Conference on Latent Variable Analysis and Signal Separation, pages 293–305. Springer, 2018.
N. Takahashi, N. Goswami, and Y. Mitsufuji. MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation. In 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC), pages 106–110. IEEE, 2018.
E. Tzinis, S. Wisdom, J. R. Hershey, A. Jansen, and D. P. Ellis. Improving universal sound separation using sound classification. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 96–100. IEEE, 2020.
S. Uhlich, M. Porcu, F. Giron, M. Enenkl, T. Kemp, N. Takahashi, and Y. Mitsufuji. Improving music source separation based on deep neural networks through data augmentation and network blending. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 261–265. IEEE, 2017.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. arXiv preprint arXiv:1706.03762, 2017.
E. Vincent, R. Gribonval, and C. Févotte. Performance measurement in blind audio source separation. IEEE Transactions on Audio, Speech, and Language Processing, 14(4):1462–1469, 2006.
N. Wiener. Extrapolation, Interpolation, and Smoothing of Stationary Time Series, with Engineering Applications. 1949.
C.-J. Wu, D. Brooks, K. Chen, D. Chen, S. Choudhury, M. Dukhan, K. Hazelwood, E. Isaac, Y. Jia, B. Jia, et al. Machine learning at Facebook: Understanding inference at the edge. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 331–344. IEEE, 2019.
H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena. Self-attention generative adversarial networks. In International Conference on Machine Learning, pages 7354–7363. PMLR, 2019.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79698
dc.description.abstract: Singing voice separation aims to split music into a vocal track and an accompaniment track. It can be performed in the time domain or the frequency domain; the latter is the focus of this work. Deep learning has become indispensable to modern source separation. This thesis builds on the U-Net architecture of Ronneberger et al., which performs well on biomedical image segmentation, and trains it to segment spectrograms. Drawing on ratio mask and Wiener filter theory, we improve the existing U-Net model so that spike anomalies in its output can be corrected (accompaniment SDR rises from 13.805 to 14.288). We further extend U-Net with attention gates and self-attention, letting the model learn sounds with regular rhythmic structure (accompaniment SDR rises from 13.805 to 14.457). Following earlier work on spectral subtraction, we tune the subtraction amount in each frequency band to improve the model output, but the proposed tuning does not outperform the subtraction amounts from prior work (accompaniment SDR: baseline 13.805, prior work 14.031, this work 13.895). We also apply model pruning to U-Net while preserving as much performance as possible (model size drops from 118.9 MB to 59.8 MB; accompaniment SDR falls from 12.989 to 12.771), and tune model quantization parameters to limit the performance loss (model size drops from 118.9 MB to 4.75 MB; accompaniment SDR falls from 12.989 to 11.184). The experiments use the public datasets MUSDB18, DSD100, MedleyDB, and iKala, plus one private dataset, Ke (捷奏錄音室-柯老師). [zh_TW]
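To make the filtering ideas in the abstract concrete, here is a minimal NumPy sketch of a Wiener-style ratio mask and of per-band spectral subtraction on magnitude spectrograms. It is an illustration only, not the thesis implementation: the function names, the per-band factor alpha, and the toy shapes are all hypothetical.

```python
import numpy as np

def wiener_mask(vocal_mag, accomp_mag, eps=1e-8):
    # Wiener-style ratio mask built from estimated source magnitudes.
    # Values lie in [0, 1]; bins dominated by the accompaniment are
    # attenuated, which also damps isolated spikes in the raw estimate.
    power_v = vocal_mag ** 2
    return power_v / (power_v + accomp_mag ** 2 + eps)

def spectral_subtract(mix_mag, interference_mag, alpha, floor=0.0):
    # Per-band spectral subtraction: subtract the estimated interfering
    # magnitude scaled by a per-frequency factor alpha, then clip at floor.
    # mix_mag, interference_mag: (freq, time); alpha: (freq,).
    return np.maximum(mix_mag - alpha[:, None] * interference_mag, floor)

# Toy usage on random "spectrograms" (513 frequency bins, 100 frames).
rng = np.random.default_rng(0)
mix = rng.random((513, 100))
vocal_est = rng.random((513, 100))
accomp_est = rng.random((513, 100))

vocal_out = wiener_mask(vocal_est, accomp_est) * mix   # masked vocal magnitude
alpha = np.linspace(0.8, 1.2, 513)                     # hypothetical per-band factors
accomp_out = spectral_subtract(mix, vocal_est, alpha)  # accompaniment estimate
```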
dc.description.provenance: Made available in DSpace on 2022-11-23T09:07:57Z (GMT). No. of bitstreams: 1. U0001-2408202115100900.pdf: 3846493 bytes, checksum: 7ab42a1c38595946e1aa084a087a6f63 (MD5). Previous issue date: 2021. [en]
dc.description.tableofcontents口試委員審定書 — i 致謝 — iii 摘要 — v Abstract — vii 目錄 — ix 圖目錄 — xiii 表目錄 — xvii 第一章 緒論 1 1.1 動機. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 研究方向與主要貢獻. . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.3 章節概要. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 第二章 文獻探討 3 2.1 傳統方法. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.1.1 重複結構擷取. . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.2 深度學習法. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2.1 濾波處理. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2.1.1 Ratio Mask Filter. . . . . . . . . . . . . . . . . . . . 5 2.2.1.2 Wiener Filter. . . . . . . . . . . . . . . . . . . . . . . 5 2.2.1.3 頻譜刪減法. . . . . . . . . . . . . . . . . . . . . . . 6 2.2.2 深度神經模型U­Net. . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2.2.1 Spleeter. . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2.2.2 Demucs. . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2.3 注意力模型. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2.3.1 Self-­attention. . . . . . . . . . . . . . . . . . . . . . 9 2.2.3.2 Attention Gate. . . . . . . . . . . . . . . . . . . . . . 10 2.3 模型壓縮方法. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3.1 模型剪枝. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3.1.1 深度可分卷積. . . . . . . . . . . . . . . . . . . . . . 12 2.3.1.2 Inverted Residuals與Linear Bottlenecks. . . . . . . . 13 2.3.2 模型量化. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.3.2.1 Quantized Neural Networks package. . . . . . . . . 15 第三章 資料集簡介 17 3.1 MusDB18. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.1.1 DSD100. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.1.2 MedleyDB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.1.3 Museval模型測試指標. . . . . . . . . . . . . . . . . . . . . . . . 19 3.2 iKala. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.3 捷奏錄音室­柯老師. . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.4 其餘資料收集. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 第四章 研究方法 23 4.1 問題定義. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 4.2 實驗環境. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 4.3 評量指標. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 4.4 實驗設計與方法. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 4.4.1 神經模型訓練設定. . . . . . . . . . . . . . . . . . . . . . . . . . 27 4.4.2 濾波實驗. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 4.4.2.1 頻譜刪減法. . . . . . . . . . . . . . . . . . . . . . . 29 4.4.3 注意力模型實驗. . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.4.3.1 Self­attention架構實驗. . . . . . . . . . . . . . . . . 33 4.4.3.2 Attention Gate架構實驗. . . . . . . . . . . . . . . . 34 4.4.4 模型剪枝實驗. . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.4.5 模型量化實驗. . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 第五章 實驗結果討論與錯誤分析 39 5.1 實驗一:比較Ratio Mask與Wiener Filter的效果比較. . . . . . . . 39 5.2 實驗二:頻譜刪減法效果比較. . . . . . . . . . . . . . . . . . . . . 41 5.3 實驗三:不同注意力模型效果比較. . . . . . . . . . . . . . . . . . 45 5.4 實驗四:模型剪枝效果比較. . . . . . . . . . . . . . . . . . . . . . 48 5.5 實驗五:模型量化效果比較. . . . . . . . . . . . . . . . . . . . . . 50 第六章 結論與未來展望 53 6.1 結論. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 6.2 未來展望. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . 54 參考文獻 57 附錄A — 提出模型的完整訓練 65 A.1 U­-Net6 (Sattn) 的 L1 loss 下降趨勢. . . . . . . . . . . . . . . . . . . 65 A.2 U­-Net6 (DSConB) 的 L1 loss 下降趨勢. . . . . . . . . . . . . . . . . 65 A.3 U­-Net6 (IRB) 的 L1 loss 下降趨勢. . . . . . . . . . . . . . . . . . . . 66 A.4 以 Museval 指標與目前技術比較. . . . . . . . . . . . . . . . . . . . 66 A.5 有無使用Musdb18資料集訓練之差異. . . . . . . . . . . 68
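As a rough companion to the pruning and quantization chapters listed above (sections 2.3, 4.4.4, and 4.4.5), here is a hedged PyTorch sketch of the two compression steps on a toy network. The architecture is a stand-in, not the thesis' U-Net, and the 50% pruning amount is illustrative. PyTorch's dynamic quantization covers Linear/LSTM layers; conv layers would instead need static quantization with calibration, served on mobile by backends such as the QNNPACK library cited in the references.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for one encoder block plus a head; not the thesis' U-Net.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2),
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(16 * 8 * 8, 64),
)

# 1) Unstructured L1-magnitude pruning: zero the 50% smallest conv weights,
#    then make the mask permanent so the zeros persist in the state dict.
conv = model[0]
prune.l1_unstructured(conv, name="weight", amount=0.5)
prune.remove(conv, "weight")

# 2) Post-training dynamic quantization: Linear weights stored as int8.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 1, 16, 16)  # toy input sized to match Flatten -> Linear
print(qmodel(x).shape)         # torch.Size([1, 64])
```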
dc.language.iso: zh-TW
dc.title: 使用 U-Net 及其壓縮版本來進行歌聲分離 [zh_TW]
dc.title: Singing Voice Separation Using U-Net and Its Compressed Version [en]
dc.date.schoolyear: 109-2
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 李宏毅 (Hung-yi Lee), 楊奕軒 (Yi-Hsuan Yang)
dc.subject.keyword: 歌聲分離, U-Net, 注意力模型, 頻譜刪減, 深度模型壓縮 [zh_TW]
dc.subject.keyword: singing voice separation, U-Net, attention based model, spectrum subtraction, network compression [en]
dc.relation.page: 68
dc.identifier.doi: 10.6342/NTU202102677
dc.rights.note: Authorization granted (worldwide open access)
dc.date.accepted: 2021-08-25
dc.contributor.author-college: College of Electrical Engineering and Computer Science (電機資訊學院) [zh_TW]
dc.contributor.author-dept: Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所) [zh_TW]
Appears in collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in this item:
File | Size | Format
U0001-2408202115100900.pdf | 3.76 MB | Adobe PDF


All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.
