NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79437
Full metadata record (DC field: value, language):
dc.contributor.advisor: 許永真 (Yung-Jen Hsu)
dc.contributor.author: Chu-Ying Chan (en)
dc.contributor.author: 詹居穎 (zh_TW)
dc.date.accessioned: 2022-11-23T09:00:26Z
dc.date.available: 2021-11-05
dc.date.available: 2022-11-23T09:00:26Z
dc.date.copyright: 2021-11-05
dc.date.issued: 2021
dc.date.submitted: 2021-10-19
dc.identifier.citation:
R. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam, and J. P. Bello. MedleyDB: A multitrack dataset for annotation-intensive MIR research. In 15th International Society for Music Information Retrieval Conference (ISMIR), Oct. 2014.
W. Choi, M. Kim, J. Chung, and S. Jung. LaSAFT: Latent source attentive frequency transformation for conditioned source separation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 171–175, 2021.
W. Choi, M. Kim, J. Chung, D. Lee, and S. Jung. Investigating U-Nets with various intermediate blocks for spectrogram-based singing voice separation. In 21st International Society for Music Information Retrieval Conference (ISMIR), Oct. 2020.
A. Défossez, N. Usunier, L. Bottou, and F. Bach. Music source separation in the waveform domain. arXiv preprint arXiv:1911.13254, 2019.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, Los Alamitos, CA, USA, June 2016. IEEE Computer Society.
R. Hennequin, A. Khlif, F. Voituret, and M. Moussallam. Spleeter: A fast and efficient music source separation tool with pre-trained models. Journal of Open Source Software, 5(50):2154, 2020.
Y. Ikemiya, K. Yoshii, and K. Itoyama. Singing voice analysis and editing based on mutually dependent F0 estimation and source separation. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 574–578, 2015.
M. A. Islam, S. Jia, and N. D. B. Bruce. How much position information do convolutional neural networks encode? In International Conference on Learning Representations (ICLR), 2020.
A. Jansson, E. J. Humphrey, N. Montecchio, R. M. Bittner, A. Kumar, and T. Weyde. Singing voice separation with deep U-Net convolutional networks. In ISMIR, 2017.
D. P. Kingma and J. L. Ba. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations (ICLR), 2015.
J. Liu and Y. Yang. Denoising auto-encoder with recurrent skip connections and residual regression for music source separation. CoRR, abs/1807.01898, 2018.
A. Liutkus, F.-R. Stöter, Z. Rafii, D. Kitamura, B. Rivet, N. Ito, N. Ono, and J. Fontecave. The 2016 signal separation evaluation campaign. In P. Tichavský, M. Babaie-Zadeh, O. J. Michel, and N. Thirion-Moreau, editors, Latent Variable Analysis and Signal Separation: 13th International Conference, LVA/ICA 2017, Proceedings, pages 323–332, Cham, 2017. Springer International Publishing.
Y. Luo and N. Mesgarani. Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(8):1256–1266, 2019.
G. Meseguer-Brocal and G. Peeters. Conditioned-U-Net: Introducing a control mechanism in the U-Net for multiple source separations. In 20th International Society for Music Information Retrieval Conference (ISMIR), Nov. 2019.
E. Perez, F. Strub, H. de Vries, V. Dumoulin, and A. C. Courville. FiLM: Visual reasoning with a general conditioning layer. In AAAI, 2018.
Z. Rafii, A. Liutkus, F.-R. Stöter, S. I. Mimilakis, and R. Bittner. The MUSDB18 corpus for music separation, Dec. 2017.
O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pages 234–241, Cham, 2015. Springer International Publishing.
S. T. Roweis. One microphone source separation. In NIPS, 2000.
D. Samuel, A. Ganeshan, and J. Naradowsky. Meta-learning extractors for music source separation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 816–820, 2020.
M. Senior. Mixing Secrets for the Small Studio. Routledge, 2011.
D. Stoller, S. Ewert, and S. Dixon. Wave-U-Net: A multi-scale neural network for end-to-end audio source separation, 2018.
F.-R. Stöter, S. Uhlich, A. Liutkus, and Y. Mitsufuji. Open-Unmix: A reference implementation for music source separation. Journal of Open Source Software, 4(41):1667, 2019.
N. Takahashi and Y. Mitsufuji. D3Net: Densely connected multidilated DenseNet for music source separation, 2021.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
E. Vincent, R. Gribonval, and C. Févotte. Performance measurement in blind audio source separation. IEEE Transactions on Audio, Speech, and Language Processing, 14(4):1462–1469, 2006.
Z. Wang and J. Liu. Translating mathematical formula images to LaTeX sequences using deep neural networks with sequence-level training. International Journal on Document Analysis and Recognition (IJDAR), 24:63–75, 2021.
Y. Wu, H. Mao, and Z. Yi. Audio classification using attention-augmented convolutional neural network. Knowledge-Based Systems, 161:90–100, 2018.
M. Zibulevsky and B. A. Pearlmutter. Blind source separation by sparse decomposition in a signal dictionary. Neural Computation, 13(4):863–882, Apr. 2001.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79437
dc.description.abstract: Music source separation aims to take a mixture signal composed of multiple sources and separate it back into the individual sources from which it was mixed. Since the advent of deep learning, nearly every model in this field has been deep-learning based. Some models apply a time-frequency transformation and perform the separation on the spectrogram, treating it as an image. Unlike ordinary images, however, absolute position is highly meaningful in a spectrogram: each point's absolute position corresponds to a specific time and frequency. For convolutional models, if the stack of convolutions is not deep enough, the model cannot capture complete positional information. This thesis therefore proposes Pos-LaSAFT, a variant of LaSAFT that uses a 2-D positional encoding to help the model capture absolute positional information and thereby improve performance. All results are evaluated with the SDR metric on the MUSDB18 dataset. Under identical experimental settings, Pos-LaSAFT improves the average SDR over the original LaSAFT by 0.28 dB, with the largest gain, 0.56 dB, on the bass stem. (zh_TW)
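The abstract's core idea is that stacked convolutions alone may not recover absolute time-frequency position, so a 2-D positional encoding is injected into the spectrogram features. As a rough illustration only, and not the thesis's actual code, the following PyTorch sketch shows one common 2-D sinusoidal scheme in which half of the channels encode the frequency index and the other half encode the time index; the exact formulation, injection point, and combination rule (addition vs. concatenation) used by Pos-LaSAFT may differ.

```python
import math
import torch

def positional_encoding_1d(length: int, dim: int) -> torch.Tensor:
    # Standard sinusoidal encoding (Vaswani et al., 2017): sin on even
    # channels, cos on odd channels, geometric frequency progression.
    position = torch.arange(length, dtype=torch.float32).unsqueeze(1)      # (L, 1)
    div_term = torch.exp(
        torch.arange(0, dim, 2, dtype=torch.float32) * (-math.log(10000.0) / dim)
    )                                                                      # (dim/2,)
    pe = torch.zeros(length, dim)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe                                                              # (L, dim)

def positional_encoding_2d(channels: int, freq_bins: int, frames: int) -> torch.Tensor:
    # Half of the channels encode the frequency axis, half the time axis,
    # so every (frequency, time) cell gets a unique absolute-position signature.
    assert channels % 4 == 0, "channels must be divisible by 4"
    half = channels // 2
    pe = torch.zeros(channels, freq_bins, frames)
    pe_f = positional_encoding_1d(freq_bins, half).t()   # (half, F)
    pe_t = positional_encoding_1d(frames, half).t()      # (half, T)
    pe[:half] = pe_f.unsqueeze(2).expand(half, freq_bins, frames)
    pe[half:] = pe_t.unsqueeze(1).expand(half, freq_bins, frames)
    return pe                                            # (C, F, T)

# Usage: add the encoding to a batch of spectrogram feature maps (B, C, F, T).
features = torch.randn(4, 64, 512, 128)
features = features + positional_encoding_2d(64, 512, 128)  # broadcasts over batch
```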
dc.description.provenance: Made available in DSpace on 2022-11-23T09:00:26Z (GMT). No. of bitstreams: 1. U0001-1510202117015400.pdf: 5573916 bytes, checksum: 2025e69f99f32af209c0140b6c968543 (MD5). Previous issue date: 2021. (en)
dc.description.tableofcontents:
Acknowledgments
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Background and Motivation
  1.2 Thesis Objectives
  1.3 Terminology
  1.4 Thesis Organization
Chapter 2 Related Work
  2.1 Masking
  2.2 Generation
Chapter 3 Methodology
  3.1 Base Architecture
    3.1.1 Complex as Channels
    3.1.2 The Conditioned U-Net in LaSAFT
  3.2 Embedding Positional Information
    3.2.1 Positional Encoding
    3.2.2 2-D Positional Encoding
    3.2.3 Pos-LaSAFT
Chapter 4 Experiments
  4.1 Dataset
  4.2 Metrics
  4.3 Experimental Setup
  4.4 Results
Chapter 5 Discussion
  5.1 On the Dataset
  5.2 Variability of Sounds
Chapter 6 Conclusion and Future Work
  6.1 Summary and Contributions
  6.2 Limits of Positional Encoding
  6.3 What's Next?
Bibliography
dc.language.iso: zh-TW
dc.title: 使用二維位置編碼改善以時頻譜為基礎的聲源分離模型 (zh_TW)
dc.title: Pos-LaSAFT: Improving Spectrogram-based Source Separation Model by Leveraging 2-D Positional Encoding (en)
dc.date.schoolyear: 109-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 楊奕軒 (Hsin-Tsai Liu), 陳宏銘 (Chih-Yang Tseng), 楊智淵, 蔡偉和
dc.subject.keyword: 音樂聲部分離, 位置編碼 (zh_TW)
dc.subject.keyword: music source separation, positional encoding (en)
dc.relation.page: 37
dc.identifier.doi: 10.6342/NTU202103763
dc.rights.note: Authorization granted (open access worldwide)
dc.date.accepted: 2021-10-20
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) (zh_TW)
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) (zh_TW)
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
U0001-1510202117015400.pdf (5.44 MB, Adobe PDF)
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated by their stated copyright terms.
