Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/19229
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳文進(Wen-Chin Chen) | |
dc.contributor.author | Chao-Wei Su | en |
dc.contributor.author | 蘇兆為 | zh_TW |
dc.date.accessioned | 2021-06-08T01:49:43Z | - |
dc.date.copyright | 2016-09-13 | |
dc.date.issued | 2016 | |
dc.date.submitted | 2016-07-28 | |
dc.identifier.citation | [1] Te-Won Lee, Michael S. Lewicki, Mark Girolami, and Terrence J. Sejnowski, “Blind source separation of more sources than mixtures using overcomplete representations,” in IEEE Signal Processing Letters, pp. 87–90, Apr. 1999.
[2] Nicholas J. Bryan and Gautham J. Mysore, “Interactive refinement of supervised and semi-supervised sound source separation estimates,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 883–887, May 2013.
[3] S. Ewert and M. Muller, “Score informed source separation for music signals,” in Multimodal Music Processing, pp. 73–93, 2012.
[4] Mathieu Parvaix, Laurent Girin, and Jean-Marc Brossier, “A watermarking-based method for informed source separation of audio signals with a single sensor,” in IEEE Transactions on Audio, Speech, and Language Processing, pp. 1464–1475, Aug. 2010.
[5] Alexey Ozerov, Pierrick Philippe, Frederic Bimbot, and Remi Gribonval, “Adaptation of Bayesian models for single-channel source separation and its application to voice/music separation in popular songs,” in IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 5, pp. 1564–1578, July 2007.
[6] Alexey Ozerov, Emmanuel Vincent, and Frederic Bimbot, “A general flexible framework for the handling of prior information in audio source separation,” in IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 4, pp. 1118–1133, May 2012.
[7] Daniel D. Lee and H. Sebastian Seung, “Learning the parts of objects by non-negative matrix factorization,” in Nature, vol. 401, pp. 788–791, Oct. 1999.
[8] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Statist. Soc. Series B (Methodological), vol. 39, pp. 1–38, 1977.
[9] Emmanuel Vincent, Nancy Bertin, Remi Gribonval, and Frederic Bimbot, “From blind to guided audio source separation: How models and side information can improve the separation of sound,” in IEEE Signal Processing Magazine, vol. 31, no. 3, pp. 107–115, May 2014.
[10] Alexey Ozerov, Antoine Liutkus, and Gael Richard, “A tutorial on informed audio source separation,” slides of the ICASSP 2014 tutorial on informed audio source separation.
[11] Julius O. Smith and Jonathan S. Abel, “Bark and ERB Bilinear Transforms,” in IEEE Transactions on Speech and Audio Processing, pp. 1–32, Nov. 1999.
[12] Daniel D. Lee and H. Sebastian Seung, “Algorithms for non-negative matrix factorization,” in Neural Information Processing Systems, pp. 556–562, 2000.
[13] Cedric Fevotte, Nancy Bertin, and Jean-Louis Durrieu, “Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis,” in Neural Computation, vol. 21, no. 3, pp. 793–830, Mar. 2009.
[14] Laurent Benaroya, Lorcan M. Donagh, Frederic Bimbot, and Remi Gribonval, “Non negative sparse representation for Wiener based source separation with a single sensor,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 6, pp. VI-613–616, Apr. 2003.
[15] https://en.wikipedia.org/wiki/Recurrent_neural_network
[16] Christian Plahl, Michael Kozielski, Ralf Schluter, and Hermann Ney, “Feature combination and stacking of recurrent and non-recurrent neural networks for LVCSR,” in IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6714–6718, May 2013.
[17] Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, and Paris Smaragdis, “Singing-voice separation from monaural recordings using deep recurrent neural networks,” in Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), 2014.
[18] https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
[19] Chao-Ling Hsu and Jyh-Shing Roger Jang, https://sites.google.com/site/unvoicedsoundseparation/mir-1k
[20] Emmanuel Vincent, Remi Gribonval, and Cedric Fevotte, “Performance measurement in blind audio source separation,” in IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1462–1469, 2006.
[21] L. S. White and S. King, The EUSTACE speech corpus (http://www.cstr.ed.ac.uk/projects/eustace), Centre for Speech Technology Research, University of Edinburgh, 2003. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/19229 | - |
dc.description.abstract | 音樂資訊檢索在現今的科技中扮演著重要的角色,許多音樂的相關應用,如Shazam、Soundhound、Spotify、及Apple Music等,都需要聽音辨曲、音樂推薦等技術的支援,而這些應用與技術都屬於音樂資訊檢索的範疇。其中,音樂訊號分離是一個被廣為研究的主題。當混合的聲音訊號能夠被分離成原始的組成成分時,在後續的辨識、檢索、再創造等工作上都能在功效上得到很好的改善。傳統上,訊號分離技術是對音訊的短時傅立葉轉換(STFT)做非負矩陣分解(NMF),而較新的研究顯示,若使用等效矩形頻寬(ERB)所做的頻譜轉換,可以進一步提升非負矩陣分解法的效能。此外,隨著電腦效能的改善,近年來深度學習法(Deep learning)在機器學習領域中取得了非常突出的成績。其中,遞歸神經網路(RNN)對於處理有時間連續性的資料有特別好的效果,且正好與音樂訊號的時間特性吻合,因此也開始被應用在音訊分離的工作上。本篇論文分別研究了前述新舊兩種不同的方式,並將其結合,結果顯示在適當的訓練迭代次數下,兩種方法的結合能夠得到更好的分離效果,以及更快的收斂速度。 | zh_TW |
dc.description.abstract | Music information retrieval (MIR) plays an important role in today’s society. Many music applications, such as Shazam, Soundhound, Spotify, and Apple Music, rely on techniques like query by humming and music recommendation, all of which fall within the MIR domain. One of the most widely studied topics in this domain is source separation of music signals. When a mixed signal can be separated into its original components, the performance of subsequent music recognition, retrieval, and re-creation tasks can be greatly improved. Traditionally, source separation is performed by applying non-negative matrix factorization (NMF) to the short-time Fourier transform (STFT) of the signal. Recent research has shown that using spectra based on the equivalent rectangular bandwidth (ERB) scale can further improve the performance of NMF. In addition, with the growth of computing power, deep learning techniques have achieved remarkable results in machine learning. Among them, the recurrent neural network (RNN) performs particularly well on data with temporal continuity, which matches the temporal nature of music signals, so RNNs have also been applied to audio source separation. This thesis studies both of these approaches and proposes an integrated structure combining them. The results show that, with a proper number of training iterations, the combination of the two methods achieves better separation performance and faster convergence. | en |
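The abstract above describes the classical NMF-plus-Wiener-filter pipeline on which the thesis builds. The following is a minimal illustrative sketch of that idea, not the thesis's actual code: it factors a mixture magnitude spectrogram with the standard Lee–Seung multiplicative updates and splits it with Wiener-style masks. All function names and rank choices here are hypothetical, and the ERB-scale front end and RNN stage used in the thesis are omitted.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (freq x time) as V ~= W @ H
    using Lee-Seung multiplicative updates for the Frobenius cost."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def separate(V_mix, rank_voice=10, rank_music=10):
    """Split a mixture magnitude spectrogram into voice/music estimates.
    The first rank_voice NMF components are (hypothetically) assigned to
    the voice, the rest to the accompaniment."""
    W, H = nmf(V_mix, rank_voice + rank_music)
    V_voice = W[:, :rank_voice] @ H[:rank_voice, :]
    V_music = W[:, rank_voice:] @ H[rank_voice:, :]
    total = V_voice + V_music + 1e-9
    # Wiener-style masks: each source keeps its share of the mixture energy,
    # so the two estimates sum back to (approximately) the mixture.
    return V_mix * (V_voice / total), V_mix * (V_music / total)
```

In the full system, the masked spectrograms would be combined with the mixture's phase and inverted (e.g. by an inverse STFT) to recover time-domain signals; with supervised training, the voice and music dictionaries `W` would instead be learned from isolated training data.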
dc.description.provenance | Made available in DSpace on 2021-06-08T01:49:43Z (GMT). No. of bitstreams: 1 ntu-105-R03922126-1.pdf: 1386091 bytes, checksum: 68b3fb8897a7c65cf044aae9ea6aa5bf (MD5) Previous issue date: 2016 | en |
dc.description.tableofcontents | 誌謝 ii
摘要 iii
Abstract iv
圖目錄 1
表目錄 2
論文簡介 3
1.1 訊號分離(Source Separation) 3
1.2 人聲分離(Singing Voice Separation : SVS) 4
1.3 訊號分離的挑戰(Why Source Separation Problem is Difficult?) 4
1.4 本文之貢獻 5
1.5 本文之組織架構 5
相關基礎知識簡述 6
2.1 短時傅立葉轉換(Short-time Fourier Transform : STFT) 6
2.2 等效矩形頻寬(Equivalent Rectangular Bandwidth : ERB) 8
2.3 非負矩陣分解(Non-negative Matrix Factorization : NMF) 8
2.4 維納濾波器(Wiener filter) 11
2.5 遞歸神經網路RNN 11
基於等效矩形頻寬非負矩陣分解與遞歸神經網路的人聲訊號分離方法 13
3.1 基本架構 13
3.2 ERB的相關計算 16
3.3 NMF 的訓練 16
3.4 NMF 分離 17
3.5 維納濾波 17
3.6 特徵結合(Feature Combination) 18
3.7 RNN 訓練 18
3.8 訊號還原 19
實驗方法與結果 20
4.1 測試資料庫 20
4.2 評估準則(尺規) 20
4.3 功效評估 20
4.4 男女人聲分離的測試 25
結論 27
參考文獻 28 | |
dc.language.iso | zh-TW | |
dc.title | 基於等效矩形頻寬非負矩陣分解與遞歸神經網路的人聲訊號分離方法 | zh_TW |
dc.title | Singing Voice Separation Using Equivalent Rectangular Bandwidth NMF and Recurrent Neural Network | en |
dc.type | Thesis | |
dc.date.schoolyear | 104-2 | |
dc.description.degree | 碩士 | |
dc.contributor.coadvisor | 吳家麟(Ja-Ling Wu) | |
dc.contributor.oralexamcommittee | 李明穗(Ming-Sui Lee),林裕訓 | |
dc.subject.keyword | 音樂資訊檢索,訊號分離,非負矩陣分解,等效矩形頻寬,傅立葉轉換,深度學習,類神經網路,遞歸神經網路 | zh_TW |
dc.subject.keyword | Music information retrieval, Source separation, Nonnegative matrix factorization, Equivalent rectangular bandwidth, Fourier transform, Deep learning, Neural network, Recurrent neural network | en |
dc.relation.page | 29 | |
dc.identifier.doi | 10.6342/NTU201601491 | |
dc.rights.note | 未授權 | |
dc.date.accepted | 2016-07-29 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
Appears in Collections: | 資訊工程學系 (Department of Computer Science and Information Engineering) |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-105-1.pdf (currently not authorized for public access) | 1.35 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.