Singing Voice Separation Using Equivalent Rectangular Bandwidth NMF and Recurrent Neural Network

Chao-Wei Su; 蘇兆為

Please cite this document using this Handle URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/19229

Title:	Singing Voice Separation Using Equivalent Rectangular Bandwidth NMF and Recurrent Neural Network
Author:	Chao-Wei Su (蘇兆為)
Advisor:	Wen-Chin Chen (陳文進)
Co-advisor:	Ja-Ling Wu (吳家麟)
Keywords:	Music information retrieval, Source separation, Nonnegative matrix factorization, Equivalent rectangular bandwidth, Fourier transform, Deep learning, Neural network, Recurrent neural network
Publication year:	2016
Degree:	Master's
Abstract:	Music information retrieval (MIR) plays an important role in today's technology. Many music applications, such as Shazam, Soundhound, Spotify, and Apple Music, rely on techniques such as identifying a song by listening to it (e.g., query by humming) and music recommendation, all of which fall within the scope of MIR. Among MIR topics, source separation of music signals is one of the most widely studied. When a mixed signal can be separated into its original components, subsequent tasks such as recognition, retrieval, and re-creation can be performed considerably more effectively. Traditionally, source separation is done by applying non-negative matrix factorization (NMF) to the short-time Fourier transform (STFT) of the signal; more recent research shows that using a spectral transform based on the equivalent rectangular bandwidth (ERB) instead can further improve the performance of NMF. In addition, as computing power has grown, deep learning has achieved outstanding results in machine learning. Among deep learning models, the recurrent neural network (RNN) is particularly effective on data with temporal continuity, which matches the temporal nature of music signals, so RNNs have also begun to be applied to audio separation. This thesis studies both the older and the newer approach and proposes an integrated structure that combines them. The results show that, with an appropriate number of training iterations, the combination of the two methods achieves better separation quality and faster convergence.
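The NMF-on-spectrogram idea summarized in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation, and the RNN stage is not shown. It assumes a magnitude STFT as input and approximates the ERB spectral transform crudely, by pooling STFT bins into bands equally spaced on the Glasberg and Moore ERB-rate scale, rather than using a true ERB filterbank. NMF uses the standard Lee-Seung multiplicative updates for the Euclidean cost, and a soft (Wiener-like) mask built from the selected components is copied back to the STFT bins. All function names (`hz_to_erb_rate`, `erb_pooling_matrix`, `nmf`, `separate`) are hypothetical. The `voice_components` argument is also an assumption: it stands in for whatever step decides which components belong to the voice, such as pretrained bases or a learned model.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero

def hz_to_erb_rate(f_hz):
    """Glasberg & Moore (1990) ERB-rate scale: frequency in Hz -> ERB units."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))

def erb_pooling_matrix(freqs_hz, n_bands):
    """Hypothetical helper: 0/1 matrix (n_bands x n_bins) that sums STFT bins
    into bands equally spaced on the ERB-rate scale. A crude stand-in for a
    true ERB filterbank, not the transform used in the thesis."""
    erb = hz_to_erb_rate(freqs_hz)
    edges = np.linspace(erb[0], erb[-1], n_bands + 1)
    # Band index of each bin; the topmost bin is clipped into the last band.
    band = np.clip(np.searchsorted(edges, erb, side="right") - 1, 0, n_bands - 1)
    P = np.zeros((n_bands, len(erb)))
    P[band, np.arange(len(erb))] = 1.0
    return P

def nmf(V, rank, n_iter=200, seed=0):
    """Factor nonnegative V (freq x time) as W @ H with Lee-Seung
    multiplicative updates for the squared Euclidean cost."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H

def separate(S, freqs_hz, voice_components, rank=8, n_bands=40):
    """Split a magnitude STFT S (bins x frames) into (voice, accompaniment).
    NMF runs on the ERB-pooled spectrogram; the chosen components form a
    soft mask that is copied back to every STFT bin in the same band."""
    P = erb_pooling_matrix(freqs_hz, n_bands)
    W, H = nmf(P @ S, rank)
    voice = W[:, voice_components] @ H[voice_components, :]
    mask_bands = voice / (W @ H + EPS)  # values in [0, 1)
    mask = P.T @ mask_bands  # each bin inherits its band's mask value
    return mask * S, (1.0 - mask) * S
```

For example, `separate(S, np.linspace(0, 8000, 257), voice_components=[0, 1, 2])` returns two nonnegative spectrograms that sum back to `S`. Masking the mixture, rather than returning `W @ H` directly, keeps the outputs consistent with the observed magnitudes.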
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/19229
DOI:	10.6342/NTU201601491
Full-text license:	Not authorized
Department:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
ntu-105-1.pdf (not currently authorized for public access)	1.35 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.