Efficient Standard Audio Format Conversion and Music Retrieval System

Fang-Chun Yeh; 葉芳均

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38896

Title:	Efficient Standard Audio Format Conversion and Music Retrieval System
Author:	Fang-Chun Yeh 葉芳均
Advisor:	貝蘇章 (Soo-Chang Pei)
Keywords:	music format conversion, irrational decimation, irrational interpolation, content-based music retrieval system, pitch, Fourier of Fourier transform, off-key adjustment
Year of publication:	2005
Degree:	Master's
Abstract:	In recent years, as computer technology has advanced rapidly, digital multimedia storage and search techniques have become increasingly important, and a variety of digital music formats have been published in succession. Because these formats are specified differently, converting music from one format to another requires sampling rate conversion (SRC). In addition, as network access and bandwidth continue to grow, search systems for multimedia databases are attracting more attention; to give users a convenient and intuitive search engine, content-based music information retrieval has long been an active research topic. This thesis covers two subjects: standard audio format conversion and a content-based music retrieval system. The most important parameter in sampling rate conversion is the scaling factor. In a conventional multi-rate system the scaling factor must be a rational number: the ratio of the two formats' sampling frequencies is first reduced to lowest terms, and the conversion is then performed by integer interpolation and decimation. For irrational scaling factors, however, there has been no settled approach.
In the first subject, we exploit the relationship between the time and frequency domains and operate linearly on the signal's spectrum to implement irrational scaling for audio format conversion efficiently. In the second subject, we introduce a content-based music retrieval system that uses melody as the search feature. The system takes a hummed melody as the query, tracks its pitch with a Fourier of Fourier transform algorithm, and compares the resulting pitch information against the database. For matching, we use a relative pitch contour based on the first note, which speeds the system up considerably, together with an off-key adjustment that improves retrieval accuracy. The database currently contains 96 pieces of music. Because the system uses phrase-level queries, users are not restricted to humming from the beginning of a song, so the database holds about 745 phrases in total. With the first-note-based pitch contour and the off-key adjustment applied, an 8-second humming query takes about 1.9 seconds to search on average, and for 500 test humming queries the top-10 recall rate is about 74.60%.
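Two steps mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the use of semitone pitch values, and the mean-absolute-difference score are all assumptions made for illustration. The first function shows the conventional rational case, where the sampling frequency ratio is reduced to lowest terms to obtain integer interpolation and decimation factors. The other two show one plausible reading of a first-note-based relative pitch contour, in which every pitch is expressed relative to the first note so that a query hummed in a different key still yields the same contour.

```python
from fractions import Fraction


def rational_src_factors(fs_in: int, fs_out: int) -> tuple[int, int]:
    """Return (L, M) such that fs_out = fs_in * L / M in lowest terms.

    Conventional multi-rate conversion would interpolate by L and
    then decimate by M. This only works for rational ratios.
    """
    ratio = Fraction(fs_out, fs_in)  # Fraction reduces to lowest terms
    return ratio.numerator, ratio.denominator


def relative_contour(pitches: list[float]) -> list[float]:
    """Express each pitch relative to the first note.

    Assumption: pitches are given in semitones (e.g. MIDI note numbers)
    and the list is non-empty.
    """
    first = pitches[0]
    return [p - first for p in pitches]


def contour_distance(query: list[float], phrase: list[float]) -> float:
    """Hypothetical matching score, not the thesis's actual measure:
    mean absolute difference between the two relative contours over
    the notes they have in common (lower means more similar).
    """
    a = relative_contour(query)
    b = relative_contour(phrase)
    n = min(len(a), len(b))
    return sum(abs(x - y) for x, y in zip(a, b)) / n
```

For example, `rational_src_factors(44100, 48000)` gives `(160, 147)`, and `relative_contour([60, 62, 64])` and `relative_contour([65, 67, 69])` both give `[0, 2, 4]`, so a transposed query matches the stored phrase with distance 0 in this sketch.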
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38896
Full-text license:	Paid license
Appears in collections:	Graduate Institute of Communication Engineering

Files in this item:

File	Size	Format
ntu-94-1.pdf (not currently authorized for public access)	3.42 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
