Improving Audio Fingerprinting by Bi-directional Retrieval and Learning to Rank

Tzu-Hsiang Tang; 唐子翔

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7107

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	張智星(Jyh-Shing Jang)
dc.contributor.author	Tzu-Hsiang Tang	en
dc.contributor.author	唐子翔	zh_TW
dc.date.accessioned	2021-05-17T15:59:38Z	-
dc.date.available	2020-02-18
dc.date.available	2021-05-17T15:59:38Z	-
dc.date.copyright	2020-02-18
dc.date.issued	2020
dc.date.submitted	2020-02-12
dc.identifier.citation	[1] A. L. Wang, “An industrial-strength audio search algorithm,” in ISMIR 2003, 4th Symposium Conference on Music Information Retrieval, 2003, pp. 7–13.
[2] Hyoung-Gook Kim, Hye-Seung Cho, and Jin Young Kim, “Robust audio fingerprinting using peak-pair-based hash of non-repeating foreground audio in a real environment,” Cluster Computing, vol. 19, pp. 315–323, 2016.
[3] Hsin-Fu Liao, “Improvement of landmark-based audio fingerprinting with target zone and hash table tuning,” M.S. thesis, Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science, National Taiwan University, 2018.
[4] “音樂史” (History of music), Wikipedia, https://zh.wikipedia.org/wiki/%E9%9F%B3%E6%A8%82%E5%8F%B2, Accessed: 2019-11-12.
[5] “Soundhound,” https://soundhound.com/, Accessed: 2019-08-30.
[6] “Shazam,” https://www.shazam.com/, Accessed: 2019-08-30.
[7] “Echonest,” http://the.echonest.com/, Accessed: 2019-08-30.
[8] Chris J.C. Burges, Dan Plastina, John Platt, Erin Renshaw, and Rico Malvar, “Using audio fingerprinting for duplicate detection and thumbnail generation,” https://www.microsoft.com/en-us/research/publication/using-audio-fingerprinting-for-duplicate-detection-and-thumbnail-generation/, Accessed: 2019-10-21.
[9] Dan Ellis, “Robust landmark-based audio fingerprinting,” http://labrosa.ee.columbia.edu/matlab/fingerprint/, 2009, Accessed: 2019-08-30.
[10] Jijun Deng, Wanggen Wan, Ram Swaminathan, Xiaoqing Yu, and Xueqian Pan, “An audio fingerprinting system based on spectral energy structure,” in IET International Conference on Smart and Sustainable City (ICSSC 2011), 2011.
[11] Xueqian Pan, Xiaoqing Yu, Jijun Deng, Wei Yang, and Hongxue Wang, “Audio fingerprinting based on local energy centroid,” in IET International Communication Conference on Wireless Mobile and Computing (CCWMC 2011), 2011.
[12] Jun Xu and Hang Li, “AdaRank: A Boosting Algorithm for Information Retrieval,” in Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR 2007), 2007.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7107	-
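dc.description.abstract	Audio fingerprinting is a fast and mature means of music retrieval: the user submits an audio clip recorded with a microphone, the system extracts features from the clip and compares them with the features of the songs in its database, and it finally returns the best-matching result to the user. This thesis uses audio recorded at random in noisy environments as query clips, simulating recordings made in everyday life, and seeks ways to improve the step that extracts song features. In a landmark-based audio fingerprinting system, we change how landmarks are composed and what they contain, and we further improve how landmarks are retrieved. The more information a hash table contains, the more conditions can be used for filtering; fewer landmarks then need to be compared, and matching results are returned faster. We also improve the landmark retrieval method: bi-directional retrieval yields additional information that helps recognition, which we use to re-score the initial matching results and obtain a first gain in accuracy. We then apply a learning-to-rank algorithm to re-rank the scored results, raising the recognition rate further.	zh_TW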
dc.description.abstract	Audio fingerprint (AFP) recognition is a fast and mature technique in audio information retrieval. An end user records an audio snippet as the input to the AFP system; the system extracts features from the snippet and compares them with the features in a database built from a selected audio dataset (the ground truth). Finally, the system returns the most likely match from the database together with its details (song title, author, etc.). In this thesis, we record audio at random in highly noisy environments and use these snippets as queries, simulating recordings made in real life, and we look for ways to improve feature extraction. Starting from an AFP system that uses landmarks as its basic feature, we change the content of the landmarks to build different kinds of hash tables. The more information a hash table contains, the more criteria we can use to filter landmarks, so fewer landmarks must be checked to obtain a match result, which reduces query time. We also improve the method for retrieving from the hash table: bi-directional retrieval extracts more useful information from the same hash table, which we use to re-rank the match results and increase their accuracy. Furthermore, we apply a learning-to-rank algorithm to re-rank the match results and obtain better accuracy.	en
dc.description.provenance	Made available in DSpace on 2021-05-17T15:59:38Z (GMT). No. of bitstreams: 1 ntu-109-P05922006-1.pdf: 9194362 bytes, checksum: 4beafebdc1be7536811b47ab458be35c (MD5) Previous issue date: 2020	en
dc.description.tableofcontents	Acknowledgements
Chinese Abstract
English Abstract
Chapter 1 Introduction
1.1 Research Background
1.2 Research Direction
1.3 Chapter Overview
Chapter 2 Related Work on Audio Fingerprinting
2.1 Wang’s Method
2.1.1 Building the Time-Frequency Constellation Map
2.1.2 Building the Hash Table from Landmarks
2.1.3 Matching and Scoring
2.2 Kim’s Method
2.2.1 From STFT (Short-time Fourier transform) to MCLT (Modulated complex lapped transform)
2.2.2 Identifying Repeated Peaks with Cosine Similarity
2.2.3 Handling Time-stretching and Pitch-shifting with Hash Tables
2.2.4 Matching and Scoring
Chapter 3 Implementing the Audio Fingerprinting System
3.1 Overview of System Architecture and Workflow
3.2 Extracting Audio Features as Landmarks
3.2.1 Signal Preprocessing
3.2.2 Selecting Salient Peaks with Gaussian Decay
3.2.3 Forming Landmarks and Building the Hash Table
3.2.4 Building the Database from Hash Tables
3.3 Identifying Query Clips
3.3.1 Matching Query Clips against the Database
3.3.2 Computing the Offset Time
3.3.3 Counting and Scoring Songs with the Same Offset Time
3.3.4 Returning the Ten Highest-Scoring Songs
Chapter 4 Improvements and Analysis of Experimental Results
4.1 Overview of Improvement Concepts and Test Environment
4.1.1 Goals and Concepts of the Improvements
4.1.2 Test Environment
4.2 Remedying the Recognition-Rate Drop Caused by GPU Parallel Computing
4.2.1 Motivation
4.2.2 Method
4.2.3 Results
4.3 Matching with Multiple Hash Tables
4.3.1 Motivation
4.3.2 Building a Second Hash Table with Energy Information
4.3.3 Results and Analysis
4.4 Matching Hash Tables with Bi-directional Retrieval
4.4.1 Motivation
4.4.2 Reverse Retrieval by Swapping Hash Values and Hash Keys
4.4.3 Results and Analysis
4.5 Improving the Recognition Rate with Learning to Rank
4.5.1 Motivation
4.5.2 AdaRank
4.5.3 Results and Analysis
Chapter 5 Conclusions and Future Work
5.1 Summary
5.2 Future Work
References
dc.language.iso	zh-TW
dc.title	以雙向檢索及排序學習演算法來改進音訊指紋辨識	zh_TW
dc.title	Improving Audio Fingerprinting by Bi-directional Retrieval and Learning to Rank	en
dc.type	Thesis
dc.date.schoolyear	108-1
dc.description.degree	Master's
dc.contributor.oralexamcommittee	傅楸善(Chiou-Shann Fuh), Chung-Che Wang
dc.subject.keyword	music retrieval,audio fingerprinting system,landmark,bi-directional retrieval,learning to rank,AdaRank	zh_TW
dc.subject.keyword	audio retrieval,audio fingerprint system,landmark,bi-directional retrieval,learning to rank,AdaRank	en
dc.relation.page	58
dc.identifier.doi	10.6342/NTU202000440
dc.rights.note	Authorized (open access worldwide)
dc.date.accepted	2020-02-12
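dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW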
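The abstracts describe the baseline pipeline only in words: landmarks are hashed into a table, query landmarks are looked up, and candidate songs are scored by agreeing offset times. Below is a minimal sketch of that baseline in the style of Wang's method [1], assuming spectral peaks have already been picked (the Gaussian-decay peak selection listed in the table of contents is not shown). The hash key (f1, f2, dt), the target-zone parameters `fan_out`, `max_dt`, and `max_df`, and all function names are illustrative assumptions, not the thesis's actual implementation or settings.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[int, int]        # (time frame, frequency bin) of a spectral peak
Key = Tuple[int, int, int]    # hypothetical hash key (f1, f2, dt)
Index = Dict[Key, List[Tuple[str, int]]]


def landmarks(peaks: List[Peak], fan_out: int = 3, max_dt: int = 32,
              max_df: int = 64) -> List[Tuple[Key, int]]:
    """Pair each anchor peak with up to `fan_out` later peaks in its target zone."""
    peaks = sorted(peaks)
    out = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:           # peaks are time-sorted, so no later peak fits
                break
            if dt >= 1 and abs(f2 - f1) <= max_df:
                out.append(((f1, f2, dt), t1))
                paired += 1
                if paired == fan_out:
                    break
    return out


def build_index(songs: Dict[str, List[Peak]]) -> Index:
    """Hash table: key -> every (song_id, anchor time) that produced it."""
    index: Index = defaultdict(list)
    for song_id, peaks in songs.items():
        for key, t1 in landmarks(peaks):
            index[key].append((song_id, t1))
    return index


def query(index: Index, clip_peaks: List[Peak],
          top_k: int = 10) -> List[Tuple[str, int, int]]:
    """Return (song_id, offset, votes) for the top_k songs."""
    votes: Counter = Counter()
    for key, t_query in landmarks(clip_peaks):
        for song_id, t_db in index.get(key, ()):
            votes[(song_id, t_db - t_query)] += 1     # offset time
    # A song's score is its largest vote count at any single offset.
    best: Dict[str, Tuple[int, int]] = {}
    for (song_id, offset), count in votes.items():
        if song_id not in best or count > best[song_id][1]:
            best[song_id] = (offset, count)
    ranked = sorted(best.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(s, off, c) for s, (off, c) in ranked[:top_k]]
```

A true match shifts every landmark by the same amount, so its votes pile up on one (song, offset) pair, while chance hash collisions scatter across many offsets; that is why the sketch scores a song by the peak of its offset histogram rather than by its raw number of hits.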
Appears in Collections:	Department of Computer Science and Information Engineering
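For the learning-to-rank stage, the following is a minimal sketch of AdaRank [12] under assumptions this record does not state: each query's candidate songs are described by a hypothetical feature vector (for example, a forward-retrieval vote count and a bi-directional retrieval score), the weak rankers are single features as in the original AdaRank paper, and the performance measure E is the reciprocal rank of the correct song. Function names and feature choices are illustrative only and are not taken from the thesis.

```python
import math
from typing import List, Sequence, Tuple

# One query = (feature vectors of its candidate songs, 0/1 labels marking the correct song)
Query = Tuple[List[Sequence[float]], List[int]]


def score(w: Sequence[float], X: List[Sequence[float]]) -> List[float]:
    """Linear ranking function f(x) = sum_k w_k * x_k."""
    return [sum(wk * xk for wk, xk in zip(w, x)) for x in X]


def reciprocal_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """E in [0, 1]: 1/rank of the first correct candidate, 0 if none is correct."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    for rank, j in enumerate(order, start=1):
        if labels[j]:
            return 1.0 / rank
    return 0.0


def adarank(queries: List[Query], n_features: int, rounds: int = 10,
            eps: float = 1e-9) -> List[float]:
    m = len(queries)
    P = [1.0 / m] * m                 # weight distribution over training queries
    w = [0.0] * n_features            # accumulated alpha_t per single-feature weak ranker
    # Performance of each single-feature ranker on each query (independent of P).
    E_all = [[reciprocal_rank([x[k] for x in X], y) for X, y in queries]
             for k in range(n_features)]
    for _ in range(rounds):
        # Weak ranker h_t: the feature with the best P-weighted performance.
        k = max(range(n_features), key=lambda f: sum(p * e for p, e in zip(P, E_all[f])))
        E = E_all[k]
        num = sum(p * (1 + e) for p, e in zip(P, E))
        den = sum(p * (1 - e) for p, e in zip(P, E))
        w[k] += 0.5 * math.log((num + eps) / (den + eps))   # alpha_t
        # Re-weight: queries the combined ranker f_t still ranks poorly get more weight.
        perf = [reciprocal_rank(score(w, X), y) for X, y in queries]
        z = sum(math.exp(-e) for e in perf)
        P = [math.exp(-e) / z for e in perf]
    return w
```

At query time, the candidates returned by the fingerprint search would simply be re-sorted by `score(w, X)`; the `eps` term only keeps `alpha_t` finite when a weak ranker is perfect on every weighted query.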

Files in This Item:

File	Size	Format
ntu-109-1.pdf	8.98 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.