Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99071

| Title: | Improved Query by Humming System based on Audio Fingerprinting Method |
| Author: | Yu-Hsiang Kung (龔鈺翔) |
| Advisor: | Jian-Jiun Ding (丁建均) |
| Keywords: | Query by humming, deep learning, audio fingerprinting |
| Publication Year: | 2025 |
| Degree: | Master's |
| Abstract: | A Query by Humming (QBH) system is designed for situations where conventional song-search information (such as the title, artist, or lyrics) is unknown, allowing a user to find a desired song by humming part of its melody. Unlike common song recognition, which identifies a song playing from a background source, QBH has the user produce the melody themselves, which can introduce discrepancies in pitch and tempo relative to the target song. Conventional QBH systems are typically composed of three main stages: note segmentation (also called onset detection), pitch recognition, and data matching. Note segmentation follows one of two common approaches: frame-based or note-based. The frame-based approach segments the input audio into fixed-length frames, identifies the pitch of each frame, and compares the resulting note sequence against sequences in the database. The note-based approach, which aims to improve pitch-recognition accuracy and reduce the impact of rhythmic variation and pitch fluctuation inherent in humming, detects the onset of each note and thereby segments the audio into distinct note fragments that are then used for pitch recognition. In contrast to the traditional decomposition into these three sub-problems, some recent studies have applied machine learning to improve the first two components, but these approaches are often limited by the scarcity of large-scale, publicly available QBH datasets, resulting in suboptimal performance. To address this limitation, recent research has proposed treating QBH as a special case of cover song identification, allowing the more abundant public cover-song datasets to be used for training. Based on this assumption, this work employs a machine learning approach that transforms the input humming audio into a high-dimensional feature vector; the system then obtains a ranked list of the most similar songs by comparing feature similarity within the database. This method achieves more accurate results than traditional approaches while also circumventing the limited availability of public QBH data. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99071 |
| DOI: | 10.6342/NTU202503400 |
| Full-Text License: | Authorized (open access worldwide) |
| Electronic Full-Text Release Date: | 2025-08-22 |
| Appears in Collections: | Graduate Institute of Communication Engineering |
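The retrieval step described in the abstract, where each song is represented as a high-dimensional feature vector and results are ranked by feature similarity, can be sketched roughly as follows. This is an illustrative assumption, not the thesis's actual model: the embedding network is omitted, cosine similarity is assumed as the comparison metric, and all names and dimensions are hypothetical.

```python
import numpy as np

def rank_songs(query_vec, db_vecs, song_ids):
    """Rank database songs by cosine similarity to the query embedding,
    most similar first. Returns (song_id, similarity) pairs."""
    q = query_vec / np.linalg.norm(query_vec)
    db = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity per song
    order = np.argsort(-sims)          # indices sorted by descending similarity
    return [(song_ids[i], float(sims[i])) for i in order]

# Toy example: 3 songs with 4-dimensional embeddings (purely illustrative;
# a real system would use embeddings produced by the trained network).
rng = np.random.default_rng(0)
db_vecs = rng.normal(size=(3, 4))
query = db_vecs[1] + 0.01 * rng.normal(size=4)   # humming close to song "B"
ranking = rank_songs(query, db_vecs, ["A", "B", "C"])
print(ranking[0][0])   # top-ranked song
```

In practice the embedding step would absorb the pitch and tempo discrepancies mentioned in the abstract, so that a hummed query and the original recording map to nearby points in the feature space.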
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf | 858.92 kB | Adobe PDF | View/Open |

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
