NTU Theses and Dissertations Repository

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99071
Full metadata record

dc.contributor.advisor: 丁建均 (zh_TW)
dc.contributor.advisor: Jian-Jiun Ding (en)
dc.contributor.author: 龔鈺翔 (zh_TW)
dc.contributor.author: Yu-Hsiang Kung (en)
dc.date.accessioned: 2025-08-21T16:16:33Z
dc.date.available: 2025-08-22
dc.date.copyright: 2025-08-21
dc.date.issued: 2025
dc.date.submitted: 2025-08-02
dc.identifier.citation林巧薇 (2016). "應用節奏與頻率資訊之改良式哼唱檢索系統及改良式發端偵測與旋律匹配"
陳秉鴻 (2020). "深度學習, 池化運算及改良式動態規劃應用於哼唱檢索系統."
洪譽承 (2022). "基於深度學習原音自編碼器去噪應用於哼唱式系統"
胡哲銘 (2010). "歌聲檢索系統:改良式發端識別以及修正式旋律比對"
K. -Y. Chen and J. -J. Ding, "Chromagram Features Analysis for Learning-Based Query by Humming Systems," 2025 International Conference on Electronics, Information, and Communication (ICEIC), Osaka, Japan, 2025, pp. 1-4, doi: 10.1109/ICEIC64972.2025.10879656.
S. Ranjan and V. Arora, "A Bioinformatic Method Of Semi-Global Alignment For Query-By-Humming," 2020 IEEE 4th Conference on Information & Communication Technology (CICT), Chennai, India, 2020, pp. 1-5, doi: 10.1109/CICT51604.2020.9312085.
M. Ulfi and R. Mandala, "Improving Query by Humming System using Frequency-Temporal Attention Network and Partial Query Matching," 2022 9th International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA), Tokoname, Japan, 2022, pp. 1-6, doi: 10.1109/ICAICTA56449.2022.9933001.
X. Du, P. Zou, M. Liu, X. Liang, M. Chu and B. Zhu, "ByteHum: Fast and Accurate Query-by-Humming in the Wild," ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, Republic of, 2024, pp. 1111-1115, doi: 10.1109/ICASSP48485.2024.10448117.
A. N. Dwi Triastanto and R. Mandala, "Query by Humming Music Information Retrieval using DNN-LSTM based Melody Extraction and Noise Filtration," 2022 5th International Conference on Information and Communications Technology (ICOIACT), Yogyakarta, Indonesia, 2022, pp. 503-508, doi: 10.1109/ICOIACT55506.2022.9972121.
A. Amatov, D. Lamanov, M. Titov, I. Vovk, I. Makarov, and M. Kudinov, “A Semi-Supervised Deep Learning approach to dataset collection for Query-By-Humming task,” arXiv.org, Dec. 02, 2023. https://arxiv.org/abs/2312.01092
E. Alfaro-Paredes, L. Alfaro-Carrasco, and W. Ugarte, “Query by humming for song identification using voice isolation,” in Lecture notes in computer science, 2021, pp. 323–334. doi: 10.1007/978-3-030-79463-7_27.
S. Ranjan and V. Srivastava, “Incorporating Total Variation Regularization in the design of an intelligent Query by Humming system,” arXiv.org, Feb. 09, 2023. https://arxiv.org/abs/2302.04577
M. Li, Z. Zhao and P. Shi, "Query by humming based on the hierarchical matching algorithm," 2015 IEEE International Conference on Computer and Communications (ICCC), Chengdu, China, 2015, pp. 82-86, doi: 10.1109/CompComm.2015.7387545.
N. Mostafa and P. Fung, “A Note Based Query By Humming System Using Convolutional Neural Network,” Interspeech 2017, pp. 3102–3106, Aug. 2017, doi: https://doi.org/10.21437/interspeech.2017-1590.
S. Yu, X. He, K. Chen, and Y. Yu, “HKDSME: Heterogeneous Knowledge Distillation for Semi-supervised Singing Melody Extraction Using Harmonic Supervision,” pp. 545–553, Oct. 2024, doi: 10.1145/3664647.3681288.
T. -H. Hsieh, L. Su and Y. -H. Yang, "A Streamlined Encoder/decoder Architecture for Melody Extraction," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019, pp. 156-160, doi: 10.1109/ICASSP.2019.8682389.
S. Yong, L. Su and J. Nam, "A Phoneme-Informed Neural Network Model For Note-Level Singing Transcription," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10096707.
X. Wang, W. Xu, W. Yang and W. Cheng, "Musicyolo: A Sight-Singing Onset/Offset Detection Framework Based on Object Detection Instead of Spectrum Frames," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 396-400, doi: 10.1109/ICASSP43922.2022.9746684.
B. Agüera y Arcas et al., “Now Playing: Continuous low-power music recognition,” Nov. 2017, [Online]. Available: https://arxiv.org/abs/1711.10958
F. Liu, D. Tuo, Y. Xu and X. Han, "CoverHunter: Cover Song Identification with Refined Attention and Alignments," 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 2023, pp. 1080-1085, doi: 10.1109/ICME55011.2023.00189.
J. Xun et al., “DisCover: Disentangled Music Representation Learning for Cover Song Identification,” Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 453–463, Jul. 2023, doi: https://doi.org/10.1145/3539618.3591664.
F. Schroff, D. Kalenichenko and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015, pp. 815-823, doi: 10.1109/CVPR.2015.7298682.
A. Gulati et al., “Conformer: Convolution-augmented transformer for speech recognition,” arXiv.org, May 16, 2020. https://arxiv.org/abs/2005.08100
T. -Y. Lin, P. Goyal, R. Girshick, K. He and P. Dollár, "Focal Loss for Dense Object Detection," 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017, pp. 2999-3007, doi: 10.1109/ICCV.2017.324.
S. Rouard, F. Massa, and A. Défossez, “Hybrid transformers for music source separation,” arXiv.org, Nov. 15, 2022. https://arxiv.org/abs/2211.08553
S. Wang, X. Kong, H. Huang, K. Wang and Y. Hu, "HANet: A Harmonic Attention-Based Network for Singing Melody Extraction from Polyphonic Music," ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 2025, pp. 1-5, doi: 10.1109/ICASSP49660.2025.10889955.
R. Jang “MIR-QBSH-corpus,” MIR Lab, CS Dept., Tsing Hua Univ., Taiwan. Link: http://mirlab.org/dataSet/public/MIR-QBSH.zip
R. Jang “MIR-ST500,” MIR Lab, CS Dept., Tsing Hua Univ., Taiwan. Link: http://mirlab.org/dataset/public/MIR-ST500_20201014.zip
J. Z. M. Lim, "Query by Humming (QBH) audio dataset," Kaggle, 2021. [Dataset]. [Online]. Available: https://www.kaggle.com/datasets/limzhiminjessie/query-by-humming-qbh-audio-dataset
K. Chen, S. Yu, C. -i. Wang, W. Li, T. Berg-Kirkpatrick and S. Dubnov, "Tonet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 621-625, doi: 10.1109/ICASSP43922.2022.9747304.
Yaroslav Ganin et al., “Domain-Adversarial Training of Neural Networks,” Journal of Machine Learning Research, vol. 17, no. 59, pp. 1–35, 2016. Available: http://www.jmlr.org/papers/v17/15-239.html
J. Salamon, J. Serrà, and E. Gómez, “Tonal representations for music retrieval: from version identification to query-by-humming,” International Journal of Multimedia Information Retrieval, vol. 2, no. 1, pp. 45–58, Dec. 2012, doi: 10.1007/s13735-012-0026-0.
-
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99071
dc.description.abstract (zh_TW):
A query-by-humming system is designed for situations where the usual search information for a song (such as title, artist, or lyrics) is unknown: the user hums a stretch of melody to retrieve the desired song. Unlike common song recognition, which picks out a song playing in the background, query by humming works from a melody the user hums, so the pitch and tempo of the humming may deviate from those of the song the user hopes to find.
A typical query-by-humming system has three parts: note segmentation (also called onset detection), pitch recognition, and database matching. Note segmentation in turn comes in two styles: frame-based and note-based.
The frame-based style cuts the input into fixed-length segments, recognizes the pitch of each segment, and compares the note sequence formed from all segments against the sequences in the database.
The other style is note-based. To raise pitch-recognition accuracy and lessen the effect of rhythmic deviation and pitch jitter in humming, it detects the start of each note and uses those onsets to cut out individual note segments for pitch recognition.
Beyond the traditional split of the task into these three sub-problems, some papers have applied machine learning to improve the accuracy of the first two, but most are constrained by the shortage of public training data, so the results are less than ideal.
In recent years, however, it has been proposed to treat query by humming as a special case of cover-song identification, so that the larger body of public cover-song data can compensate for the shortage. Building on this assumption, this thesis uses machine learning to convert the input humming audio into a high-dimensional feature and ranks the closest songs by comparing feature similarity within the database, achieving more accurate results than traditional methods while avoiding the impact of the scarce public query-by-humming data.
dc.description.abstract (en):
A Query by Humming (QBH) system is designed for situations where traditional song-search information (such as title, artist, or lyrics) is unknown, allowing a user to find a desired song by humming part of its melody. Unlike common song recognition, which identifies a song playing from a background source, QBH involves the user producing the melody themselves. This can result in discrepancies in pitch and tempo compared to the original song the user is trying to find.
Conventional Query by Humming systems are typically composed of three main parts: note segmentation (or onset detection), pitch recognition, and data matching. Within note segmentation, there are two common approaches: frame-based and note-based.
The frame-based approach segments the input audio into fixed-length frames. The pitch of each frame is identified, and the resulting note sequence is compared against the sequences in the database.
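As a rough illustration of the frame-based idea (a sketch, not the thesis's implementation), the following assumes the librosa library: pyin estimates one pitch per fixed-length frame, the voiced frames are reduced to a key-invariant relative-pitch sequence, and a plain dynamic-time-warping distance compares it with a database sequence. The file name hum.wav and the variable db_sequence are placeholders.

    import numpy as np
    import librosa

    def frame_pitch_sequence(path):
        # Load mono audio and estimate one pitch value per fixed-length frame.
        y, sr = librosa.load(path, sr=16000)
        f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                     fmax=librosa.note_to_hz("C6"), sr=sr)
        midi = librosa.hz_to_midi(f0[voiced])   # keep voiced frames only
        return midi - np.median(midi)           # relative pitch, key-invariant

    def dtw_distance(a, b):
        # Plain dynamic time warping between two pitch sequences; a smaller
        # value means the hummed sequence matches the database entry better.
        D = np.full((len(a) + 1, len(b) + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[-1, -1] / (len(a) + len(b))    # length-normalised cost

    # Hypothetical usage against one database entry:
    # dist = dtw_distance(frame_pitch_sequence("hum.wav"), db_sequence)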
The other approach is note-based, which aims to improve pitch recognition accuracy and reduce the impact of rhythmic variations and pitch fluctuations inherent in humming. The note-based method works by detecting the start of each note, thereby segmenting the audio into distinct note fragments that are then used for pitch recognition.
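A matching sketch of the note-based idea, under the same librosa assumption (illustrative only): detected onsets segment the audio, and the median pitch over each onset-to-onset span stands in for that note's pitch.

    import numpy as np
    import librosa

    def note_sequence(path):
        # Detect note onsets, then assign one quantised pitch per note segment.
        y, sr = librosa.load(path, sr=16000)
        onsets = librosa.onset.onset_detect(y=y, sr=sr, units="samples",
                                            backtrack=True)
        bounds = np.concatenate([onsets, [len(y)]])
        f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                fmax=librosa.note_to_hz("C6"), sr=sr)
        hop = 512                                # pyin's default hop length
        notes = []
        for start, end in zip(bounds[:-1], bounds[1:]):
            seg = f0[start // hop : end // hop]  # pitch frames in this note
            seg = seg[~np.isnan(seg)]            # drop unvoiced frames
            if len(seg):
                notes.append(np.median(librosa.hz_to_midi(seg)))
        return np.round(np.array(notes))         # one MIDI pitch per note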
In contrast to the traditional method of dividing the problem into these three sub-problems, some recent studies have leveraged machine learning to improve the performance of the first two components. However, these approaches are often limited by the scarcity of large-scale, publicly available QBH datasets, resulting in suboptimal performance.
To address this limitation, recent research has proposed treating QBH as a special case of cover song identification, allowing the use of more abundant public cover song datasets for training. Based on this assumption, this work employs a machine learning approach that transforms input humming audio into a high-dimensional feature vector. The system then obtains a ranked list of the most similar songs by comparing feature similarity within the database. This method can achieve more accurate results than traditional approaches and also helps to circumvent the challenges posed by the limited availability of public data for Query by Humming systems.
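The retrieval step described above can be sketched as a nearest-neighbour search over L2-normalised embeddings; in the sketch below, embed(), db_embs, and song_ids are hypothetical stand-ins for the trained encoder and a precomputed database, not names taken from the thesis.

    import numpy as np

    def rank_songs(query_emb, db_embs, song_ids, k=10):
        # Rank database songs by cosine similarity to the query embedding.
        q = query_emb / np.linalg.norm(query_emb)
        db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
        sims = db @ q                        # cosine similarity per song
        best = np.argsort(-sims)[:k]         # indices of the top-k matches
        return [(song_ids[i], float(sims[i])) for i in best]

    # Hypothetical usage, given a trained encoder embed():
    # results = rank_songs(embed("hum.wav"), db_embs, song_ids)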
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-21T16:16:33Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2025-08-21T16:16:33Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Acknowledgements i
Abstract (Chinese) ii
ABSTRACT iii
CONTENTS v
LIST OF FIGURES vii
LIST OF TABLES viii
Chapter 1 Introduction 1
1.1 Terms and Definitions 3
Chapter 2 Related work 5
2.1 Traditional approach 5
2.1.1 Frame-based approach 5
2.1.2 Note-based approach 5
2.1.3 Pitch estimation 7
2.1.4 Sequence Matching Algorithms 8
2.2 Machine learning approach 8
2.3 Audio Fingerprinting and Metric Learning 10
Chapter 3 Method 12
3.1 Overall system 12
3.2 Data preprocessing 13
3.3 Model Architecture 13
3.3.1 Downsampling and projecting to a higher-dimensional space 14
3.3.2 Harmonic block 14
3.3.3 Conformer 17
3.4 Loss function 18
3.4.1 Focal loss 18
3.4.2 Triplet loss 19
3.4.3 Domain loss 19
Chapter 4 Experiment 21
4.1 Experimental Setup 21
4.2 Evaluation Metrics 21
4.2.1 Top-K ratio 22
4.2.2 Mean Reciprocal Rank (MRR) 22
4.3 Main Results 22
4.4 Ablation Studies 23
4.4.1 Effect of Conformer vs. Transformer Encoder 24
4.4.2 Effect of Harmonic Block 26
4.4.3 Effect of Convolution Reshape 27
4.4.4 Effect of Domain Adversarial Loss 28
Chapter 5 Conclusion 30
Chapter 6 References 32
dc.language.iso: en
dc.subject: 深度學習 (zh_TW)
dc.subject: 哼唱檢索 (zh_TW)
dc.subject: 聲紋辨識 (zh_TW)
dc.subject: Audio fingerprinting (en)
dc.subject: Query by humming (en)
dc.subject: deep learning (en)
dc.title: 基於聲紋辨識方法之改良式哼唱檢索系統 (zh_TW)
dc.title: Improved Query by Humming System based on Audio Fingerprinting Method (en)
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 許文良;余執彰 (zh_TW)
dc.contributor.oralexamcommittee: Wen-Liang Hsue; Chih-Chang Yu (en)
dc.subject.keyword: 哼唱檢索, 深度學習, 聲紋辨識 (zh_TW)
dc.subject.keyword: Query by humming, deep learning, audio fingerprinting (en)
dc.relation.page: 36
dc.identifier.doi: 10.6342/NTU202503400
dc.rights.note: 同意授權(全球公開) [license granted; open access worldwide]
dc.date.accepted: 2025-08-06
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering)
dc.date.embargo-lift: 2025-08-22
Appears in Collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in this item:
File | Size | Format
ntu-113-2.pdf | 858.92 kB | Adobe PDF

