Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/47921
Full metadata record
(DC field: value [language])
dc.contributor.advisor: 丁建均 (Jian-Jiun Ding)
dc.contributor.author: Ta Hsien [en]
dc.contributor.author: 冼達 [zh_TW]
dc.date.accessioned: 2021-06-15T06:43:00Z
dc.date.available: 2013-07-18
dc.date.copyright: 2011-07-18
dc.date.issued: 2011
dc.date.submitted: 2011-07-06
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/47921
dc.description.abstract [zh_TW]:
Singing-voice retrieval systems now serve a wide range of applications. The key techniques analyze the 1-D audio signal with different transform theories, decompose its features, and support analysis, synthesis, and compression of sound. Applied to music signals, they parse and classify the type of sound; applied to hummed queries, they find similar songs and enable retrieval; applied to spoken signals, they recognize what a speaker says and map speech to text for translation, reaching multi-level speech-retrieval applications. A system can therefore be designed in many ways according to the purpose and use of the audio processing.
A singing-voice retrieval system can, following different mathematical theories, first enhance and restore the voice signal, then segment and classify it, extract the features of each segment, and encode each audio segment by the relations among its feature distributions. Alignment algorithms such as dynamic programming or Markov models then match the encoded hummed query against the score encodings in the database and return a ranked list. Our proposed improvements are summarized below.
(Filter Design)
For hummed voice signals, this thesis designs a new algorithmic architecture whose feature analysis is faster and more accurate than previous algorithms. Current voice-signal analysis usually starts its pre-processing with denoising algorithms, such as the Wiener filter, to restore the signal or enhance its features. A feature-extraction stage then cuts the signal into units of different lengths, usually syllables, in line with human auditory perception, and extracts from each unit feature models such as timbre or fundamental frequency for the subsequent ranking algorithms.
Many papers study hummed-voice enhancement and feature sampling for singing-voice retrieval. For noise removal and signal enhancement, communication engineering offers many advanced algorithms that analyze mathematical models of the signal and the noise. For hummed signals as perceived by the human ear, this thesis maps the humming voice and the noise onto a new mathematical model, proposes new theory and implementations for signal restoration and enhancement, and provides complete experimental data.
(Onset detection)
The second improvement concerns the segmentation stage. The literature mostly uses per-segment changes of phase, frequency, or amplitude as the segmentation cue. Implementations reported in several papers show that frequency- or phase-change criteria tend to over-segment under high-frequency noise and cost much time and resources, while amplitude-change criteria are heavily affected by background noise and by signals without clear pauses. Based on a mathematical model of the singing voice, this thesis proposes a segmentation algorithm that fuses frequency and amplitude features; it greatly raises segmentation accuracy, better matches human auditory perception, and has lower complexity than earlier frequency-feature-only methods, with comparative measurements.
(Pitch estimation)
The third improvement. The fundamental frequency of a voice segment plays a crucial role: besides being a voice feature for speaker recognition, it is key to aiding segmentation and to pitch tracking. Speed and accuracy are usually hard to achieve together; the improvement we propose stays very accurate under heavy background noise, with complete tests and comparisons.
(Adaptive MIDI number)
The fourth improvement. A voice segment is usually represented by its fundamental frequency, most often as a MIDI number that divides each octave into 12 notes: the fundamental of each segment is extracted and then mapped to a MIDI symbol for matching. Experiments show, first, that fundamental-frequency extraction is often imprecise, disturbed by timbre distribution and background noise, averaging only 75% accuracy; second, that MIDI numbers were designed to align tuning among instruments, so the pitch differences they encode often deviate from what the ear expects, making the downstream retrieval algorithms fail unexpectedly. This thesis proposes a new fundamental-frequency detector with an average accuracy of 95%, an adaptive correction of the MIDI symbols, and a pitch-difference evaluation algorithm based on a human auditory model; it fixes the problem of using raw MIDI-symbol differences as the inter-signal feature and lets even an off-key singer build a personal adaptive MIDI mapping.
We provide extensive revisions to current hummed-voice segmentation and feature algorithms. Simulation results show greatly improved accuracy, lower complexity, and satisfied stability conditions, so the proposed techniques apply not only to hummed voice but also, in the future, to voice synthesis, coding, and speech-retrieval applications.
dc.description.abstract [en]:
There are many applications of the query-by-humming (QBH) system. It combines feature selection, MIDI-number analysis, and melody matching for the 1-D voice signal. The core techniques include signal transform theory, feature analysis, and voice-signal segmentation, which let us understand and classify the voice signal for further applications. Applying these techniques in the QBH system, similar songs can be retrieved from the database; moreover, related applications such as speech-to-text, spoken translation, and multilingual transcription can follow after speech recognition.
The QBH system can be divided into several stages. First, it emphasizes the features in the spectrum and removes irrelevant noise. The onsets are then obtained by segmenting and classifying the signal with different pitch features, and the pitches are transformed into MIDI numbers to form a code sequence. The output of the QBH system comes from comparing the pitches of the hummed signal with those of the songs in our database; this melody-match stage uses dynamic programming, hidden Markov models, etc. for alignment and similarity measurement. The other improvements we propose are described below.
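To make the melody-match stage concrete, below is a minimal dynamic-programming sketch that ranks songs by an edit distance between MIDI-number sequences. The unit insertion/deletion costs, the semitone substitution cost, and the toy database are illustrative assumptions, not the thesis's exact formulation.

    import numpy as np

    def melody_distance(query, reference):
        """Edit-distance DP between two MIDI-number sequences (a sketch)."""
        n, m = len(query), len(reference)
        D = np.zeros((n + 1, m + 1))
        D[:, 0] = np.arange(n + 1)           # deleting hummed notes
        D[0, :] = np.arange(m + 1)           # skipping reference notes
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                # Assumed substitution cost: semitone gap, scaled per octave.
                sub = abs(query[i - 1] - reference[j - 1]) / 12.0
                D[i, j] = min(D[i - 1, j] + 1,        # deletion
                              D[i, j - 1] + 1,        # insertion
                              D[i - 1, j - 1] + sub)  # substitution
        return D[n, m]

    # Hypothetical query and database, for illustration only.
    query = [60, 62, 64, 65, 67]
    database = {"song_a": [60, 62, 64, 65, 67, 69], "song_b": [55, 57, 55, 52]}
    print(sorted(database, key=lambda k: melody_distance(query, database[k])))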
(Filter design)
Focusing on humming-signal restoration, we propose a new adaptive algorithm for filter design. It offers efficient analysis, high SNR, and small MSE with reliable stability. Compared with conventional restoration algorithms such as the Wiener filter and the Butterworth filter, it improves the SNR and reduces the reconstruction error.
Much research in telecommunication engineering focuses on signal and noise analysis, transforms, and feature extraction of voice signals; the Fourier transform maps the signal into the frequency domain, where the noise is removed. Guided by psychoacoustics, we propose a new mathematical model for the humming voice and use it for signal restoration. After implementing our algorithms, we show a variety of simulation results and compare the performance with existing filters in Chapter 4.
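For reference, the following is a minimal sketch of the conventional frequency-domain Wiener filtering named above as a baseline; it is not the proposed two-Gaussian-mixture filter of Chapter 4. The frame length, hop size, and the externally supplied noise-power estimate are assumptions of this example.

    import numpy as np

    def wiener_denoise(x, noise_psd, frame=512, hop=256):
        """Frame-wise Wiener filtering; noise_psd has frame // 2 + 1 bins,
        e.g. averaged from a silent stretch of the recording (assumed)."""
        win = np.hanning(frame)
        out = np.zeros(len(x))
        for start in range(0, len(x) - frame, hop):
            seg = x[start:start + frame] * win
            spec = np.fft.rfft(seg)
            sig_psd = np.maximum(np.abs(spec) ** 2 - noise_psd, 0.0)
            gain = sig_psd / (sig_psd + noise_psd + 1e-12)  # Wiener gain
            # Overlap-add resynthesis (COLA normalization omitted for brevity).
            out[start:start + frame] += np.fft.irfft(gain * spec) * win
        return out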
(Onset detection)
The second improvement is related to the onset architecture. Amplitude-, frequency-, and phase-based segmentations have been proposed in many papers. According to the implementations and comparisons reported there, those methods tend to over-detect because of specific noise characteristics, while amplitude fluctuation may cause under-detection due to background-noise interference and attached successive sounds. To overcome these problems, we propose a new onset architecture that combines features from both the spectral and the time domains. It improves accuracy toward human perception and has lower implementation complexity. Complete test results and comparisons are shown in Chapter 4.
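The sketch below illustrates the general idea of fusing a time-domain cue (positive energy change) with a spectral cue (rectified spectral flux) into one detection function. The equal weighting, the mean-plus-standard-deviation threshold, and the peak-picking rule are assumptions of this illustration, not the detector of Chapter 4.

    import numpy as np

    def onset_strength(x, frame=1024, hop=256):
        """Combine positive energy change with rectified spectral flux."""
        win = np.hanning(frame)
        frames = [x[i:i + frame] * win for i in range(0, len(x) - frame, hop)]
        mags = np.array([np.abs(np.fft.rfft(f)) for f in frames])
        energy = mags.sum(axis=1)
        d_energy = np.maximum(np.diff(energy, prepend=energy[0]), 0.0)
        flux = np.maximum(np.diff(mags, axis=0, prepend=mags[:1]), 0.0).sum(axis=1)
        norm = lambda v: v / (v.max() + 1e-12)
        return 0.5 * norm(d_energy) + 0.5 * norm(flux)   # assumed 50/50 fusion

    def pick_onsets(strength, hop=256, sr=8000):
        """Pick local maxima above an assumed mean + std threshold."""
        thr = strength.mean() + strength.std()
        is_peak = ((strength[1:-1] > thr)
                   & (strength[1:-1] >= strength[:-2])
                   & (strength[1:-1] >= strength[2:]))
        return (np.where(is_peak)[0] + 1) * hop / sr     # onset times (s)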
(Pitch estimation)
The third improvement is related to instantaneous-frequency detection. The pitch-extraction method is vital to the entire system: the pitch feature serves speaker identification, classification, onset detection, and voice tracking. Speed and accuracy are usually hard to achieve at once; therefore, in Chapter 6 we propose a new improvement based on sub-harmonic summation that keeps high accuracy under noise interference.
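For readers unfamiliar with sub-harmonic summation, the sketch below shows its textbook form: score each candidate fundamental by a decaying sum of spectral magnitudes at its harmonics and take the best-scoring candidate. The 1 Hz candidate grid, eight harmonics, and 0.84 decay weight are illustrative assumptions; the improved estimator of Chapter 6 is not reproduced here.

    import numpy as np

    def shs_pitch(frame, sr, f_lo=80.0, f_hi=800.0, n_harm=8, decay=0.84):
        """Sub-harmonic summation F0 estimate for one frame (a sketch)."""
        spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
        candidates = np.arange(f_lo, f_hi, 1.0)     # assumed 1 Hz grid
        scores = np.zeros(len(candidates))
        for i, f0 in enumerate(candidates):
            for h in range(1, n_harm + 1):
                if h * f0 >= freqs[-1]:
                    break
                # Add the magnitude at the h-th harmonic, decaying with order.
                scores[i] += decay ** (h - 1) * np.interp(h * f0, freqs, spec)
        return candidates[np.argmax(scores)]        # estimated F0 in Hz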
(Adaptive MIDI)
The fourth improvement is related to adaptive pitch representation. The most common pitch representation is the MIDI number, which divides each octave into 12 notes so that an instantaneous frequency maps directly to its corresponding number. However, standard MIDI numbers were designed for interconnecting musical instruments, and they differ from hearing perception. Our new pitch-estimation method reaches an accuracy of 95%, and the adaptive MIDI numbers revise the mapping for each individual, constructing a personal MIDI grid and preventing off-key cases.
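The standard equal-temperament mapping is m = 69 + 12 log2(f / 440). The sketch below quantizes to that grid after removing a singer's average detuning; this global-offset correction is only a stand-in assumption for the thesis's perceptually motivated adaptive MIDI model.

    import numpy as np

    def freq_to_midi(f):
        """Standard mapping: A4 = 440 Hz corresponds to MIDI number 69."""
        return 69.0 + 12.0 * np.log2(np.asarray(f, dtype=float) / 440.0)

    def adaptive_midi(freqs):
        """Round to MIDI numbers after removing the mean fractional offset
        (circular mean, so offsets near +/- 0.5 semitone wrap correctly)."""
        m = freq_to_midi(freqs)
        offset = np.angle(np.exp(2j * np.pi * m).mean()) / (2 * np.pi)
        return np.round(m - offset).astype(int)

    # A consistently sharp singer still yields the intended interval pattern.
    print(adaptive_midi([452.0, 508.0, 570.0]))   # e.g. [69 71 73]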
We also focused on improving the entire onset-detection system for retrieving correct pitches. Tests from a variety of aspects show that the proposed method attains high onset-detection accuracy, lower complexity, and high stability. The proposed algorithms can therefore improve the QBH system, voice-signal analysis, and music-signal coding, and can further improve speech-recognition systems in the future.
dc.description.provenance [en]:
Made available in DSpace on 2021-06-15T06:43:00Z (GMT). No. of bitstreams: 1
ntu-100-R98942120-1.pdf: 10025320 bytes, checksum: baeee2d40b572b29ee1dd4612574a1c9 (MD5)
Previous issue date: 2011
dc.description.tableofcontents:
Acknowledgements i
Chinese Abstract ii
ABSTRACT v
CONTENTS viii
LIST OF FIGURES xi
LIST OF TABLES xxii
Chapter 1 Introduction 2
1.1 Introduction to Humming Signal 2
1.2 Signal Features and Their Representation 3
1.3 Terminology, Psychoacoustics, and Musicology 5
1.4 The Transform Methods of Stave and Humming Signal 11
1.5 Common 2-D Representations of the Voice 13
1.5.1 Short-Time Fourier Transform (STFT) 13
1.5.2 Discrete Wavelet Transform (DWT) 14
1.5.3 Linear Predicted Coefficients (LPC) 16
1.5.4 Mel-frequency Cepstrum Coefficients (MFCC) 18
1.6 Noise Types: the Humming Signal in the Practical Environment 20
1.6.1 Additive White Gaussian Noise (AWGN) 20
1.6.2 The Sub-band Noise 24
1.6.3 Attached Successive Sound 27
Chapter 2 Introduction to the QBH System 29
2.1 The Phonetic Structure of English and Mandarin 32
Chapter 3 Onset Detection 34
3.1 Definition and Representation of the Onsets 34
3.2 Design Principle of the Onset Detection 35
3.3 Methods of Onset Detection 37
3.3.1 Short-term Energy Method 37
3.3.2 High frequency content (HFC) 39
3.3.3 Surf Method 40
3.3.4 Linear Predicted Coefficients (LPC) 43
3.4 Discussion on These Methods 45
Chapter 4 Proposed Onset Algorithm 50
4.1 Proposed Pre-processing Algorithm: two Gaussian Mixture Filter 51
4.2 Proposed Onset Detection (1/2): Match Filter 63
4.3 Proposed Onset Detection (2/2): Syllable Separation 70
4.4 Comparison and Discussion 72
4.4.1 Signal Restoration Comparison of the filters 73
4.4.2 Comparison of Onset Detections 82
4.4.3 Discussion and Future Work 89
Chapter 5 Pitch Estimation 91
5.1 Autocorrelation Function for Pitch Tracking 92
5.2 The Hilbert Huang transform (HHT) 93
5.2.1 Hilbert Transform 93
5.2.2 Hilbert Huang Transform (HHT) 94
5.2.3 Modified HHT 100
5.3 Sub-Harmonic Summation (SHS) 103
Chapter 6 Proposed Pitch Estimation 107
6.1 The First Proposed Algorithm: Improved SHS 108
6.2 The Second Proposed Algorithm: Points Relationship 110
6.3 Discussion and Simulation 113
Chapter 7 The Proposed Adaptive MIDI Number 118
Chapter 8 The Overall Test of the QBH System 123
Chapter 9 Conclusion and Future Work 131
9.1 Onset Algorithms 131
9.2 The Melody Match Algorithm 134
9.3 The QBH System 137
REFERENCES 138
CURRICULUM VITAE 145
dc.language.iso: en
dc.title: 新式發端與音階識別演算法於音樂信號處理 [zh_TW]
dc.title: New Onset and Pitch Detection Algorithms in Music Signal Processing [en]
dc.type: Thesis
dc.date.schoolyear: 99-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 郭景明 (Jing-Ming Guo), 王鵬華 (Peng-Hua Wang), 葉敏宏 (Min-Hong Ye)
dc.subject.keyword: Music information retrieval, voice onset identification, onset detection, MIDI, Mel-frequency cepstral coefficients (MFCC), linear prediction coefficients (LPC), time-frequency analysis (STFT), fundamental frequency detection, Hilbert-Huang transform (HHT), sub-harmonic summation, Surf [zh_TW]
dc.subject.keyword: Music information retrieval, voice activity detection, onset detection, MIDI, MFCC, LPC, STFT, pitch estimation, HHT, SHS, HFC, Surf [en]
dc.relation.page: 152
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2011-07-07
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering) [zh_TW]
Appears in collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in this item:
File | Size | Format
ntu-100-1.pdf (currently not authorized for public access) | 9.79 MB | Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
