Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50796
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳中平(Zhong-Ping Chen) | |
dc.contributor.author | Yan-Hsing Chen | en |
dc.contributor.author | 陳彥興 | zh_TW |
dc.date.accessioned | 2021-06-15T12:58:55Z | - |
dc.date.available | 2016-08-02 | |
dc.date.copyright | 2016-08-02 | |
dc.date.issued | 2016 | |
dc.date.submitted | 2016-07-13 | |
dc.identifier.citation | [1] M. Ryynänen, "Singing transcription," in Signal Processing Methods for Music Transcription (A. Klapuri and M. Davy, eds.), pp. 361-390, Springer Science+Business Media LLC, 2006.
[2] J. Burkholder, J. King, and D. Grout, Study and Listening Guide for A History of Western Music (8th ed., by J. P. Burkholder, D. J. Grout, and C. V. Palisca) and Norton Anthology of Western Music (6th ed., by J. P. Burkholder and C. V. Palisca), p. 35. New York: W. W. Norton & Co., 2010.
[3] B. Pardo, J. Shifrin, and W. Birmingham, "Name that tune: A pilot study in finding a melody from a sung query," J. Amer. Soc. Inf. Sci. Technol., vol. 55, no. 4, pp. 283-300, 2004.
[4] C. De La Bandera, A. M. Barbancho, L. J. Tardón, S. Sammartino, and I. Barbancho, "Humming method for content-based music information retrieval," in Proc. 12th Int. Soc. Music Inf. Retrieval Conf. (ISMIR), 2011.
[5] D. M. Howard, G. Welch, J. Brereton, E. Himonides, M. Decosta, J. Williams, and A. Howard, "WinSingad: A real-time display for the singing studio," Logopedics Phoniatrics Vocology, vol. 29, no. 3, pp. 135-144, 2004.
[6] C. Dittmar, H. Großmann, E. Cano, S. Grollmisch, H. M. Lukashevich, and J. Abeßer, "Songs2See and GlobalMusic2One: Two applied research projects in music information retrieval at Fraunhofer IDMT," in Proc. 7th Int. Conf. Exploring Music Contents (CMMR'10), S. Ystad, M. Aramaki, R. Kronland-Martinet, and K. Jensen, Eds. New York, NY, USA: Springer, 2010, pp. 259-272, vol. 6684 of Lecture Notes in Computer Science.
[7] "SingStar game, by Sony Computer Entertainment Europe," 2004. [Online]. Available: http://www.singstar.com/
[8] M. Ryynänen and A. Klapuri, "Modelling of note events for singing transcription," in Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing (SAPA), Jeju, Korea, Oct. 2004.
[9] E. Molina, I. Barbancho, E. Gómez, A. Barbancho, and L. Tardón, "Fundamental frequency alignment vs. note-based melodic similarity for singing voice assessment," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2013, pp. 744-748.
[10] "7-2: Time-domain: PDF: ACF," in Chapter 7: Pitch Tracking, by Jyh-Shing Roger Jang. [Online]. Available: http://mirlab.org/jang/books/audioSignalProcessing/
[11] "6-2: EPD in Time Domain," in Chapter 6: End-Point Detection (EPD), by Jyh-Shing Roger Jang. [Online]. Available: http://mirlab.org/jang/books/audioSignalProcessing/
[12] 蔡振家(2013)。音樂認知心理學。臺北市:臺大出版中心。
[13] W. Krige, T. Herbst, and T. Niesler, "Explicit transition modelling for automatic singing transcription," Journal of New Music Research, vol. 37, no. 4, pp. 311-324, 2008.
[14] E. Gómez and J. Bonada, "Towards computer-assisted flamenco transcription: An experimental comparison of automatic transcription algorithms as applied to a cappella singing," Computer Music Journal, vol. 37, no. 2, pp. 73-90, 2013.
[15] E. Molina, L. J. Tardón, A. M. Barbancho, and I. Barbancho, "SiPTH: Singing transcription based on hysteresis defined on the pitch-time curve," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 2, pp. 252-263, Feb. 2015.
[16] J. Bednar and T. Watt, "Alpha-trimmed means and their relationship to median filters," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 1, pp. 145-153, Feb. 1984.
[17] C. Aruffo, R. L. Goldstone, and D. J. D. Earn, "Absolute judgment of musical interval width," Music Perception, vol. 32, no. 2, pp. 186-200, 2014.
[18] E. Molina, A. M. Barbancho, L. J. Tardón, and I. Barbancho, "Evaluation framework for automatic singing transcription," in Proc. 15th Int. Soc. Music Inf. Retrieval Conf. (ISMIR), 2014, pp. 567-572.
[19] G. Haus and E. Pollastri, "An audio front end for query-by-humming systems," in Proc. 2nd Int. Symp. Music Information Retrieval (ISMIR), pp. 65-72, 2001.
[20] L. P. Clarisse, J. P. Martens, M. Lesaffre, B. De Baets, H. De Meyer, and M. Leman, "An auditory model based transcriber of singing sequences," in Proc. 3rd Int. Conf. Music Information Retrieval (ISMIR), pp. 116-123, 2002.
[21] K. Kimura and A. Lipeles, "Fuzzy controller component," U.S. Patent 14,860,040, December 14, 1996.
[22] G. E. Poliner, D. P. W. Ellis, A. F. Ehmann, E. Gómez, S. Streich, and B. Ong, "Melody transcription from music audio: Approaches and evaluation," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 4, pp. 1247-1256, May 2007.
[23] M. Ryynänen and A. Klapuri, "Modelling of note events for singing transcription," in Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing (SAPA), Jeju, Korea, Oct. 2004.
[24] L. Rabiner and R. Schafer, Digital Processing of Speech Signals, Prentice-Hall Series in Signal Processing No. 7. Englewood Cliffs, NJ, USA: Prentice-Hall, 1978.
[25] E. Pollastri, "A pitch tracking system dedicated to process singing voice for musical retrieval," in Proc. IEEE Int. Conf. Multimedia and Expo, vol. 1, pp. 341-344, Lausanne, Switzerland, 2002.
[26] R. J. McNab, L. A. Smith, and I. H. Witten, "Signal processing for melody transcription," in Proc. 19th Australasian Computer Science Conference, Melbourne, Australia, Feb. 1996.
[27] T. De Mulder, J. P. Martens, M. Lesaffre, M. Leman, B. De Baets, and H. De Meyer, "Recent improvements of an auditory model based front-end for the transcription of vocal queries," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 4, pp. 257-260, Montreal, Canada, 2004.
[28] L. P. Clarisse, J. P. Martens, M. Lesaffre, B. De Baets, H. De Meyer, and M. Leman, "An auditory model based transcriber of singing sequences," in Proc. Int. Conf. Music Information Retrieval, Paris, France, Oct. 2002.
[29] R. J. McNab, L. A. Smith, and I. H. Witten, "Signal processing for melody transcription," in Proc. 19th Australasian Computer Science Conference, Melbourne, Australia, Feb. 1996.
[30] G. Haus and E. Pollastri, "An audio front end for query-by-humming systems," in Proc. 2nd Annual Int. Symp. Music Information Retrieval, Bloomington, Indiana, USA, 2001.
[31] Chong-Kai Wang, "An integrated singing transcription system using a robust melody tracker and a multilingual singing lyric recognizer," M.S. thesis, Chang Gung University, Taoyuan, Taiwan, 2003.
[32] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260-269, Apr. 1967.
[33] M. Ryynänen and A. Klapuri, "Modelling of note events for singing transcription," in Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing, Jeju, Korea, Oct. 2004.
[34] D. Ellis and G. Poliner, "Classification-based melody transcription," Machine Learning, special issue on Machine Learning In and For Music, vol. 65, no. 2-3, pp. 439-456, Dec. 2006.
[35] L. Song, M. Li, and Y. Yan, "Melody extraction for vocal polyphonic music based on Bayesian framework," in Proc. 10th Int. Conf. Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2014), 2014, pp. 570-573.
[36] Jyh-Shing Roger Jang, "Speech and Audio Processing (SAP) Toolbox." [Online]. Available: http://mirlab.org/jang/matlab/toolbox/sap
[37] T. Viitaniemi, A. Klapuri, and A. Eronen, "A probabilistic model for the transcription of single-voice melodies," in Proc. 2003 Finnish Signal Processing Symposium, pp. 59-63, Tampere, Finland, May 2003.
[38] M. P. Ryynänen and A. Klapuri, "Polyphonic music transcription using note event modeling," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005, pp. 319-322.
[39] M. P. Ryynänen and A. Klapuri, "Modelling of note events for singing transcription," in Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing, Oct. 2004.
[40] A. P. Klapuri, A. J. Eronen, and J. T. Astola, "Analysis of the meter of acoustic musical signals," IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 1, pp. 342-355, 2006.
[41] J. Bednar and T. Watt, "Alpha-trimmed means and their relationship to median filters," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 1, pp. 145-153, Feb. 1984.
[42] K. Jeppesen, Counterpoint: The Polyphonic Vocal Style of the Sixteenth Century, p. 109. New York: Prentice-Hall, 1939.
[43] C. Krumhansl, Cognitive Foundations of Musical Pitch. Oxford University Press, 1990.
[44] A. de Cheveigné and H. Kawahara, "YIN, a fundamental frequency estimator for speech and music," J. Acoust. Soc. Amer., vol. 111, no. 4, p. 1917, 2002.
[45] A. de Cheveigné, MATLAB implementation of the YIN algorithm, Feb. 2012. [Online]. Available: http://audition.ens.fr/adc/sw/yin.zip | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50796 | - |
dc.description.abstract | The core problems in humming transcription are segmentation and labelling: segmentation divides the raw acoustic features into individual note segments, and labelling assigns the correct pitch to each segment. According to Ryynänen's classification, hidden Markov models (HMMs) belong to the joint approach, which decides boundaries and pitches simultaneously, whereas interval-based segmentation (E. Molina, 2015) is a cascade approach that first decides boundaries and then assigns pitches. HMMs are trained on corpus data and use probabilistic models to capture the complex idiomatic patterns of musical syntax; interval-based segmentation reasons at the note level, filtering out brief, drastic pitch changes to obtain better note boundaries. This study proposes a humming transcription system whose segmentation and labelling stages combine interval-based segmentation and prior pitch estimates with an HMM trained on a corpus collected by our laboratory; in the experiments it achieves a 55% correct-note detection rate. The main reason is not that interval-based segmentation finds the correct boundaries, but that assigning a prior pitch compresses the pitch range within each segment and thus reduces the difficulty caused by pitch drift.
For evaluation, we collected 140 humming recordings made by users without professional music training to build an evaluation corpus. The ground truth was produced semi-automatically: transcription experts recorded MIDI tracks, which were aligned to the original recordings with dynamic time warping (DTW), and one expert then made final manual corrections. During this process, the discrepancies between the recordings and the experts' answers highlighted how pitch drift complicates pitch judgment. We review the related literature and, based on tolerance for error propagation, propose principles for correcting pitch. | zh_TW |
dc.description.abstract | Segmentation and labelling are the core problems in humming transcription. Based on features such as energy, voicing, and abrupt changes in fundamental frequency (F0), the segmentation stage divides the whole recording into a sequence of notes with proper boundaries. Because the F0 sequence varies widely and is not in absolute tune, the labelling stage assigns a pitch label, such as an integer MIDI note number, to each note. According to Ryynänen's classification, hidden Markov models (HMMs) perform these two stages jointly, while SiPTH (Molina, 2015) is a cascade system that decides boundaries and pitches sequentially. Trained on corpus data, HMM methods use probability distributions to model conventional musical syntax; viewing music as a sequence of notes, SiPTH filters out unstable pitch changes within each note and thereby obtains better note boundaries.
We propose a humming transcription system in this thesis. In the segmentation and labelling stages, interval-based segmentation (SiPTH) first divides the recording into a set of notes; an HMM trained on our collected corpus then assigns a pitch label to each note (a minimal Viterbi-decoding sketch is given after this metadata record). In the experiments, this method achieves a 55% correct-note rate. The main reason for this advantage is not better note boundaries but the prior pitch label: assigning a prior pitch shrinks the unstable pitch variation within each note, which makes the tuning problem (out-of-tune singing) easier to handle. For evaluation, we collected 140 songs recorded by non-professional users and produced a ground truth for each song. First, experts played and recorded the melodies on a MIDI keyboard; the MIDI files were then aligned to the WAV files with the dynamic time warping (DTW) algorithm (see the alignment sketch after this record); finally, an expert corrected the remaining errors manually. While producing the ground truth, the pitch differences between the MIDI and WAV files highlighted the tuning problem. After reviewing the related literature, we also propose principles for correcting pitch based on the tolerance difference between singer and listener and on the error-propagation phenomenon. | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T12:58:55Z (GMT). No. of bitstreams: 1 ntu-105-R01943140-1.pdf: 4213836 bytes, checksum: 047e3ce5eb9be55dabc309a6e353df32 (MD5) Previous issue date: 2016 | en |
dc.description.tableofcontents | Thesis Oral Examination Committee Certification i
Acknowledgements ii
Chinese Abstract iii
Abstract iv
Table of Contents v
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
 1.1 Research Motivation 1
 1.2 Research Objectives 1
 1.3 Literature Review 2
  1.3.1 Segmentation and Labelling Methods 2
  1.3.2 Statistical Models of Note Events 2
  1.3.3 Music-Theoretic Context 3
 1.4 Thesis Organization 3
Chapter 2 Research Methods 4
 2.1 Pitch Tracking 4
  2.1.1 Autocorrelation Function 4
  2.1.2 Endpoint Detection 5
 2.2 Error Correction 7
 2.3 Note Segmentation 10
  2.3.1 Interval-Based Segmentation 10
  2.3.2 Glissando Correction 13
 2.4 Prior Pitch Labelling 14
 2.5 Pitch Labelling and Hidden Markov Models 15
  2.5.1 Overview 16
  2.5.2 Transition Probabilities 17
  2.5.3 State Probabilities 17
  2.5.4 Computing the Optimal Path 18
  2.5.5 Interval-Based Hidden Markov Model 20
Chapter 3 Evaluation Method 23
 3.1 Corpus Collection 23
 3.2 Ground Truth Production 23
  3.2.1 Expert Transcription 24
  3.2.2 Aligning the Recordings and MIDI Files with Dynamic Time Warping and Offset 24
  3.2.3 Second Round of Expert Correction 27
  3.2.4 Pitch Drift and Interval Perception 29
  3.2.5 Case Studies 33
 3.3 Performance Evaluation 36
  3.3.1 General Evaluation 36
  3.3.2 Error-Classification Evaluation 36
  3.3.3 Interval Evaluation 37
  3.3.4 Note-Boundary Evaluation 38
Chapter 4 Experimental Results and Discussion 39
 4.1 Baseline Method (Method 1) 39
 4.2 HMM Methods (Methods 5-7) 39
 4.3 Evaluation and Discussion 40
Chapter 5 Conclusions and Future Work 44
 5.1 Conclusions 44
 5.2 Future Work 45
References 46 | |
dc.language.iso | zh-TW | |
dc.title | 以隱藏式馬可夫模型為基礎之哼唱轉譜演算法 | zh_TW |
dc.title | A Humming Transcription Algorithm Based on Hidden Markov Models | en |
dc.type | Thesis | |
dc.date.schoolyear | 104-2 | |
dc.description.degree | 碩士 | |
dc.contributor.coadvisor | 張智星(Jyh-Shing Jang) | |
dc.contributor.oralexamcommittee | 陳宏銘(Hong-Ming Chen),呂仁園(Ren-Yuan Lu) | |
dc.subject.keyword | humming transcription, segmentation, labelling, hidden Markov models, evaluation corpus, pitch drift | zh_TW |
dc.subject.keyword | humming transcription, note segmentation and labelling, HMM, corpus, tuning | en |
dc.relation.page | 52 | |
dc.identifier.doi | 10.6342/NTU201600513 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2016-07-13 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電子工程學研究所 | zh_TW |
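The labelling stage described in the abstract assigns an integer MIDI note number to each segmented note with an HMM decoded by the Viterbi algorithm (cf. Section 2.5.4 of the table of contents and reference [32]). The following is a minimal, self-contained sketch of that idea; the Gaussian emission and transition scores, the pitch range, and all parameter values are illustrative assumptions, not the trained models or parameters of the thesis.

```python
import numpy as np

def viterbi_note_labels(observed_midi, pitch_range=(48, 84), sigma=0.7, interval_sigma=4.0):
    """Assign an integer MIDI label to each segmented note via Viterbi decoding.

    observed_midi : per-note median F0 converted to (fractional) MIDI numbers.
    Emission favours labels close to the observed pitch; transitions favour
    small melodic intervals. All distributions here are illustrative Gaussians,
    not the corpus-trained models described in the thesis.
    """
    states = np.arange(pitch_range[0], pitch_range[1] + 1)      # candidate MIDI labels
    n_states, n_notes = len(states), len(observed_midi)

    def log_emission(obs):                                       # closeness to observed pitch
        return -0.5 * ((states - obs) / sigma) ** 2

    # transition score: penalise large melodic jumps between consecutive notes
    log_trans = -0.5 * ((states[:, None] - states[None, :]) / interval_sigma) ** 2

    delta = log_emission(observed_midi[0])                       # initial scores
    psi = np.zeros((n_notes, n_states), dtype=int)               # best-predecessor table
    for t in range(1, n_notes):
        cand = delta[:, None] + log_trans                        # score of each (prev, cur) pair
        psi[t] = np.argmax(cand, axis=0)
        delta = cand[psi[t], np.arange(n_states)] + log_emission(observed_midi[t])

    path = [int(np.argmax(delta))]                               # backtrack the optimal path
    for t in range(n_notes - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return states[np.array(path[::-1])]
```

For example, `viterbi_note_labels(np.array([60.3, 62.4, 63.7, 62.1]))` would return `[60, 62, 64, 62]`, snapping each sung note to an in-tune label while penalising implausibly large melodic jumps.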
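The ground-truth procedure in the abstract aligns each expert MIDI rendition to the sung recording with dynamic time warping before the final manual correction. Below is a minimal textbook DTW sketch over two frame-wise pitch sequences; the function name, the absolute-difference cost, and the unconstrained step pattern are assumptions made for illustration, not the exact (offset-compensated) alignment procedure used in the thesis.

```python
import numpy as np

def dtw_align(ref, query):
    """Plain dynamic-time-warping alignment between two 1-D pitch sequences.

    ref   : frame-wise pitch from the expert MIDI rendition (MIDI numbers)
    query : frame-wise pitch estimated from the sung WAV file
    Returns the total cost and the warping path as (ref_index, query_index) pairs.
    """
    n, m = len(ref), len(query)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - query[j - 1])                   # local pitch distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])

    path, i, j = [], n, m                                        # backtrack the optimal path
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[n, m], path[::-1]
```

The returned path pairs each frame of the MIDI reference with a frame of the sung pitch track, so note onsets and offsets annotated in the MIDI can be transferred onto the recording's time axis before the second round of expert correction.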
Appears in Collections: | 電子工程學研究所
Files in This Item:
File | Size | Format |
---|---|---|
ntu-105-1.pdf (currently not authorized for public access) | 4.12 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.