Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9365
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 貝蘇章 | |
dc.contributor.author | Nien-Teh Hsu | en |
dc.contributor.author | 許年德 | zh_TW |
dc.date.accessioned | 2021-05-20T20:19:19Z | - |
dc.date.available | 2009-06-30 | |
dc.date.available | 2021-05-20T20:19:19Z | - |
dc.date.copyright | 2009-06-30 | |
dc.date.issued | 2009 | |
dc.date.submitted | 2009-06-15 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9365 | - |
dc.description.abstract | 在過去的數十年裡,由於網際網路的蓬勃發展,各式各樣多媒體檔案的數量不斷增加。在這之中,不論是在獲取或是發佈數位音樂檔案都變得比過去容易很多。也由於此數量規模的不斷爆增,我們需要一個新的聆聽音樂和發掘新音樂的方法。
在這篇論文的一開始,我們會介紹一個簡單的音樂相似度估計系統,並且模擬它的效能。根據實驗結果顯示,使用較低階的特徵向量來描述曲子的特性,並不足以讓我們分離出不同音樂內容本身對相似度造成的影響,例如和絃、曲風、樂器編制和旋律。因此,這篇論文的主要目標在於將原本的低階特徵替換為與音樂內容有關的中階特徵。於此,我們將特別著重於樂器編制的自動化分析。音樂訊號音色的時頻分析和單一樂器的分類問題都將在此篇論文中討論,以做為基本的工具。之後我們將延伸此想法到處理更複雜的複音音樂,並且截取其隨時間變化的樂器編制資訊。藉由在相似度估計系統上使用此資訊,我們發現計算出的相似樂曲結果中,將可以特別針對樂器和音色,而非其他音樂內容。如此將可以取代原本的相似度估計系統,達到實現多模式音樂相似度估計的目標。 | zh_TW |
dc.description.abstract | During the past few decades, the world has ushered in a new era of booming Internet technology and immense multimedia content distribution. The acquisition and circulation of digital music files have become much easier than ever. Owing to this rapid rise in the quantity of available music, a brand new way of discovering and recommending music is highly desirable.
In the beginning of this study, a conventional music similarity measure system based on signal analysis methods is implemented and evaluated. The experimental results show that the low-level features obtained from signal analysis techniques are not powerful enough to discriminate between various kinds of musical content, such as chord progression, genre, instrumentation, and melody. Therefore, the aim of this study is to combine the low-level features with mid-level features in order to exploit the musical content. We focus on extracting the instrumentation information left by the composers. The time-frequency analysis of musical instrument signals and the classification of various instruments in the monophonic case are studied first. We then extend the idea to polyphonic music and analyze its time-varying instrumentation information. By incorporating this information back into the original similarity measure system, the songs judged similar resemble each other specifically in the sense of instrumentation. | en |
dc.description.provenance | Made available in DSpace on 2021-05-20T20:19:19Z (GMT). No. of bitstreams: 1 ntu-98-R96942047-1.pdf: 5121299 bytes, checksum: 77f21558dd639bf3a7e257e45ee0f2ff (MD5) Previous issue date: 2009 | en |
dc.description.tableofcontents | 1 Introduction 7
1.1 Background 7
1.2 Primary Achievements of This Study 9
1.3 Organization 9
2 A Music Similarity Measure System 11
2.1 Introduction and Related Work 11
2.2 Feature Extraction 14
2.2.1 Mel-Frequency Cepstral Coefficients 15
2.2.2 Timbral Texture Feature 17
2.2.3 MPEG-7 Audio Descriptors 19
2.3 Cluster Modeling 20
2.3.1 k-means Clustering 21
2.3.2 Gaussian Mixture Models 23
2.4 Distance Measure 25
2.4.1 Likelihood Function 25
2.4.2 Kullback-Leibler Divergence 26
2.4.3 Earth Mover's Distance 26
2.4.4 Monte-Carlo Sampling 27
2.5 Simulation Results 28
2.5.1 Music Similarity Measure Toolbox 29
2.5.2 Experiment Setup 29
2.5.3 Results 30
2.6 Discussion 35
3 Time-Frequency Analysis of Music Instrumental Signal 38
3.1 Introduction and Related Work 38
3.2 Characteristics of Musical Instrumental Signal 40
3.2.1 Pitch 41
3.2.2 Harmonics 42
3.3 Constant Q Transform 43
3.3.1 Motivation 43
3.3.2 Implementation 44
3.3.3 An Efficient Algorithm 46
3.4 Time-Frequency Analysis Using the Constant Q Transform 48
3.5 Simulation Results 50
3.5.1 Music Database 50
3.5.2 Results 51
3.6 Discussion 52
4 Instrument Classification of Monophonic Music 57
4.1 Introduction and Related Work 57
4.2 History and Concept of Musical Instrument Classification 59
4.3 Description of the Proposed System 61
4.3.1 Feature Normalization 62
4.3.2 Support Vector Machine 63
4.3.3 k-Fold Cross Validation 65
4.4 Simulation Results 66
4.4.1 Data Preprocessing 66
4.4.2 Instrument Family Classification Results 68
4.4.3 Individual Instrument Classification Results 68
4.5 Discussion 69
5 Instrumentation Analysis of Polyphonic Music 73
5.1 Introduction and Related Work 73
5.2 Motivation and a Small Experiment 75
5.3 Description of the Proposed System 79
5.3.1 Feature Extraction 79
5.3.2 Beat Tracking and Feature Integration 81
5.3.3 Fuzzy Clustering 82
5.3.4 Instrument Identification 83
5.4 Simulation Results 86
5.4.1 Experiment Setup 86
5.4.2 Instrument Identification Result 86
5.4.3 Instrumentation Analysis Result 87
5.5 Discussion 89
6 An Instrumentation-Based Music Similarity Measure System 90
6.1 Introduction and Related Work 90
6.2 Instrumentation Analysis System 92
6.3 Proposed Similarity Measure System 92
6.3.1 Normalized Cross-Correlation 94
6.3.2 Kullback-Leibler Divergence 94
6.3.3 Entropy Difference 95
6.3.4 MFCC Distance 95
6.3.5 Weighted Distance Optimization 95
6.4 Simulation Results 96
6.5 Discussion 99
7 Conclusions and Future Work 101
7.1 Conclusions 101
7.2 Future Work 103
References 104 | |
dc.language.iso | en | |
dc.title | 複音音樂的樂器編制分析及其在音樂相似度估計上的應用 | zh_TW |
dc.title | Instrumentation Analysis of Polyphonic Music and Its Application to Music Similarity Measure | en |
dc.type | Thesis | |
dc.date.schoolyear | 97-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 王鵬華,徐忠枝,鄭士康 | |
dc.subject.keyword | 基於內容的音樂資訊擷取,樂器分類,音樂相似度估計,音樂訊號處理, | zh_TW |
dc.subject.keyword | Content-based music information retrieval,Instrument classification,Music similarity measure,Audio signal processing, | en |
dc.relation.page | 110 | |
dc.rights.note | Authorized (open access worldwide) | |
dc.date.accepted | 2009-06-16 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電信工程學研究所 | zh_TW |
Appears in Collections: | 電信工程學研究所 |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-98-1.pdf | 5 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.