NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85470
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 丁肇隆 (Chao-Lung Ting)
dc.contributor.author: Yo-Wei Hsiao [en]
dc.contributor.author: 蕭友威 [zh_TW]
dc.date.accessioned: 2023-03-19T23:17:03Z
dc.date.copyright: 2022-07-19
dc.date.issued: 2022
dc.date.submitted: 2022-07-13
dc.identifier.citation:
[1] E. Cambouropoulos, "Voice and stream: Perceptual and computational modeling of voice separation," Music Perception, vol. 26, no. 1, pp. 75–94, 2008.
[2] A. L. Uitdenbogerd and J. Zobel, "Manipulation of music for melody matching," in Proceedings of the sixth ACM International Conference on Multimedia, 1998, pp. 235–240.
[3] G. Ozcan, C. Isikhan, and A. Alpkocak, "Melody extraction on MIDI music files," in Seventh IEEE International Symposium on Multimedia (ISM'05). IEEE, 2005, 8 pp.
[4] M.-K. Shan and F.-F. Kuo, "Music style mining and classification by melody," IEICE Transactions on Information and Systems, vol. 86, no. 3, pp. 655–659, 2003.
[5] A. Uitdenbogerd and J. Zobel, "Melodic matching techniques for large music databases," in Proceedings of the seventh ACM International Conference on Multimedia (Part 1), 1999, pp. 57–66.
[6] W. Chai and B. Vercoe, "Melody retrieval on the web," in Multimedia Computing and Networking 2002, vol. 4673. International Society for Optics and Photonics, 2001, pp. 226–241.
[7] P. Gray and R. C. Bunescu, "A neural greedy model for voice separation in symbolic music," in Proc. of the 17th International Society for Music Information Retrieval Conf., New York City, USA, 2016, pp. 782–788.
[8] J. Kilian and H. H. Hoos, "Voice separation: A local optimization approach," in ISMIR 2002, 3rd International Conference on Music Information Retrieval, Paris, France, 2002.
[9] E. Chew and X. Wu, "Separating voices in polyphonic music: A contig mapping approach," in International Symposium on Computer Music Modeling and Retrieval (CMMR), U. K. Wiil, Ed. Esbjerg, Denmark: Springer Berlin Heidelberg, 2004, pp. 1–20.
[10] A. Ishigaki, M. Matsubara, and H. Saito, "Prioritized contig combining to segregate voices in polyphonic music," in Sound and Music Computing Conference (SMC 2011), vol. 119, Padova, Italy, 2011, p. 58.
[11] N. Guiomard-Kagan, M. Giraud, R. Groult, and F. Levé, "Comparing voice and stream segmentation algorithms," in Proc. of the 16th International Society for Music Information Retrieval Conf., Málaga, Spain, 2015, pp. 493–499.
[12] ——, "Improving voice separation by better connecting contigs," in Proc. of the 17th International Society for Music Information Retrieval Conf., New York City, USA, 2016, pp. 164–170.
[13] S. T. Madsen and G. Widmer, "A complexity-based approach to melody track identification in MIDI files," in Proc. of the International Workshop on Artificial Intelligence and Music, Hyderabad, India, 2007.
[14] I. Karydis, A. Nanopoulos, A. Papadopoulos, E. Cambouropoulos, and Y. Manolopoulos, "Horizontal and vertical integration/segregation in auditory streaming: A voice separation algorithm for symbolic musical data," in Proc. of the 4th Sound and Music Computing Conference, Lefkada, Greece, 2007, pp. 299–306.
[15] D. Makris, I. Karydis, and E. Cambouropoulos, "VISA3: Refining the voice integration/segregation algorithm," in Proc. of the Sound and Music Computing Conference 2016, Hamburg, Germany, 2016, pp. 266–273.
[16] P. B. Kirlin and P. E. Utgoff, "VoiSe: Learning to segregate voices in explicit and implicit polyphony," in ISMIR 2005, 6th International Conference on Music Information Retrieval, London, UK, 2005, pp. 552–557.
[17] D. Rizo, P. J. P. de León, C. Pérez-Sancho, A. Pertusa, and J. M. I. Quereda, "A pattern recognition approach for melody track selection in MIDI files," in Proc. of the 7th International Society for Music Information Retrieval Conf., Victoria, Canada, 2006, pp. 61–66.
[18] A. Jordanous, "Voice separation in polyphonic music: A data-driven approach," in Proc. of the 2008 International Computer Music Conference, Belfast, Ireland, 2008.
[19] R. de Valk, T. Weyde, E. Benetos et al., "A machine learning approach to voice separation in lute tablature," in Proc. of the 14th International Society for Music Information Retrieval Conf., Curitiba, Brazil, 2013, pp. 555–560.
[20] T. Weyde and R. de Valk, "Chord- and note-based approaches to voice separation," in Computational Music Analysis. Springer, 2016, pp. 137–154.
[21] P. Gray and R. Bunescu, "From note-level to chord-level neural network models for voice separation in symbolic music," arXiv preprint arXiv:2011.03028, 2020.
[22] F. Simonetta, C. E. C. Chacón, S. Ntalampiras, and G. Widmer, "A convolutional approach to melody line identification in symbolic scores," in Proc. of the 20th International Society for Music Information Retrieval Conf., Delft, The Netherlands, 2019, pp. 924–931.
[23] A. Friberg and S. Ahlbäck, "Recognition of the main melody in a polyphonic symbolic score using perceptual knowledge," Journal of New Music Research, vol. 38, no. 2, pp. 155–169, 2009.
[24] R. de Valk and T. Weyde, "Deep neural networks with voice entry estimation heuristics for voice separation in symbolic music representations," in Proc. of the 19th International Society for Music Information Retrieval Conf., Paris, France, 2018, pp. 281–288.
[25] Y.-W. Hsiao and L. Su, "Learning note-to-note affinity for voice segregation and melody line identification of symbolic music data," in Proc. of the 22nd International Society for Music Information Retrieval Conf., Online, 2021, pp. 285–292.
[26] A. McLeod and M. Steedman, "HMM-based voice separation of MIDI performance," Journal of New Music Research, vol. 45, no. 1, pp. 17–26, 2016.
[27] D. Huron, "Tone and voice: A derivation of the rules of voice-leading from perceptual principles," Music Perception, vol. 19, no. 1, pp. 1–64, 2001.
[28] K. O'Shea and R. Nash, "An introduction to convolutional neural networks," arXiv preprint arXiv:1511.08458, 2015.
[29] R. M. Haralick and L. G. Shapiro, Computer and Robot Vision. Addison-Wesley, Reading, 1992, vol. 1.
[30] S. Kiranyaz, T. Ince, and M. Gabbouj, "Real-time patient-specific ECG classification by 1-D convolutional neural networks," IEEE Transactions on Biomedical Engineering, vol. 63, no. 3, pp. 664–675, 2015.
[31] S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, and D. J. Inman, "1D convolutional neural networks and applications: A survey," Mechanical Systems and Signal Processing, vol. 151, p. 107398, 2021.
[32] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, 2000.
[33] A. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: Analysis and an algorithm," in Advances in Neural Information Processing Systems, Vancouver, Canada, 2001, pp. 849–856.
[34] W.-T. Lu and L. Su, "Deep learning models for melody perception: An investigation on symbolic music data," in 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Honolulu, USA: IEEE, 2018, pp. 1620–1625.
[35] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, USA, 2015.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85470
dc.description.abstract: Among the many problems in music information retrieval, melody extraction is a crucial one. As one of the most fundamental elements of musical structure, melody plays a key role in both composition and analysis. In this thesis, we propose a data-driven system that builds and trains a deep learning model to predict the affinity between notes in MIDI music. The system treats the notes of a piece as vertices of a graph and the learned affinities as edges between vertices, converting a MIDI piece into an undirected graph. We split a MIDI piece into several segments and use a graph partitioning algorithm to divide each undirected graph into two subgraphs, which represent the melody and the accompaniment of that segment. Finally, we apply the voting method from ensemble learning to merge the per-segment melodies into one complete melody, which is the final output of the melody extraction system. To verify the feasibility of the system, we perform hyperparameter optimization on the deep learning model, validate the melody extraction results on different datasets, and compare against existing melody extraction models to demonstrate the system's generalization ability across datasets. [zh_TW]
dc.description.abstract: Melody extraction is a crucial task in music information retrieval, as melody is one of the most important elements in musical composition and analysis. In this thesis, we propose a data-driven framework in which a 1-D convolutional neural network learns affinities between musical notes. A music piece in MIDI format can then be represented as a weighted undirected graph, with notes as vertices and learned affinity values as weighted edges. The melody track and the accompaniment track of each segment of the piece are obtained by applying spectral clustering, a graph partitioning algorithm, to the learned graph. Finally, we use a voting scheme to merge all segment-level melodies into an integrated one as the final result of melody extraction. Our proposed framework takes only musical notes as input, without further information such as time signature or key signature, so it works with both well-labeled data and real-world performance data. The framework is tested and validated on multiple datasets under different hyperparameter settings, and its performance is compared with other rule-based and data-driven methods. [en]
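The abstract describes a note-affinity graph pipeline: learned note-to-note affinities form a weighted undirected graph, spectral clustering splits each segment into melody and accompaniment, and a vote across segments produces the merged melody. The following Python sketch illustrates that idea only under stated assumptions; it is not the thesis implementation. The predict_affinity callable standing in for the trained 1-D CNN, the note objects with a pitch attribute, and the highest-mean-pitch rule for choosing the melody cluster are all hypothetical, and scikit-learn's SpectralClustering with a precomputed affinity matrix is used as a generic graph-partitioning step.

import numpy as np
from sklearn.cluster import SpectralClustering

def affinity_matrix(notes, predict_affinity):
    # Build a symmetric affinity matrix W; W[i, j] estimates how strongly notes
    # i and j belong to the same stream (the role played by the thesis's 1-D CNN).
    n = len(notes)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = predict_affinity(notes[i], notes[j])  # hypothetical model wrapper
    return W

def segment_melody_mask(notes, W):
    # Partition one segment into two clusters by spectral clustering on the
    # precomputed affinity graph, then call the cluster with the higher mean
    # pitch the melody (an assumed heuristic, not taken from the thesis).
    labels = SpectralClustering(n_clusters=2, affinity="precomputed").fit_predict(W)
    pitches = np.array([note.pitch for note in notes])
    melody_label = max((0, 1), key=lambda c: pitches[labels == c].mean())
    return labels == melody_label  # True where a note is assigned to the melody

def merge_by_voting(num_notes, segment_results):
    # Majority vote across overlapping segments: a note joins the merged melody
    # if more than half of the segments containing it labeled it as melody.
    votes = np.zeros(num_notes)
    counts = np.zeros(num_notes)
    for note_indices, is_melody in segment_results:  # (global note indices, per-segment mask)
        votes[note_indices] += is_melody
        counts[note_indices] += 1
    return (counts > 0) & (votes * 2 > counts)

Merging by majority vote, as the abstract describes, lets overlapping segments outvote an individual segment's clustering error instead of letting it propagate into the final melody.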
dc.description.provenance: Made available in DSpace on 2023-03-19T23:17:03Z (GMT). No. of bitstreams: 1. U0001-0106202214394500.pdf: 2982678 bytes, checksum: 020b26dc2add2e16464307fde82dea2c (MD5). Previous issue date: 2022 [en]
dc.description.tableofcontents:
Contents
Abstract i
List of Figures iv
List of Tables vi
1 Introduction 1
  1.1 Motivation 3
  1.2 Contributions 3
  1.3 Thesis overview
2 Background and Related work
  2.1 Terminology
    2.1.1 Voice and melody 5
    2.1.2 Melody extraction and voice segregation 6
    2.1.3 MIDI 7
  2.2 Literature review 7
3 Proposed Methods 10
  3.1 Data representation 11
    3.1.1 Indices for note events 11
    3.1.2 Input features for machine learning models 11
  3.2 Model Training 15
    3.2.1 One-dimensional convolution neural network 15
    3.2.2 Hyperparameters 17
  3.3 Affinity Matrix Reconstruction 19
  3.4 Graph Partition 19
  3.5 Segment Merging 20
    3.5.1 Segregating melody and accompaniment 21
    3.5.2 Merging 21
  3.6 Voice Segregation 24
4 Evaluation 25
  4.1 Datasets 25
  4.2 Evaluation Metrics 26
  4.3 Baselines 27
    4.3.1 For melody extraction 27
    4.3.2 For voice segregation 28
  4.4 Experiments 29
    4.4.1 Implementation 29
    4.4.2 Hyperparameter optimization over N 29
    4.4.3 Hyperparameter optimization over M 31
    4.4.4 Hyperparameter optimization over T 33
    4.4.5 Models trained with different datasets 38
    4.4.6 Baseline comparison 41
5 Conclusion 44
Reference 46
A Figures of training and validation curves 51

List of Figures
1.1 Visual representations for different types of musical data. 2
2.1 Explanations for melody extraction and voice segregation in sheet music. 6
2.2 An example of describing sheet music with note events of MIDI. 8
3.1 Different patterns of streams. 13
3.2 An example of data representation for M = 2. 14
3.3 The diagram of our 1-D CNN model. 18
3.4 Plot of N′(vi) over i. 23
4.1 Loss curves and accuracy curves of 3 models with M = 80 and N = 20. 30
4.2 The result of melody extraction and voice segregation for the validation set over different Ms and Ns (with threshold T = 0.5). 32
4.3 The result of melody extraction and voice segregation for the validation set over different Ms (with N = 20, T = 0.5). 33
4.4 Different melody extraction results for the same piece by applying pitch proximity matching and voting ensemble. 35
4.5 The result of melody extraction and voice segregation for the validation set over M = [20, 30, 40]. 37
4.6 The result of melody extraction and voice segregation for different validation sets with models trained with only MPS (with N = 20, T = 0.5). 39
4.7 The result of melody extraction and voice segregation for different validation sets with models trained with only AF (with N = 20, T = 0.5). 40
4.8 The result of melody extraction and voice segregation for different validation sets with models trained with only BCD (with N = 20, T = 0.5). 40
A.1 Loss curves of 36 models during training in Section 4.4.2. 52
A.2 Accuracy curves of 36 models during training in Section 4.4.2. 53
A.3 Loss curves of 63 models during training in Section 4.4.3. 54
A.4 Accuracy curves of 63 models during training in Section 4.4.3. 55
A.5 Loss curves of 63 models trained with MPS during training in Section 4.4.5. 56
A.6 Loss curves of 63 models trained with AF during training in Section 4.4.5. 57
A.7 Loss curves of 63 models trained with BCD during training in Section 4.4.5. 58
A.8 Accuracy curves of 63 models trained with MPS during training in Section 4.4.5. 59
A.9 Accuracy curves of 63 models trained with AF during training in Section 4.4.5. 60
A.10 Accuracy curves of 63 models trained with BCD during training in Section 4.4.5. 61

List of Tables
4.1 The brief summary of three datasets. Mel. and Accomp. are the abbreviations of melody and accompaniment, respectively. 26
4.2 The validation accuracies (in %) of predicting ŵij at the final epoch over different Ms and Ns. Each value is the mean of the accuracies from 3 different validation folds. 31
4.3 The validation accuracies (in %) of predicting ŵij at the final epoch over different Ms. Each value is the mean of the accuracies from 9 different validation folds. 32
4.4 Performances of melody extraction (frame-level F1-score in %) and voice segregation (frame-level accuracy in %) under different merging algorithms (VOTE being the voting method and PITCH being the pitch proximity method). The hyperparameter setting was N = 20 and T = 0.5 for the voting method. Numbers in bold show that VOTE is superior to PITCH. 34
4.5 Performances of melody extraction (frame-level F1-score in %) and voice segregation (frame-level accuracy in %) under different combinations of training datasets. The hyperparameter setting was N = 20 and T = 0.5. Numbers in bold show that the performance on a validation set is better when the model was trained only with its own type of data. 39
4.6 Validation results (in %) of melody extraction for the MPS dataset. 42
4.7 Validation results (in %) of melody extraction for the AF dataset. 42
4.8 Validation results (in %) of voice segregation for the BCD dataset. 43
dc.language.iso: en
dc.subject: 音樂資料檢索 (music information retrieval) [zh_TW]
dc.subject: MIDI [zh_TW]
dc.subject: 深度學習 (deep learning) [zh_TW]
dc.subject: 旋律萃取 (melody extraction) [zh_TW]
dc.subject: Music information retrieval [en]
dc.subject: MIDI [en]
dc.subject: Deep learning [en]
dc.subject: Melody extraction [en]
dc.title: MIDI音樂之旋律萃取 (Melody Extraction for MIDI Music) [zh_TW]
dc.title: Melody Extraction for MIDI Data [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: 碩士 (Master's)
dc.contributor.coadvisor: 黃乾綱 (Chien-Kang Huang)
dc.contributor.oralexamcommittee: 張恆華 (Herng-Hua Chang), 蘇黎 (Li Su)
dc.subject.keyword: MIDI, 深度學習 (deep learning), 旋律萃取 (melody extraction), 音樂資料檢索 (music information retrieval) [zh_TW]
dc.subject.keyword: MIDI, Deep learning, Melody extraction, Music information retrieval [en]
dc.relation.page: 61
dc.identifier.doi: 10.6342/NTU202200851
dc.rights.note: 同意授權(全球公開) (authorized for worldwide public access)
dc.date.accepted: 2022-07-13
dc.contributor.author-college: 工學院 (College of Engineering) [zh_TW]
dc.contributor.author-dept: 工程科學及海洋工程學研究所 (Graduate Institute of Engineering Science and Ocean Engineering) [zh_TW]
dc.date.embargo-lift: 2022-07-19
Appears in Collections: 工程科學及海洋工程學系 (Department of Engineering Science and Ocean Engineering)

Files in This Item:
File | Size | Format
U0001-0106202214394500.pdf | 2.91 MB | Adobe PDF

