NTU Theses and Dissertations Repository (DSpace)
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79798
Full metadata record (DC field: value [language])
dc.contributor.advisor: 張智星 (Jyh-Shing Jang)
dc.contributor.author: Meng-Hsuan Wu [en]
dc.contributor.author: 吳孟軒 [zh_TW]
dc.date.accessioned: 2022-11-23T09:11:45Z
dc.date.available: 2021-08-20
dc.date.available: 2022-11-23T09:11:45Z
dc.date.copyright: 2021-08-20
dc.date.issued: 2021
dc.date.submitted: 2021-08-16
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79798
dc.description.abstract: Although music research in machine learning resembles natural language processing, musical properties such as chords and tonality give music processing additional dimensions to consider. Music generation, one task within music processing, aims to let a model automatically create entirely new music in a short time. In recent years, as machine-learning models have advanced rapidly, music generation has flourished alongside them: generated results are not only more fluent but also more complex and better structured. This thesis treats music as a language. Musical elements such as pitch and duration are defined as "music events," a piece is represented as a sequence of such events, an autoregressive model is trained on these sequences as a language model, and music is generated by sampling from the model's predicted probabilities. We use the functional-harmony dataset of Beethoven's piano sonatas as training data, and we design additional music events from the chord, tonality, and phrase labels annotated by professional musicians in the dataset, so the model generates music carrying rich musical information. We further design a loss function based on the chord and tonality labels to make the generated music more musical. Finally, we conduct a subjective listening questionnaire to compare models with and without this musical information, both perceptually and in terms of music theory. [zh_TW] (An illustrative code sketch of this pipeline follows the metadata record below.)
dc.description.provenance: Made available in DSpace on 2022-11-23T09:11:45Z (GMT). No. of bitstreams: 1; U0001-0908202117594400.pdf: 2867496 bytes, checksum: 6aaa071fd35cbce35473f10287821074 (MD5); Previous issue date: 2021 [en]
dc.description.tableofcontents:
  Acknowledgements / Chinese Abstract / Abstract / Table of Contents / List of Figures / List of Tables
  Chapter 1  Introduction
    1.1 Research Overview and Motivation
    1.2 Contributions
    1.3 Chapter Overview
  Chapter 2  Literature Review
    2.1 Symbolic Music Generation
      2.1.1 Music from the Piano-Roll Perspective
      2.1.2 Music from the Language Perspective
    2.2 Audio Music Generation
  Chapter 3  Dataset Overview
    3.1 Events
    3.2 Beats and Downbeats
    3.3 Chords
    3.4 Phrases
  Chapter 4  Methodology
    4.1 Data Preprocessing
      4.1.1 Events
        4.1.1.1 Note Information
        4.1.1.2 Tonality and Chord Information
        4.1.1.3 Phrase Information
      4.1.2 Merging Music Events
        4.1.2.1 Ignoring
        4.1.2.2 Class Events
      4.1.3 Data Augmentation and Sequence Grouping
      4.1.4 Event Summary
    4.2 Model and Architecture
      4.2.1 Transformer
        4.2.1.1 Positional Encoding
        4.2.1.2 Multi-Head Attention
        4.2.1.3 Feed-Forward Network
      4.2.2 Overall Model Architecture
    4.3 Loss Functions
      4.3.1 Tonality and Chord Loss Functions
    4.4 Training Parameters
    4.5 Sampling and Post-Processing
      4.5.1 Priming
      4.5.2 Sampling Methods
      4.5.3 Data Post-Processing
    4.6 Implementation Tools
  Chapter 5  Experimental Results
    5.1 Experiment 1: Tonality and Chord Loss Functions
    5.2 Experiment 2: Degree of Copying from the Original Pieces
    5.3 Experiment 3: Subjective Listening Questionnaire
      5.3.1 Overall Responses
      5.3.2 Ratings vs. Years of Musical Training
      5.3.3 Ratings vs. Composition Experience
      5.3.4 Ratings vs. Performance Experience
      5.3.5 Ratings vs. Familiarity with Beethoven
    5.4 Generated Music
      5.4.1 Model with Note Events Only
      5.4.2 Models with Tonality or Chord Events
      5.4.3 Model with Tonality and Chord Events
      5.4.4 Model with Tonality, Chord, and Phrase Events
  Chapter 6  Conclusion and Future Work
    6.1 Conclusion
    6.2 Future Work
  References
dc.language.iso: zh-TW
dc.subject: Transformer [zh_TW]
dc.subject: Symbolic music generation [zh_TW]
dc.subject: Music events [zh_TW]
dc.subject: Beethoven piano sonatas [zh_TW]
dc.subject: Tonality [zh_TW]
dc.subject: Chord [zh_TW]
dc.subject: Phrase [zh_TW]
dc.subject: Symbolic music generation [en]
dc.subject: Transformer [en]
dc.subject: Phrase [en]
dc.subject: Chord [en]
dc.subject: Tonality [en]
dc.subject: Beethoven Piano Sonata with Function Harmony [en]
dc.subject: Event-based representation [en]
dc.title: 考慮調性、和聲與樂句的音樂自動生成 (Automatic Music Generation Considering Tonality, Harmony, and Phrase) [zh_TW]
dc.title: Classical Music Transformer: An Investigation on Tonality, Harmony and Phrase [en]
dc.date.schoolyear: 109-2
dc.description.degree: Master
dc.contributor.coadvisor: 蘇黎 (Li Su)
dc.contributor.oralexamcommittee: 楊奕軒 (Hsin-Tsai Liu), (Chih-Yang Tseng)
dc.subject.keyword: Symbolic music generation, Music events, Beethoven piano sonatas, Tonality, Chord, Phrase, Transformer [zh_TW]
dc.subject.keyword: Symbolic music generation, Event-based representation, Beethoven Piano Sonata with Function Harmony, Tonality, Chord, Phrase, Transformer [en]
dc.relation.page: 56
dc.identifier.doi: 10.6342/NTU202102219
dc.rights.note: Authorized for release (open access worldwide)
dc.date.accepted: 2021-08-16
dc.contributor.author-college: College of Electrical Engineering and Computer Science [zh_TW]
dc.contributor.author-dept: Graduate Institute of Computer Science and Information Engineering [zh_TW]
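
The pipeline summarized in the abstract (serialize notes and harmony annotations into "music events," train an autoregressive language model over the event sequences, then sample the next event from the predicted distribution) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the event vocabulary, the tonality/chord label sets, and the model interface are hypothetical stand-ins, not the thesis's actual implementation.

    # Hypothetical sketch of an event-based music representation and an
    # autoregressive sampling loop. All names here are illustrative, not
    # the thesis's actual code.
    import torch

    # Hypothetical event vocabulary: note events plus tonality, chord,
    # and phrase events of the kind the abstract describes.
    EVENTS = (
        [f"NOTE_ON_{p}" for p in range(21, 109)]    # 88 piano pitches
        + [f"DURATION_{d}" for d in range(1, 33)]   # note length in time steps
        + ["BAR", "BEAT", "DOWNBEAT"]               # metrical markers
        + [f"KEY_{k}" for k in ("C", "G", "D", "a", "e")]             # tonality labels
        + [f"CHORD_{c}" for c in ("I", "ii", "IV", "V", "V7", "vi")]  # functional chords
        + ["PHRASE_START", "PHRASE_END"]
    )
    TOKEN_TO_ID = {event: i for i, event in enumerate(EVENTS)}

    def generate(model, prime_ids, max_len=512, temperature=1.0, top_p=0.9):
        """Autoregressively extend a primed sequence with nucleus (top-p) sampling."""
        ids = list(prime_ids)
        for _ in range(max_len):
            x = torch.tensor(ids).unsqueeze(0)          # shape (1, seq_len)
            with torch.no_grad():
                # Assumes the model returns (batch, seq, vocab) logits.
                logits = model(x)[0, -1] / temperature
            probs = torch.softmax(logits, dim=-1)
            # Keep only the smallest set of events whose mass reaches top_p.
            sorted_p, sorted_ix = torch.sort(probs, descending=True)
            keep = torch.cumsum(sorted_p, dim=-1) <= top_p
            keep[0] = True                              # never drop the top event
            sorted_p[~keep] = 0.0
            next_id = sorted_ix[torch.multinomial(sorted_p / sorted_p.sum(), 1)]
            ids.append(int(next_id))
        return ids

    # Example priming: begin in C major with a bar line and middle C.
    # prime = [TOKEN_TO_ID["KEY_C"], TOKEN_TO_ID["BAR"],
    #          TOKEN_TO_ID["NOTE_ON_60"], TOKEN_TO_ID["DURATION_8"]]
    # ids = generate(trained_model, prime)

Nucleus (top-p) sampling is only one plausible reading of "sampling from the predicted probabilities"; the thesis's own priming and sampling choices are covered in Sections 4.5.1 and 4.5.2 of the table of contents above.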
Appears in Collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in this item:
File | Size | Format
U0001-0908202117594400.pdf | 2.8 MB | Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
