Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98711

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 莊裕澤 | zh_TW |
| dc.contributor.advisor | Yuh-Jzer Joung | en |
| dc.contributor.author | 陳澤暘 | zh_TW |
| dc.contributor.author | Tse-Yang Chen | en |
| dc.date.accessioned | 2025-08-18T16:11:44Z | - |
| dc.date.available | 2025-08-19 | - |
| dc.date.copyright | 2025-08-18 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-08-08 | - |
| dc.identifier.citation | [1] S. Black, S. Biderman, E. Hallahan, Q. Anthony, L. Gao, L. Golding, H. He, C. Leahy, K. McDonell, J. Phang, et al. GPT-NeoX-20B: An open-source autoregressive language model. arXiv preprint arXiv:2204.06745, 2022.
[2] J.-P. Briot. From artificial neural networks to deep learning for music generation: history, concepts and trends. Neural Computing and Applications, 33(1):39–65, 2021.
[3] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
[4] J. Choi and K. Lee. Pop2Piano: Pop audio-based piano cover generation. In ICASSP 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023.
[5] C. Donahue, J. Thickstun, and P. Liang. Melody transcription via generative pre-training. In ISMIR, 2022.
[6] J. Gardner, I. Simon, E. Manilow, C. Hawthorne, and J. Engel. MT3: Multi-task multitrack music transcription. arXiv preprint arXiv:2111.03017, 2021.
[7] G. Hadjeres, F. Pachet, and F. Nielsen. DeepBach: A steerable model for Bach chorales generation. In International Conference on Machine Learning, pages 1362–1371. PMLR, 2017.
[8] W.-Y. Hsiao, J.-Y. Liu, Y.-C. Yeh, and Y.-H. Yang. Compound Word Transformer: Learning to compose full-song music over dynamic directed hypergraphs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 178–186, 2021.
[9] Y.-S. Huang and Y.-H. Yang. Pop Music Transformer: Beat-based modeling and generation of expressive pop piano compositions. In Proceedings of the 28th ACM International Conference on Multimedia, pages 1180–1188, 2020.
[10] S. Ji, X. Yang, and J. Luo. A survey on deep learning for symbolic music generation: Representations, algorithms, evaluations, and challenges. ACM Computing Surveys, 56(1):1–39, 2023.
[11] D. P. Kingma. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[12] K. Komiya and Y. Fukuhara. AMT-APC: Automatic piano cover by fine-tuning an automatic music transcription model. arXiv preprint arXiv:2409.14086, 2024.
[13] I. Loshchilov and F. Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
[14] H. H. Mao, T. Shin, and G. Cottrell. DeepJ: Style-specific music generation. In 2018 IEEE 12th International Conference on Semantic Computing (ICSC), pages 377–382. IEEE, 2018.
[15] M. Müller, Y. Özer, M. Krause, T. Prätzlich, and J. Driedger. Sync Toolbox: A Python package for efficient, robust, and accurate music synchronization. Journal of Open Source Software, 6(64):3434, 2021.
[16] T. Prätzlich, J. Driedger, and M. Müller. Memory-restricted multiscale dynamic time warping. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 569–573. IEEE, 2016.
[17] I. Sutskever. Sequence to sequence learning with neural networks. arXiv preprint arXiv:1409.3215, 2014.
[18] H. Takamori, T. Nakatsuka, S. Fukayama, M. Goto, and S. Morishima. Audio-based automatic generation of a piano reduction score by considering the musical structure. In MultiMedia Modeling: 25th International Conference, MMM 2019, Thessaloniki, Greece, January 8–11, 2019, Proceedings, Part II 25, pages 169–181. Springer, 2019.
[19] C.-P. Tan, H. Ai, Y.-H. Chang, S.-H. Guan, and Y.-H. Yang. PiCoGen2: Piano cover generation with transfer learning approach and weakly aligned data. In Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR), San Francisco, CA, United States, Nov. 2024.
[20] C.-P. Tan, S.-H. Guan, and Y.-H. Yang. PiCoGen: Generate piano covers with a two-stage approach. In Proceedings of the 2024 International Conference on Multimedia Retrieval, pages 1180–1184, 2024.
[21] H. H. Tan and D. Herremans. Music FaderNets: Controllable music generation based on high-level features via low-level feature modelling. arXiv preprint arXiv:2007.15474, 2020.
[22] K. Toyama, T. Akama, Y. Ikemiya, Y. Takida, W.-H. Liao, and Y. Mitsufuji. Automatic piano transcription with hierarchical frequency-time transformer. arXiv preprint arXiv:2307.04305, 2023.
[23] A. Vaswani. Attention is all you need. Advances in Neural Information Processing Systems, 2017.
[24] S.-L. Wu and Y.-H. Yang. Compose & Embellish: Well-structured piano performance generation via a two-stage approach. In ICASSP 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023.
[25] S.-L. Wu and Y.-H. Yang. MuseMorphose: Full-song and fine-grained piano music style transfer with one Transformer VAE. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31:1953–1967, 2023.
[26] T. Y. Yip and C.-j. Chau. Music2MIDI: Pop music to MIDI piano cover generation. In International Conference on Multimedia Modeling, pages 101–113, 2025.
[27] J. Zhao, G. Xia, and Y. Wang. Beat Transformer: Demixed beat and downbeat tracking with dilated self-attention. arXiv preprint arXiv:2209.07140, 2022. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98711 | - |
| dc.description.abstract | 鋼琴翻奏生成(Piano Cover Generation)旨在將一首流行歌曲自動轉換為鋼琴編曲。過去已有眾多深度學習研究探討此任務,其解決方案涵蓋了從模型架構的修改到資料預處理的優化等多個層面。然而,我們觀察到這些模型時常無法確保其輸出與原曲之間的結構一致性。我們推論,其原因在於模型的架構缺乏節拍感知的能力,或是模型無法正確學習複雜的節奏資訊。這些節奏資訊至關重要,因為它不僅主導了鋼琴翻奏與原曲在結構層面上的相似性(如速度、BPM),也直接影響了生成音樂的整體品質。
在本論文中,我們提出了一套名為 Etude 的三階段式架構,其名稱融合了其三大核心模組的英文縮寫:萃取(Extract)、結構化(strucTUralize)與解碼(DEcode)。透過預先提取節奏資訊,並採用一種新穎且高度簡化的、基於 REMI 的 token 表示法,我們的模型確保了生成的翻奏具備正確的歌曲結構,提升了音樂的流暢度與動態表現,並能透過注入指定風格來實現高度可控的生成。最終,在包含人類聽眾的主觀評測中,Etude 的表現大幅超越了所有過去的代表性模型,其生成品質更加接近人類作曲家的水平。 | zh_TW |
| dc.description.abstract | Piano cover generation aims to automatically convert a pop song into a piano arrangement. Numerous deep learning studies have previously addressed this task, with solutions ranging from architectural modifications to optimizations in data preprocessing. However, we observe that these models often fail to ensure structural consistency between their output and the original song. We hypothesize that this is due to a lack of beat-aware capabilities in their architectures, or to an inability of the models to correctly learn complex rhythmic information. This rhythmic information is critical, as it not only governs the structural similarity between the cover and the original song (e.g., tempo and BPM) but also directly impacts the overall quality of the generated piano music.
In this paper, we propose a three-stage architecture, Etude, composed of Extract, strucTUralize, and DEcode stages. By pre-extracting rhythmic information and utilizing a novel, highly simplified REMI-based tokenization, our model ensures the generated covers possess a proper song structure, improves fluency and musical dynamics, and enables highly controllable generation through the injection of specified styles. Finally, in subjective evaluations with human listeners, Etude substantially outperforms all previous models, achieving a quality closer to that of human composers. | en |
| dc.description.tableofcontents | Acknowledgements
Abstract (in Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Background and Motivation
1.2 Research Objectives
Chapter 2: Literature Review
2.1 Symbolic Music Generation
2.1.1 Note Event Sequences
2.1.2 Piano Roll
2.2 Music Style Transfer
2.3 Automatic Piano Cover Generation (APCG)
2.3.1 Pop2Piano
2.3.2 PiCoGen
2.3.3 PiCoGen2
2.3.4 AMT-APC
2.3.5 Music2MIDI
2.4 Summary
Chapter 3: Methodology
3.1 Research Architecture
3.2 Dataset Preprocessing
3.2.1 Beat Information Extraction
3.2.2 Transcription and Alignment
3.2.3 Quantization
3.2.4 Tokenization
3.3 Tiny-REMI Tokens
3.3.1 Token Structure
3.3.2 Encoding
3.3.3 Decoding
3.4 Models
3.4.1 Extractor Model
3.4.2 Decoder Model
3.4.2.1 Bar-wise Mix
3.4.2.2 Style Vector
Chapter 4: Experiments and Evaluation
4.1 Dataset
4.2 Training
4.2.1 Extractor
4.2.2 Decoder
4.3 Inference
4.4 Objective Evaluation
4.4.1 Warp Path Deviation (WPD)
4.4.2 Rhythmic Grid Coherence (RGC)
4.4.3 IOI Pattern Entropy (IPE)
4.5 Subjective Evaluation
4.6 Evaluation Results
4.7 Effect of the Style Vector on Generation
Chapter 5: Conclusion
5.1 Summary
5.2 Contributions
5.3 Limitations
5.4 Future Work
References | - |
| dc.language.iso | zh_TW | - |
| dc.subject | 音樂生成 | zh_TW |
| dc.subject | 自動鋼琴翻奏生成 | zh_TW |
| dc.subject | 可控生成 | zh_TW |
| dc.subject | 自動音樂轉錄 | zh_TW |
| dc.subject | 音樂資訊檢索 | zh_TW |
| dc.subject | Music Information Retrieval (MIR) | en |
| dc.subject | Automatic Music Transcription | en |
| dc.subject | Controllable Generation | en |
| dc.subject | Music Generation | en |
| dc.subject | Automatic Piano Cover Generation | en |
| dc.title | Etude:基於萃取、結構化與解碼的自動鋼琴翻奏生成模型架構 | zh_TW |
| dc.title | Etude: Automatic Piano Cover Generation with a Three-Stage Approach — Extract, strucTUralize, and DEcode | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | Master | - |
| dc.contributor.oralexamcommittee | 陳建錦;魏志平;楊奕軒;林俊叡 | zh_TW |
| dc.contributor.oralexamcommittee | Chien-Chin Chen;Chih-Ping Wei;Yi-Hsuan Yang;June-Ray Lin | en |
| dc.subject.keyword | 自動鋼琴翻奏生成, 音樂生成, 音樂資訊檢索, 自動音樂轉錄, 可控生成 | zh_TW |
| dc.subject.keyword | Automatic Piano Cover Generation, Music Generation, Music Information Retrieval (MIR), Automatic Music Transcription, Controllable Generation | en |
| dc.relation.page | 56 | - |
| dc.identifier.doi | 10.6342/NTU202503741 | - |
| dc.rights.note | Authorization granted (access restricted to campus) | - |
| dc.date.accepted | 2025-08-12 | - |
| dc.contributor.author-college | College of Management | - |
| dc.contributor.author-dept | Department of Information Management | - |
| dc.date.embargo-lift | 2025-08-19 | - |
| Appears in Collections: | Department of Information Management | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf (access limited to the NTU IP range) | 1.56 MB | Adobe PDF | |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
