NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97520

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | 楊奕軒 | zh_TW
dc.contributor.advisor | Yi-Hsuan Yang | en
dc.contributor.author | 艾芯 | zh_TW
dc.contributor.author | Hsin Ai | en
dc.date.accessioned | 2025-07-02T16:16:23Z | -
dc.date.available | 2025-07-03 | -
dc.date.copyright | 2025-07-02 | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-06-23 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97520 | -
dc.description.abstract | 針對特定編曲家風格的流行鋼琴演奏版(piano cover)進行風格轉換,是符號化音樂生成領域中的一項獨特挑戰,其核心在於實現穩健的內容與風格解耦。本研究中,我們將「風格」定義為特定編曲家的伴奏模式——例如其特有的節奏密度 (rhythmic intensity)、複音織度 (polyphony)、音域 (pitch range) 等伴奏型態;而將「內容」定義為核心的旋律及和聲。此任務的一項關鍵困難在於,即使是旋律本身也可能包含了編曲家的風格變化。本論文旨在解決此問題,我們確立了以導引譜 (lead sheet)——一種包含旋律與和弦進行的樂譜——作為「內容」的穩固基礎。透過提供一個明確的核心音樂結構,譜面得以有效去除鋼琴演奏中所附加的風格變化,為風格轉換提供了更清晰的分離基礎。在此之上,本研究系統性地比較了數種基於 Transformer 的架構,以探究直接基於 token (token-based) 的控制方法與更複雜的基於嵌入 (embedding-based) 策略的成效。值得注意的是,本研究框架的運作無需成對資料。我們的綜合評估顯示,儘管所有實現的方法都能有效捕捉目標編曲家的特徵,基於 token 的模型卻是一個更簡潔且有效的解決方案。它在風格轉換任務的兩大核心層面——內容保留與風格匹配——的客觀與主觀評估中,均取得了更優越的表現。這個關鍵發現提供了有力的實證證據:對於此類任務,利用導引譜來清晰地表示內容,能讓一個簡單的、基於 token 的模型實現風格轉換,為未來的研究提供了一個實際且有效的基準。 | zh_TW
dc.description.abstract | Arranger-specific style transfer for pop piano covers presents a unique challenge in achieving robust content-style disentanglement. In this work, we define arranger-specific style by its distinctive accompaniment patterns, such as characteristic rhythmic intensity, polyphony, and pitch range; content, conversely, is the core melody and harmony. A key difficulty is that even performed melodies can contain an arranger's stylistic variations. This research addresses the problem by establishing the lead sheet as a robust anchor that decouples the musical content from stylistic variations, enabling a cleaner separation of style. Building on this foundation, we propose a Transformer-based framework to systematically compare the efficacy of a direct token-based conditioning approach against more complex embedding-based strategies. Notably, the framework operates without requiring paired data. Our comprehensive evaluations demonstrate that while all implemented approaches successfully transfer the target arranger's characteristics, the simpler token-based model consistently proves more effective and efficient: it achieves superior performance in both objective and subjective evaluations across the two core dimensions of the task, content preservation and style matching. This finding provides a crucial insight: representing content explicitly with a lead sheet allows a simple token-based model to achieve highly effective style transfer, offering a practical and efficient benchmark for future work. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-07-02T16:16:23Z; No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-07-02T16:16:23Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i
Acknowledgements iii
摘要 v
Abstract vii
Contents ix
List of Figures xiii
List of Tables xv
Chapter 1 Introduction 1
Chapter 2 Related Work 5
2.1 Music Generation 5
2.2 Music Style Transfer 7
2.3 Content and Style Representations 8
2.4 Research Gaps 9
Chapter 3 Methodology 11
3.1 Content and Style Disentanglement 11
3.2 Model Architecture 12
3.2.1 Model 1: Decoder-Only with Token-Based Content and Style 12
3.2.2 Model 2: Encoder-Decoder with Embedding-Based Content and Token-Based Style 13
3.2.3 Model 3: Encoder-Decoder with Token-Based Content and Embedding-Based Style 14
3.3 Data Representation and Tokenization 15
3.3.1 Data Source and Conversion 15
3.3.2 Lead Sheet Extraction 15
3.3.3 Symbolic Representation and Tokenization 16
3.3.4 Style Representation 17
3.3.5 Sequence Segmentation 17
3.4 Training Objectives 17
3.5 Implementation Details 18
3.5.1 Model Configurations and Hyper-parameters 18
3.5.2 Training Procedure 19
3.5.3 VAE-Specific Training Details (Model 3) 19
3.5.4 Software and Hardware 20
Chapter 4 Experiment 21
4.1 Dataset 21
4.1.1 Dataset Preparation 21
4.1.2 Dataset Composition and Splitting 22
4.1.3 Dataset Statistics 23
4.2 Evaluation Metrics 23
4.2.1 Objective Metrics 24
4.2.1.1 Style Matching 24
4.2.1.2 Melodic Fidelity 25
4.2.2 Subjective Evaluation 26
4.3 Baseline Models 28
4.4 Experimental Setup 29
Chapter 5 Results and Discussion 31
5.1 Overview 31
5.2 Objective Results: Style Matching 31
5.3 Objective Results: Melodic Fidelity 34
5.4 Subjective Results 35
5.5 Discussion 36
Chapter 6 Conclusion 39
References 41
Appendix A — Experiment 47
A.1 REMI Vocabulary 47
A.2 Model Configurations 48 | -
dc.language.iso | en | -
dc.subject | Transformer | zh_TW
dc.subject | 鋼琴伴奏 | zh_TW
dc.subject | 內容-風格解耦 | zh_TW
dc.subject | 導引譜 | zh_TW
dc.subject | 音樂風格轉換 | zh_TW
dc.subject | Transformer | en
dc.subject | Music Style Transfer | en
dc.subject | Piano Accompaniment | en
dc.subject | Content-Style Disentanglement | en
dc.subject | Lead Sheet | en
dc.title | 基於 Transformer 模型鋼琴伴奏風格轉換 | zh_TW
dc.title | Transformer-Based Piano Accompaniment Style Transfer | en
dc.type | Thesis | -
dc.date.schoolyear | 113-2 | -
dc.description.degree | 碩士 (Master's) | -
dc.contributor.oralexamcommittee | 鄭皓中;蘇黎 | zh_TW
dc.contributor.oralexamcommittee | Hao-Chung Cheng;Li Su | en
dc.subject.keyword | 音樂風格轉換,鋼琴伴奏,內容-風格解耦,導引譜,Transformer | zh_TW
dc.subject.keyword | Music Style Transfer,Piano Accompaniment,Content-Style Disentanglement,Lead Sheet,Transformer | en
dc.relation.page | 48 | -
dc.identifier.doi | 10.6342/NTU202501267 | -
dc.rights.note | 同意授權(全球公開) (authorized, worldwide open access) | -
dc.date.accepted | 2025-06-24 | -
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | -
dc.contributor.author-dept | 電信工程學研究所 (Graduate Institute of Communication Engineering) | -
dc.date.embargo-lift | 2025-07-03 | -
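The English abstract above describes the thesis's preferred approach: conditioning a decoder-only Transformer on the target arranger directly at the token level, with a REMI-style lead-sheet token sequence (see Appendix A.1 in the table of contents) carrying the content. As a minimal sketch of that idea, assuming illustrative token names, arranger IDs, and model sizes that are not from the thesis, the following PyTorch fragment prepends a style token to a REMI-like content sequence and feeds it to a small causal language model:

import torch
import torch.nn as nn

# Hypothetical vocabulary: REMI-like content tokens plus one style token per arranger.
VOCAB = ["<pad>", "<bos>", "<eos>",
         "Style_ArrangerA", "Style_ArrangerB",      # assumed conditioning tokens
         "Bar", "Position_1/16", "Chord_C:maj",
         "Pitch_60", "Duration_4", "Velocity_64"]
TOK = {t: i for i, t in enumerate(VOCAB)}

def build_input(style, content):
    # Token-based conditioning: simply prepend the style token to the content sequence.
    ids = [TOK["<bos>"], TOK[style]] + [TOK[t] for t in content]
    return torch.tensor(ids).unsqueeze(0)          # shape (1, seq_len)

class TinyCoverLM(nn.Module):
    # Decoder-only Transformer LM over the shared content + style vocabulary.
    def __init__(self, vocab_size, d_model=64, n_layers=2, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        pos = torch.arange(ids.size(1), device=ids.device)
        x = self.tok_emb(ids) + self.pos_emb(pos)
        # Causal mask makes the encoder stack behave as a decoder-only LM.
        causal = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        return self.head(self.blocks(x, mask=causal))  # next-token logits

model = TinyCoverLM(len(VOCAB))
inp = build_input("Style_ArrangerA",
                  ["Bar", "Chord_C:maj", "Position_1/16", "Pitch_60", "Duration_4"])
print(model(inp).shape)   # torch.Size([1, 7, 11]); sample cover tokens from these logits

The design point this illustrates matches the abstract's central claim: once the lead sheet pins down melody and harmony, style conditioning reduces to one extra vocabulary token, with no separate style encoder required.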
Appears in Collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in This Item:
File | Size | Format
ntu-113-2.pdf | 1.16 MB | Adobe PDF


Unless their copyright terms are otherwise indicated, all items in this repository are protected by copyright, with all rights reserved.
