Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92773
Title: | MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation |
Authors: | 藍雲瀚 Yun-Han Lan |
Advisor: | 楊奕軒 Yi-Hsuan Yang |
Co-Advisor: | 鄭皓中 Hao-Chung Cheng |
Keyword: | Music, LLM, Generative model, Control |
Publication Year: | 2024 |
Degree: | Master's |
Abstract: | Existing text-to-music models can produce high-quality audio with great diversity. However, textual prompts alone cannot precisely control temporal musical features of the generated music, such as chords and rhythm. To address this challenge, we introduce MusiConGen, a temporally conditioned Transformer-based text-to-music model that builds upon the pretrained MusicGen framework. Our contribution is an efficient finetuning mechanism, tailored for consumer-grade GPUs, that integrates automatically extracted chord and rhythm features as the control signal. During inference, the control can be either musical features extracted from a reference audio signal or user-defined inputs: a symbolic chord sequence, BPM, and textual prompts. Our performance evaluation on two datasets, one derived from extracted features and the other from user-created inputs, demonstrates that MusiConGen can generate realistic music that aligns well with the specified temporal control. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92773 |
DOI: | 10.6342/NTU202400889 |
Fulltext Rights: | Authorization granted (access limited to on-campus users) |
Appears in Collections: | Data Science Degree Program |
Files in This Item:
File | Size | Format |
---|---|---|
ntu-112-2.pdf (Restricted Access) | 7.88 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.