MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation

藍雲瀚; Yun-Han Lan

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92773

Title:	MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation
Author:	藍雲瀚 Yun-Han Lan
Advisor:	楊奕軒 Yi-Hsuan Yang
Co-advisor:	鄭皓中 Hao-Chung Cheng
Keywords:	Music, LLM, Generative model, Control
Year of publication:	2024
Degree:	Master's
Abstract:	Existing text-to-music models can produce high-quality audio with great diversity. However, textual prompts alone cannot precisely control temporal musical features of the generated music, such as chords and rhythm. To address this challenge, we introduce MusiConGen, a temporally conditioned Transformer-based text-to-music model that builds upon the pretrained MusicGen framework. Our contribution is an efficient finetuning mechanism, tailored for consumer-grade GPUs, that integrates automatically extracted chord and rhythm features as the control signal. During inference, the control signal can be either musical features extracted from a reference audio signal or user-defined symbolic chord sequences, BPM, and textual prompts. Our performance evaluation on two datasets (one derived from extracted features, the other from user-created inputs) demonstrates that MusiConGen can generate realistic music that aligns well with the specified temporal control.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92773
DOI:	10.6342/NTU202400889
Full-text authorization:	Authorized (on-campus access only)
Appears in department:	Data Science Degree Program
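
The abstract's user-defined control mode (a symbolic chord sequence plus a BPM value) can be pictured as a frame-level timeline. The following is a minimal sketch of that idea only, not the thesis's implementation: the `ChordSpan` structure, the `build_control_timeline` function, and the frame rate of 50 frames per second are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChordSpan:
    """One entry of a user-defined chord sequence (hypothetical structure)."""
    symbol: str  # chord label, e.g. "C", "Am", "G7"
    beats: int   # number of beats the chord lasts


def build_control_timeline(chords, bpm, frame_rate=50):
    """Expand a chord sequence and a BPM value into per-frame controls.

    Returns (chord_per_frame, beat_per_frame):
      chord_per_frame[i] is the chord label active at frame i;
      beat_per_frame[i] is 1 if a beat onset falls in frame i, else 0.
    frame_rate (frames per second) is an assumed value, not taken from the thesis.
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    seconds_per_beat = 60.0 / bpm
    total_beats = sum(span.beats for span in chords)
    n_frames = round(total_beats * seconds_per_beat * frame_rate)

    chord_per_frame = [""] * n_frames
    beat_per_frame = [0] * n_frames

    # Fill each chord's label over the frames its beats cover.
    beat_index = 0
    for span in chords:
        start = round(beat_index * seconds_per_beat * frame_rate)
        end = round((beat_index + span.beats) * seconds_per_beat * frame_rate)
        for i in range(start, min(end, n_frames)):
            chord_per_frame[i] = span.symbol
        beat_index += span.beats

    # Mark the frame in which each beat onset falls.
    for b in range(total_beats):
        frame = round(b * seconds_per_beat * frame_rate)
        if frame < n_frames:
            beat_per_frame[frame] = 1

    return chord_per_frame, beat_per_frame


if __name__ == "__main__":
    progression = [ChordSpan("C", 4), ChordSpan("G", 4)]
    chord_frames, beat_frames = build_control_timeline(progression, bpm=120)
    print(len(chord_frames), chord_frames[0], chord_frames[-1], sum(beat_frames))
```

Under these assumptions, at 120 BPM each beat spans 25 frames, so two four-beat chords occupy 200 frames. How the actual model encodes and consumes such signals is described in the thesis itself, not in this record.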

Files in this item:

File	Size	Format
ntu-112-2.pdf (currently not authorized for public access)	7.88 MB	Adobe PDF
