NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92773
Title: MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation
Authors: Yun-Han Lan (藍雲瀚)
Advisor: Yi-Hsuan Yang (楊奕軒)
Co-Advisor: Hao-Chung Cheng (鄭皓中)
Keywords: Music, LLM, Generative model, Control
Publication Year: 2024
Degree: Master's
Abstract: Existing text-to-music models can produce high-quality audio with great diversity. However, textual prompts alone cannot precisely control temporal musical features, such as chords and rhythm, of the generated music. To address this challenge, we introduce MusiConGen, a temporally conditioned Transformer-based text-to-music model built upon the pretrained MusicGen framework. Our contribution is an efficient finetuning mechanism, tailored for consumer-grade GPUs, that integrates automatically extracted chord and rhythm features as the control signal. During inference, the control can be either musical features extracted from a reference audio signal or a user-defined symbolic chord sequence, BPM, and textual prompt. Our performance evaluation on two datasets, one derived from extracted features and the other from user-created inputs, demonstrates that MusiConGen can generate realistic music that aligns well with the specified temporal control.
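
To make the inference-time controls concrete, the sketch below shows how a user-defined symbolic condition (chord sequence, BPM, and text prompt) might be assembled in Python. The TimedChord structure, the controls dictionary, and the commented-out model.generate call are all illustrative assumptions; this record does not document MusiConGen's actual interface.

# Illustrative sketch (not the actual MusiConGen API): assembling the three
# user-defined inference controls named in the abstract: a symbolic chord
# sequence, a BPM value, and a textual prompt.

from dataclasses import dataclass

@dataclass
class TimedChord:
    symbol: str            # chord label, e.g. "Am7"
    start_beat: float      # onset position, in beats
    duration_beats: float  # how long the chord is held

# A four-bar I-V-vi-IV progression in 4/4 at 120 BPM.
chords = [
    TimedChord("C",  0.0, 4.0),
    TimedChord("G",  4.0, 4.0),
    TimedChord("Am", 8.0, 4.0),
    TimedChord("F", 12.0, 4.0),
]

controls = {
    "text": "upbeat acoustic pop with light percussion",
    "bpm": 120,
    "chords": [(c.symbol, c.start_beat, c.duration_beats) for c in chords],
}

# audio = model.generate(**controls)  # hypothetical call; the real interface
#                                     # is not given in this record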
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92773
DOI: 10.6342/NTU202400889
Fulltext Rights: Authorized (campus access only)
Embargo Lift Date: 2029-06-12
Appears in Collections: Data Science Degree Program

Files in This Item:
File: ntu-112-2.pdf (Restricted Access)
Size: 7.88 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
