Enhancing Multimodal Graph Neural Networks with Music Understanding Models for Music Recommendation

管晟宇; Cheng-Yu Kuan

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101815

Title:	Enhancing Multimodal Graph Neural Networks with Music Understanding Models for Music Recommendation (original title: 利用音樂理解模型增強多模態圖神經網絡的音樂推薦)
Author:	管晟宇 (Cheng-Yu Kuan)
Advisor:	曹承礎 (Seng-Cho Chou)
Keywords:	Multimodal recommendation, Content-based recommender systems, Graph neural networks, Cold-start item recommendation, Recommender systems, Music representation learning
Year of publication:	2026
Degree:	Master's
Abstract:	Effectively modeling the complex characteristics of audio content is fundamental to advancing personalized music recommendation systems. This thesis proposes an approach that enhances multimodal graph neural networks (GNNs) by integrating music understanding models, specifically MERT and Music2Vec. By extracting deep semantic features directly from raw audio waveforms and embedding these representations into the graph structure, the model propagates content-based signals across the user-item bipartite graph. The approach also offers a solution to the cold-start item recommendation problem, a persistent challenge caused by the high volume of daily releases, which leads to severe interaction scarcity and entrenched popularity bias. By incorporating deep music embeddings, the system can recommend newly released items based on their acoustic properties, reducing its dependence on historical interaction data. Experimental results on the Music4All dataset show how the relationship between content features and recommendation performance varies across scenarios. In warm-start scenarios, where interaction data are abundant, the purely collaborative filtering model LightGCN outperforms all evaluated multimodal GNNs, suggesting that auxiliary modalities offer limited additional benefit when dense interaction histories already provide strong signals. Conversely, music representations become critical in cold-start item recommendation. On the low-count split of the dataset, the DRAGON architecture achieves a 30.80% improvement in Recall@20 when audio features are added, moving from a single modality to two. These findings highlight that acoustic information is an essential complementary signal which, when integrated through advanced GNN architectures, substantially improves the discovery of long-tail and niche music items and supports the construction of more complete multimodal recommender systems.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101815
DOI:	10.6342/NTU202600712
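Full-text authorization:	Authorized (on-campus access only)
Electronic full-text release date:	2026-03-05
Appears in department:	Department of Information Management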
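
As a rough illustration of the Recall@20 metric cited in the abstract, the following Python sketch computes Recall@K per user and the relative gain between two runs. It is not taken from the thesis: the function names, the toy data, and the choice to divide by the number of held-out relevant items are all assumptions. The abstract also does not state whether the 30.80% figure is a relative or an absolute change; the sketch assumes relative.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int = 20) -> float:
    """Share of a user's held-out relevant items found in the top-k ranking.

    Assumption: the denominator is the number of relevant items; some
    evaluation setups use min(k, number of relevant items) instead.
    """
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    rankings: Dict[str, List[str]],
    ground_truth: Dict[str, Set[str]],
    k: int = 20,
) -> float:
    """Average Recall@k over users that have at least one held-out item."""
    users = [user for user, items in ground_truth.items() if items]
    if not users:
        return 0.0
    total = sum(
        recall_at_k(rankings.get(user, []), ground_truth[user], k)
        for user in users
    )
    return total / len(users)


def relative_improvement(baseline: float, enhanced: float) -> float:
    """Relative change from baseline to enhanced, in percent."""
    return (enhanced - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical toy data, not results from the thesis.
    toy_rankings = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
    toy_truth = {"u1": {"a", "z"}, "u2": {"e"}}
    print(mean_recall_at_k(toy_rankings, toy_truth, k=2))
```

In a cold-start evaluation, the held-out items would be tracks with little or no interaction history, so any hits in the top K have to come from content signals such as audio embeddings rather than from collaborative patterns.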

Files in this item:

File	Size	Format
ntu-114-1.pdf (access restricted to NTU campus IP addresses; from off campus, use the VPN remote-access service)	5.56 MB	Adobe PDF

