Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96349

| Title: | MTSTRec: Multimodal Time-Aligned Shared Token Recommender |
| Author: | Yen-Jung Hsu (徐嬿鎔) |
| Advisor: | Che Lin (林澤) |
| Keywords: | multimodal sequential recommendation, time-aligned shared token, image style representation, large language model |
| Publication Year: | 2024 |
| Degree: | Master's |
| Abstract: | Sequential recommendation in e-commerce leverages users' anonymous browsing histories to offer personalized product suggestions without relying on private information. While item ID-based sequential recommendation is widely used in practice, it often fails to fully capture the diverse factors that influence user preferences and purchase intent, such as textual descriptions, visual content, and pricing; these factors represent distinct modalities in recommender systems. Existing multimodal sequential recommendation models typically employ either early or late fusion. Early fusion, however, overlooks that perception models are usually optimized for a specific modality, while late fusion ignores the temporal alignment of corresponding positions in the product sequences that reflect users' browsing preferences. To address these limitations, this thesis proposes a unified multimodal fusion framework, the Multimodal Time-aligned Shared Token Recommender (MTSTRec). MTSTRec leverages a transformer-based architecture that introduces a single time-aligned shared token for each product, enabling efficient cross-modality fusion while preserving the temporal alignment of the different modalities (a hypothetical sketch of this mechanism appears after the record below). This approach not only preserves each modality's distinct contribution but also aligns the modalities to better capture user preferences. Additionally, the model extracts rich features from text, images, and other product data, offering a more comprehensive representation of user decision-making in e-commerce. Extensive experiments demonstrate that MTSTRec achieves state-of-the-art performance across multiple sequential recommendation benchmarks, significantly improving upon existing multimodal fusion strategies. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96349 |
| DOI: | 10.6342/NTU202404633 |
| Full-Text Access: | Not authorized |
| Appears in Collections: | Graduate Institute of Communication Engineering |
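The abstract describes MTSTRec's core mechanism only at a high level: one shared token per product position fuses that position's modality-specific embeddings, so cross-modal fusion happens per time step before the sequence is modeled with a transformer. Since the thesis full text is not publicly available, the following is a minimal, hypothetical PyTorch sketch of that general idea, not the thesis's actual implementation; the module names, dimensions, and the choice of attention-based per-step fusion are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class TimeAlignedSharedTokenFusion(nn.Module):
    """Hypothetical sketch: a single learnable shared token attends over
    each time step's modality embeddings (text, image, price), yielding
    one fused, time-aligned representation per product position."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.shared_token = nn.Parameter(torch.randn(1, 1, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, text, image, price):
        # Each input: (batch, seq_len, d_model), aligned by position in time.
        B, T, D = text.shape
        # Group the modalities of the same time step: (B*T, n_modalities, D).
        modal = torch.stack([text, image, price], dim=2).reshape(B * T, 3, D)
        # The shared token queries that step's modalities via cross-attention.
        query = self.shared_token.expand(B * T, 1, D)
        fused, _ = self.cross_attn(query, modal, modal)   # (B*T, 1, D)
        return fused.reshape(B, T, D)                     # time-aligned fused sequence

class MTSTRecSketch(nn.Module):
    """The fused per-step tokens are then modeled over time with a causal
    transformer encoder, as in standard sequential recommenders."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.fusion = TimeAlignedSharedTokenFusion(d_model, n_heads)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, text, image, price):
        seq = self.fusion(text, image, price)             # (B, T, D)
        causal = nn.Transformer.generate_square_subsequent_mask(seq.size(1)).to(seq.device)
        return self.encoder(seq, mask=causal)             # per-step user-preference states

# Toy usage with random modality embeddings:
B, T, D = 2, 5, 64
model = MTSTRecSketch(d_model=D)
out = model(torch.randn(B, T, D), torch.randn(B, T, D), torch.randn(B, T, D))
print(out.shape)  # torch.Size([2, 5, 64])
```

Fusing within each time step before sequence modeling is what distinguishes this scheme from early fusion (concatenating raw features before per-modality encoders) and late fusion (combining per-modality sequence models only at their outputs), which is the contrast the abstract draws.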
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-113-1.pdf (not authorized for public access) | 9.83 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
