Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96349

| Title: | MTSTRec: Multimodal Time-Aligned Shared Token Recommender |
| Authors: | Yen-Jung Hsu (徐嬿鎔) |
| Advisor: | Che Lin (林澤) |
| Keywords: | multimodal sequential recommendation, time-aligned shared token, image style representation, large language model |
| Publication Year: | 2024 |
| Degree: | Master's |
| Abstract: | Sequential recommendation in e-commerce leverages users' anonymous browsing histories to offer personalized product suggestions without relying on private information. While item-ID-based sequential recommendation is widely used in practice, it often fails to capture the diverse factors that shape user preferences and purchase intent, such as textual descriptions, visual content, and pricing; in a recommender system, each of these factors constitutes a distinct modality. Existing multimodal sequential recommendation models typically adopt either early or late fusion: early fusion overlooks the fact that perception models are usually optimized for a single modality, while late fusion ignores the temporal alignment of corresponding positions in the product sequences that encode users' browsing preferences. To address these limitations, this thesis proposes a unified multimodal fusion framework, the Multimodal Time-aligned Shared Token Recommender (MTSTRec). MTSTRec uses a transformer-based architecture that attaches a single time-aligned shared token to each product, enabling efficient cross-modality fusion while preserving the temporal alignment of the modalities. This design retains each modality's distinct contribution while aligning them in time, capturing user preferences more accurately. In addition, the model extracts rich features from text, images, and other product data, yielding a more comprehensive representation of user decision-making in e-commerce. Extensive experiments show that MTSTRec achieves state-of-the-art performance across multiple sequential recommendation benchmarks, significantly improving on existing multimodal fusion strategies. (A minimal code sketch of the shared-token idea follows the metadata table.) |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96349 |
| DOI: | 10.6342/NTU202404633 |
| Fulltext Rights: | Not authorized |
| Appears in Collections: | Graduate Institute of Communication Engineering |
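
The abstract describes cross-modality fusion through a single time-aligned shared token per product, but the full text is restricted, so the exact architecture is not available here. Below is a minimal sketch of how such shared-token fusion could look, assuming a PyTorch implementation; the module name `SharedTokenFusion`, the three-modality input, and all dimensions and layer choices are illustrative assumptions, not the thesis's actual code.

```python
# Hedged sketch of time-aligned shared-token fusion as described in the
# abstract. All names, dimensions, and layer choices are assumptions.
import torch
import torch.nn as nn

class SharedTokenFusion(nn.Module):
    """One learnable shared token per product fuses that product's modality tokens."""

    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        # Single learnable token, broadcast to every sequence position.
        self.shared_token = nn.Parameter(torch.randn(1, 1, d_model))
        # Cross-modal attention: the shared token queries only the modality
        # tokens of its own product, preserving time alignment.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Temporal transformer over the fused, per-product tokens.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, modal_feats: torch.Tensor) -> torch.Tensor:
        # modal_feats: (batch, seq_len, n_modalities, d_model),
        # e.g. text / image / price embeddings for each browsed product.
        b, t, m, d = modal_feats.shape
        # Fold (batch, time step) together so attention never crosses positions.
        kv = modal_feats.reshape(b * t, m, d)
        q = self.shared_token.expand(b * t, 1, d)
        fused, _ = self.cross_attn(q, kv, kv)   # (b*t, 1, d)
        fused = fused.reshape(b, t, d)          # time-aligned fused sequence
        return self.temporal(fused)             # (batch, seq_len, d_model)

if __name__ == "__main__":
    # 2 sessions, 10 products each, 3 modalities, 128-dim features.
    feats = torch.randn(2, 10, 3, 128)
    print(SharedTokenFusion()(feats).shape)     # torch.Size([2, 10, 128])
```

Folding batch and time step together before attention is what makes this fusion time-aligned: the shared token at position t can only attend to modality tokens of the same product, which is exactly the alignment that, per the abstract, late fusion loses.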
Files in This Item:
| File | Size | Format | Access |
|---|---|---|---|
| ntu-113-1.pdf | 9.83 MB | Adobe PDF | Restricted Access |
