Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96349
Full metadata record (DC field: value [language])
dc.contributor.advisor: 林澤 [zh_TW]
dc.contributor.advisor: Che Lin [en]
dc.contributor.author: 徐嬿鎔 [zh_TW]
dc.contributor.author: Yen-Jung Hsu [en]
dc.date.accessioned: 2024-12-24T16:28:39Z
dc.date.available: 2024-12-25
dc.date.copyright: 2024-12-24
dc.date.issued: 2024
dc.date.submitted: 2024-12-04
dc.identifier.citation:
[1] Yehuda Koren, Steffen Rendle, and Robert Bell. Advances in collaborative filtering. Recommender Systems Handbook, pages 91–142, 2021.
[2] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001.
[3] Guy Shani, David Heckerman, Ronen I. Brafman, and Craig Boutilier. An MDP-based recommender system. Journal of Machine Learning Research, 6(9), 2005.
[4] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939, 2015.
[5] Wang-Cheng Kang and Julian McAuley. Self-attentive sequential recommendation. In IEEE International Conference on Data Mining (ICDM), pages 197–206. IEEE, 2018.
[6] Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 1441–1450, 2019.
[7] Keiron O'Shea and Ryan Nash. An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458, 2015.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[9] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In North American Chapter of the Association for Computational Linguistics, 2019.
[10] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
[11] Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, and Yongfeng Zhang. Recommendation as language processing (RLP): A unified pretrain, personalized prompt & predict paradigm (P5). In Proceedings of the 16th ACM Conference on Recommender Systems, pages 299–315, 2022.
[12] Hanjia Lyu, Song Jiang, Hanqing Zeng, Yinglong Xia, Qifan Wang, Si Zhang, Ren Chen, Chris Leung, Jiajie Tang, and Jiebo Luo. LLM-Rec: Personalized recommendation via prompting large language models. In Kevin Duh, Helena Gomez, and Steven Bethard, editors, Findings of the Association for Computational Linguistics: NAACL, pages 583–612. Association for Computational Linguistics, 2024.
[13] Berke Ugurlu. Style4Rec: Enhancing transformer-based e-commerce recommendation systems with style and shopping cart information. Master's thesis, Graduate Institute of Communication Engineering, National Taiwan University, pages 1–59, 2023.
[14] Ruining He and Julian McAuley. VBPR: Visual Bayesian personalized ranking from implicit feedback. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30, 2016.
[15] Fan Liu, Zhiyong Cheng, Changchang Sun, Yinglong Wang, Liqiang Nie, and Mohan Kankanhalli. User diverse preference modeling by multimodal attentive metric learning. In Proceedings of the 27th ACM International Conference on Multimedia, pages 1526–1534, 2019.
[16] Jiahao Liang, Xiangyu Zhao, Muyang Li, Zijian Zhang, Wanyu Wang, Haochen Liu, and Zitao Liu. MMMLP: Multi-modal multilayer perceptron for sequential recommendations. In Proceedings of the ACM Web Conference, pages 1109–1117, 2023.
[17] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations (ICLR), pages 1–14, 2015.
[18] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, 2017.
[19] Qidong Liu, Jiaxi Hu, Yutian Xiao, Xiangyu Zhao, Jingtong Gao, Wanyu Wang, Qing Li, and Jiliang Tang. Multimodal recommender systems: A survey. ACM Computing Surveys, 2024.
[20] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
[21] Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.
[22] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
[23] Zihuai Zhao, Wenqi Fan, Jiatong Li, Yunqing Liu, Xiaowei Mei, Yiqi Wang, Zhen Wen, Fei Wang, Xiangyu Zhao, Jiliang Tang, et al. Recommender systems in the era of large language models (LLMs). arXiv preprint arXiv:2307.02046, 2023.
[24] Ruyu Li, Wenhao Deng, Yu Cheng, Zheng Yuan, Jiaqi Zhang, and Fajie Yuan. Exploring the upper limits of text-based collaborative filtering using large language models: Discoveries and insights. arXiv preprint arXiv:2305.11700, 2023.
[25] Likang Wu, Zhi Zheng, Zhaopeng Qiu, Hao Wang, Hongchao Gu, Tingjia Shen, Chuan Qin, Chen Zhu, Hengshu Zhu, Qi Liu, et al. A survey on large language models for recommendation. World Wide Web, 27(5):60, 2024.
[26] Hongyu Zhou, Xin Zhou, Zhiwei Zeng, Lingzi Zhang, and Zhiqi Shen. A comprehensive survey on multimodal recommender systems: Taxonomy, evaluation, and future directions. arXiv preprint arXiv:2302.04473, 2023.
[27] Chuhan Wu, Fangzhao Wu, Tao Qi, and Yongfeng Huang. MM-Rec: Multimodal news recommendation. arXiv preprint arXiv:2104.07407, 2021.
[28] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576, 2015.
[29] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2414–2423, 2016.
[30] Chu-Chun Yu, Ming-Yi Hong, Chiok-Yew Ho, and Che Lin. Push4Rec: Temporal and contextual trend-aware transformer push notification recommender. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6625–6629. IEEE, 2024.
[31] Chun-Kai Huang, Yi-Hsien Hsieh, Ta-Jung Chien, Li-Cheng Chien, Shao-Hua Sun, Tung-Hung Su, Jia-Horng Kao, and Che Lin. Scalable numerical embeddings for multivariate time series: Enhancing healthcare data representation learning. arXiv preprint arXiv:2405.16557, 2024.
[32] Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, and Chen Sun. Attention bottlenecks for multimodal fusion. Advances in Neural Information Processing Systems, 34:14200–14213, 2021.
[33] Zaiqiao Meng, Richard McCreadie, Craig Macdonald, and Iadh Ounis. Exploring data splitting strategies for the evaluation of recommendation models. In Proceedings of the 14th ACM Conference on Recommender Systems, pages 681–686, 2020.
[34] Balázs Hidasi and Alexandros Karatzoglou. Recurrent neural networks with top-k gains for session-based recommendations. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 843–852, 2018.
[35] OpenAI. OpenAI GPT-4o API [gpt-4o-mini], 2024.
[36] OpenAI. OpenAI Embedding API [text-embedding-3-large], 2024.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96349
dc.description.abstract: 在電子商務中,序列推薦透過分析用戶匿名的瀏覽記錄,無需依賴個人資訊來提供個性化的產品推薦。儘管基於商品ID的序列推薦已廣泛使用於實際應用中,但它往往無法充分捕捉影響用戶偏好和購買意願的多樣因素,如文字描述、視覺內容和價格等,這些因素在推薦系統中分別代表不同的模態。現有的多模態序列推薦模型主要採用早期或晚期融合方法。然而,早期融合忽略了機器感知模型通常針對特定模態進行優化,而晚期融合則忽略了用戶瀏覽偏好中產品序列對應位置的時間對齊問題。為了解決這些限制,本文提出了一個統一的多模態融合框架——多模態時間對齊共享標籤推薦系統(Multimodal Time-aligned Shared Token Recommender; MTSTRec)。MTSTRec 採用基於Transformer的架構,為每個產品引入單一的時間對齊共享標籤,實現高效的跨模態融合,同時保留不同模態的時間對齊特性。這一方法不僅保留了每個模態的獨特貢獻,還能更好地對齊它們,從而更準確地捕捉用戶偏好。此外,該模型從文本、圖像和其他產品數據中提取豐富特徵,提供更全面的用戶決策表徵,更好地去模擬使用者場景。大量實驗表明,MTSTRec 在多個序列推薦基準上達到最傑出的表現,顯著提升了現有的多模態融合策略。 [zh_TW]
dc.description.abstract: Sequential recommendation in e-commerce leverages users' anonymous browsing histories to offer personalized product suggestions without relying on private information. While item ID-based sequential recommendations are commonly used, they often fail to fully capture the diverse factors influencing user preferences, such as textual descriptions, visual content, and pricing. These factors represent distinct modalities in recommender systems. Existing multimodal sequential recommendation models typically employ either early or late fusion of different modalities, overlooking the alignment of corresponding positions in time of product sequences that represent users' browsing preferences. To address these limitations, this paper proposes a unified framework for multimodal fusion in recommender systems, introducing the Multimodal Time-aligned Shared Token Recommender (MTSTRec). MTSTRec leverages a transformer-based architecture that incorporates a single time-aligned shared token for each product, allowing for efficient cross-modality fusion that also aligns in time. This approach not only preserves the distinct contributions of each modality but also aligns them to better capture user preferences. Additionally, the model extracts rich features from text, images, and other product data, offering a more comprehensive representation of user decision-making in e-commerce. Extensive experiments demonstrate that MTSTRec achieves state-of-the-art performance across multiple sequential recommendation benchmarks, significantly improving upon existing multimodal fusion strategies. [en]
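To make the abstract's fusion idea concrete, the following is a minimal PyTorch sketch of one way a time-aligned shared token could fuse modality streams: each modality contributes a time-aligned item embedding, one learnable shared token is attached at every sequence position, attention runs only across modalities at each position, and the shared token's output becomes the fused product representation. The class name, layer sizes, and fusion details below are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn


class TimeAlignedSharedTokenFusion(nn.Module):
    """Illustrative sketch: fuse per-modality item embeddings with one
    shared token per time step (assumed mechanics, not the thesis code)."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # One learnable shared token, broadcast to every sequence position.
        self.shared_token = nn.Parameter(torch.randn(1, 1, 1, d_model))
        # Small transformer applied across the modality axis at each step.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.cross_modal_encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, modality_seqs: list[torch.Tensor]) -> torch.Tensor:
        # Each tensor in modality_seqs: (batch, seq_len, d_model),
        # time-aligned so position t refers to the same product everywhere.
        B, L, D = modality_seqs[0].shape
        stacked = torch.stack(modality_seqs, dim=2)    # (B, L, M, D)
        token = self.shared_token.expand(B, L, 1, D)   # one token per step
        x = torch.cat([token, stacked], dim=2)         # (B, L, M+1, D)
        # Fold time into the batch so attention runs only across modalities,
        # preserving the time alignment between product positions.
        x = self.cross_modal_encoder(x.reshape(B * L, -1, D))
        # The shared token's output is the fused per-product representation.
        return x[:, 0, :].reshape(B, L, D)


# Example: four modality streams (e.g. ID, text, image style, price),
# each already embedded to (batch=2, seq_len=5, d_model=64).
streams = [torch.randn(2, 5, 64) for _ in range(4)]
fused = TimeAlignedSharedTokenFusion()(streams)
print(fused.shape)  # torch.Size([2, 5, 64])
```

A causal self-attention encoder over the fused (batch, seq_len, d_model) sequence would then score candidate next products, matching the sequential-recommendation setup the abstract describes.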
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-12-24T16:28:39Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2024-12-24T16:28:39Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee i
Acknowledgements iii
摘要 v
Abstract vii
Contents ix
List of Figures xiii
List of Tables xv
Chapter 1 Introduction 1
  1.1 Recommendation Systems 1
  1.2 Multimodal Recommendation Systems 2
  1.3 Multimodal Fusion 2
  1.4 Research Contributions 3
Chapter 2 Related Work 7
  2.1 Sequential Recommendation Systems 7
  2.2 Multimodal Recommendation Systems 8
Chapter 3 Proposed Method: MTSTRec 11
  3.1 Preliminaries 12
  3.2 Feature Extractor Module 13
    3.2.1 ID Extractor 13
    3.2.2 Style Extractor 14
    3.2.3 Text Extractor 15
    3.2.4 Prompt-Text Extractor 16
    3.2.5 Price Extractor 20
  3.3 Multimodal Transformer with Time-aligned Shared Token Fusion 20
    3.3.1 Self-Attention Encoder 21
    3.3.2 Time-aligned Shared Token Fusion with Multimodal Integration 23
  3.4 Loss Function 26
Chapter 4 Experimental Settings 29
  4.1 Dataset 29
  4.2 Dataset Splitting 30
  4.3 Evaluation Metrics 31
    4.3.1 Normalized Discounted Cumulative Gain 32
    4.3.2 Hit Rate 35
    4.3.3 Mean Reciprocal Rank 37
  4.4 Benchmark Models 38
  4.5 Implementation Details 40
    4.5.1 Prompt-Text Configuration 41
Chapter 5 Results and Discussion 43
  5.1 Performance Comparison (RQ1) 43
  5.2 Ablation Study of Modalities (RQ2) 45
  5.3 Ablation Study of Fusion Module (RQ3) 47
  5.4 Comparison of Shared Token Configurations (RQ4) 48
  5.5 The Impact of ID Modules Across Different Datasets 50
  5.6 Analysis of Text and Prompt-Text Embedding in MTSTRec 52
    5.6.1 Comparison of Language Models for Text Embedding 52
    5.6.2 Comparison of Prompt Strategies and LLMs in MTSTRec 52
    5.6.3 Performance of Prompt-Text Embedding and Gating Weights 54
Chapter 6 Conclusion and Future Work 57
References 59
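Section 4.3 of the contents above lists three standard ranking metrics (NDCG, Hit Rate, MRR). As a quick reference, here is a generic sketch of their common definitions for next-item prediction with a single ground-truth item per sequence; it is not the thesis's evaluation code, and the helper names are ours.

```python
import numpy as np


def rank_of_target(scores: np.ndarray, target: int) -> int:
    """1-based rank of the target item under descending scores."""
    order = np.argsort(-scores)
    return int(np.where(order == target)[0][0]) + 1


def ndcg_at_k(rank: int, k: int) -> float:
    # With one relevant item, DCG = 1/log2(rank+1) and the ideal DCG is 1.
    return 1.0 / np.log2(rank + 1) if rank <= k else 0.0


def hit_at_k(rank: int, k: int) -> float:
    # 1 if the target appears anywhere in the top-k list, else 0.
    return 1.0 if rank <= k else 0.0


def mrr_at_k(rank: int, k: int) -> float:
    # Reciprocal of the target's rank, truncated at k.
    return 1.0 / rank if rank <= k else 0.0


# Example: scores over 5 candidate items, ground truth is item index 2.
scores = np.array([0.1, 0.4, 0.3, 0.05, 0.15])
r = rank_of_target(scores, target=2)  # item 2 has the 2nd-highest score
print(r, ndcg_at_k(r, 10), hit_at_k(r, 10), mrr_at_k(r, 10))
# -> 2 0.6309... 1.0 0.5
```

Averaging these per-sequence values over the test set yields the benchmark numbers typically reported in Chapter 5.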
dc.language.iso: en
dc.subject: 大型語言模型 [zh_TW]
dc.subject: 圖像風格表示 [zh_TW]
dc.subject: 時間對齊共享標籤 [zh_TW]
dc.subject: 多模態序列推薦 [zh_TW]
dc.subject: large language model [en]
dc.subject: multimodal sequential recommendation [en]
dc.subject: time-aligned shared token [en]
dc.subject: image style representation [en]
dc.title: MTSTRec: 多模態時間對齊共享標籤推薦系統 [zh_TW]
dc.title: MTSTRec: Multimodal Time-Aligned Shared Token Recommender [en]
dc.type: Thesis
dc.date.schoolyear: 113-1
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 王志宇;王釧茹;蔡銘峰 [zh_TW]
dc.contributor.oralexamcommittee: Chih-Yu Wang;Chuan-Ju Wang;Ming-Feng Tsai [en]
dc.subject.keyword: 多模態序列推薦, 時間對齊共享標籤, 圖像風格表示, 大型語言模型 [zh_TW]
dc.subject.keyword: multimodal sequential recommendation, time-aligned shared token, image style representation, large language model [en]
dc.relation.page: 64
dc.identifier.doi: 10.6342/NTU202404633
dc.rights.note: 未授權 (Not authorized)
dc.date.accepted: 2024-12-04
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering)
Appears in Collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in This Item:
File: ntu-113-1.pdf (not authorized for public access)
Size: 9.83 MB
Format: Adobe PDF


Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
