Learning Methods for Preserving Content Consistency: Cross-Modal and Cross-Item Representation Learning

蔡易儒; Yi-Ru Tsai

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91322

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	鄭卜壬	zh_TW
dc.contributor.advisor	Pu-Jen Cheng	en
dc.contributor.author	蔡易儒	zh_TW
dc.contributor.author	Yi-Ru Tsai	en
dc.date.accessioned	2023-12-20T16:29:13Z	-
dc.date.available	2023-12-21	-
dc.date.copyright	2023-12-20	-
dc.date.issued	2023	-
dc.date.submitted	2023-11-30	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91322	-
dc.description.abstract	This thesis addresses the challenge of learning content consistency in recommender systems. We introduce a novel model that learns cross-modal and cross-item representations, effectively capturing the textual and visual semantics of similar item content. We apply the learned embeddings to two applications: recommendation and topic generation. Extensive experiments on three real-world Amazon datasets show significant improvements in both applications over existing well-known models.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-12-20T16:29:13Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2023-12-20T16:29:13Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Verification Letter from the Oral Examination Committee; Acknowledgements; Abstract (Chinese); Abstract; Contents; List of Figures; List of Tables; Denotation; Chapter 1 Introduction; Chapter 2 Related Works; 2.1 Recommendation; 2.1.1 Content-Based Models; 2.1.2 Collaborative Filtering; 2.1.3 Hybrid Methods; 2.2 Topic Generation; 2.2.1 Text Semantics; 2.2.2 Vision Semantics; 2.2.3 Vision-Language Representation Learning; Chapter 3 Methodology; 3.1 Overview; 3.1.1 Problem Formulation; 3.2 Aligned Cross-Modal and Cross-Item Encoder; 3.2.1 Aligned Cross-Modal Encoder; 3.2.1.1 Image-Text Contrastive Loss (ITC); 3.2.1.2 Masked Language Modeling Loss (MLM); 3.2.1.3 Image-Text Matching Loss (ITM); 3.2.2 Aligned Cross-Item Encoder; 3.2.2.1 Item-Item Contrastive Loss (IIC); 3.2.2.2 Item-Item Matching Loss (IIM); 3.3 Embedding Propagation Model; 3.4 Consistent Content Decoder; 3.5 Recommender System; 3.6 Bayesian Personalized Ranking Bonus Loss; Chapter 4 Experiments; 4.1 Dataset and Experimental Settings; 4.2 Experimental Results and Discussion; 4.2.1 Recommendation; 4.2.2 Consistent Content Topic Generation; 4.3 Ablation Study; Chapter 5 Conclusions; References	-
dc.language.iso	en	-
dc.subject	topic generation	en
dc.subject	cross-attention	en
dc.subject	cross-modal	en
dc.subject	cross-item	en
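dc.title	保持內容一致性的學習方法：跨模態和跨項目的表示學習 (Learning Methods for Preserving Content Consistency: Cross-Modal and Cross-Item Representation Learning)	zh_TW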
dc.title	YR-REC: Yoked and Refined Representation with Content Consistency for Recommendation and Explanation	en
dc.type	Thesis	-
dc.date.schoolyear	112-1	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	曾新穆;高宏宇;江俊宇;黃瀚萱	zh_TW
dc.contributor.oralexamcommittee	Shin-Mu Tseng;Hung-Yu Kao;Jyun-Yu Jiang;Hen-Hsen Huang	en
dc.subject.keyword	cross-attention, cross-modal, cross-item, topic generation	en
dc.relation.page	44	-
dc.identifier.doi	10.6342/NTU202301098	-
dc.rights.note	Not authorized	-
dc.date.accepted	2023-12-01	-
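dc.contributor.author-college	College of Electrical Engineering and Computer Science	-
dc.contributor.author-dept	Graduate Institute of Networking and Multimedia	-
Appears in collections:	Graduate Institute of Networking and Multimedia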

Files in this item:

File	Size	Format
ntu-112-1.pdf (not authorized for public access)	3.65 MB	Adobe PDF

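Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.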