A Comparison of Efficiency, Effectiveness, and Robustness of Multi-Armed Bandit Algorithms for Personalized Recommendations

蔡立忠; Li-Chung Tsai

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90005

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	黃從仁	zh_TW
dc.contributor.advisor	Tsung-Ren Huang	en
dc.contributor.author	蔡立忠	zh_TW
dc.contributor.author	Li-Chung Tsai	en
dc.date.accessioned	2023-09-22T17:01:28Z	-
dc.date.available	2023-11-09	-
dc.date.copyright	2023-09-22	-
dc.date.issued	2023	-
dc.date.submitted	2023-07-25	-
dc.identifier.citation	Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47:235–256. Chapelle, O. and Li, L. (2011). An empirical evaluation of Thompson sampling. Advances in Neural Information Processing Systems, 24. Das, D., Sahoo, L., and Datta, S. (2017). A survey on recommendation system. International Journal of Computer Applications, 160(7). Davidson, J., Liebald, B., Liu, J., Nandy, P., Van Vleet, T., Gargi, U., Gupta, S., He, Y., Lambert, M., Livingston, B., et al. (2010). The YouTube video recommendation system. In Proceedings of the Fourth ACM Conference on Recommender Systems, pages 293–296. Gomez-Uribe, C. A. and Hunt, N. (2015). The Netflix recommender system: Algorithms, business value, and innovation. ACM Transactions on Management Information Systems (TMIS), 6(4):1–19. Gordon, N. J., Salmond, D. J., and Smith, A. F. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. In IEE Proceedings F (Radar and Signal Processing), volume 140, pages 107–113. IET. Kawale, J., Bui, H. H., Kveton, B., Tran-Thanh, L., and Chawla, S. (2015). Efficient Thompson sampling for online matrix-factorization recommendation. Advances in Neural Information Processing Systems, 28. Mnih, A. and Salakhutdinov, R. R. (2007). Probabilistic matrix factorization. Advances in Neural Information Processing Systems, 20. Nocedal, J. and Wright, S. J. (2006). Quadratic programming. Numerical Optimization, pages 448–492. Sanz-Cruzado, J., Castells, P., and López, E. (2019). A simple multi-armed nearest-neighbor bandit for interactive recommendation. In Proceedings of the 13th ACM Conference on Recommender Systems, pages 358–362. Shani, G. and Gunawardana, A. (2011). Evaluating recommendation systems. Recommender Systems Handbook, pages 257–297. Silva, N., Silva, T., Werneck, H., Rocha, L., and Pereira, A. (2023). User cold-start problem in multi-armed bandits: when the first recommendations guide the user's experience. ACM Transactions on Recommender Systems, 1(1):1–24. Silva, N., Werneck, H., Silva, T., Pereira, A. C., and Rocha, L. (2022a). Multi-armed bandits in recommendation systems: A survey of the state-of-the-art and future directions. Expert Systems with Applications, 197:116669. Silva, T., Silva, N., Werneck, H., Mito, C., Pereira, A. C., and Rocha, L. (2022b). iRec: An interactive recommendation framework. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3165–3175. Yogeswaran, M. and Ponnambalam, S. (2012). Reinforcement learning: exploration–exploitation dilemma in multi-agent foraging task. Opsearch, 49:223–236. Zhao, X., Zhang, W., and Wang, J. (2013). Interactive collaborative filtering. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pages 1411–1420.	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90005	-
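dc.description.abstract	Recommendation systems play an important role in today's digital era. They are a key technology in business, widely applied in e-commerce, social media, music, and film and television entertainment; they provide personalized experiences, raise sales conversion rates, increase user engagement, promote cross-selling, and strengthen competitiveness. With continued advances in data and machine learning techniques, recommendation systems will keep playing an important role and will continue to be optimized and innovated to meet the changing needs of users and businesses. Whether the way algorithms are evaluated can effectively simulate real-world conditions is therefore especially important. Past evaluation methods were mostly based on offline datasets, yet these methods differ from how users actually interact in the real world. To get closer to a real environment, this study, in addition to the traditional non-probabilistic prediction approach, also adopts Multi-Armed Bandit algorithms with a prediction approach that adds probabilistic simulation, and compares the two. For different business objectives, this study also divides the algorithms' goals into three metrics for comparison: early-stage efficiency, long-term effectiveness, and cross-scenario robustness. Probabilistic simulation lets us better simulate users' real responses. It allows the same product to be recommended repeatedly and evaluates algorithm performance more comprehensively. Compared with past evaluation methods, our results are markedly different and more realistic. Finally, this study recommends algorithms for various dataset characteristics and business objectives, as a reference for major platforms and in support of business development.	zh_TW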
dc.description.abstract	Recommendation systems are vital in the digital era, powering e-commerce, social media, music streaming, and more. They enhance user experiences, drive conversions, and promote cross-selling. Evaluating these systems is crucial, and our study compares algorithms using probabilistic simulation to approximate real-world conditions. We assess efficiency, effectiveness, and robustness, essential metrics for diverse business objectives. By employing probabilistic simulation, we aim to simulate real-life user responses and evaluate algorithm performance comprehensively. This approach enables repeated recommendations and yields distinct, more realistic results compared to traditional evaluation methods. Our research provides valuable insights for algorithm selection, considering database characteristics and specific business goals. These findings empower major platforms to optimize their recommendation systems and drive business growth. In summary, our study demonstrates the importance of accurately evaluating recommendation algorithms in real-world contexts, highlighting the benefits of probabilistic simulation for improving system performance and user satisfaction.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-22T17:01:28Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2023-09-22T17:01:28Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Oral Examination Committee Certification; Acknowledgements; Abstract (Chinese); Abstract (English); Table of Contents; List of Figures; List of Tables; List of Symbols; Chapter 1 Introduction; 1.1 Research Background; 1.2 Research Motivation; 1.3 Research Structure; Chapter 2 Related Work; 2.1 ε-Greedy Algorithm; 2.2 Upper Confidence Bound (UCB) Algorithm; 2.2.1 LinearUCB; 2.3 Thompson Sampling (TS) Algorithm; 2.3.1 Particle Thompson Sampling for Matrix Factorization (PTS); 2.4 Warming-Start of Contextual Bandits (WSCB); Chapter 3 Methodology; 3.1 Dataset Selection; 3.2 Data Preprocessing; 3.2.1 Column Removal; 3.2.2 Duplicate Data Cleaning; 3.2.3 Data Sampling; 3.3 Evaluation Method; 3.3.1 Experimental Conditions; 3.3.2 Training and Test Sets; 3.3.3 Hyperparameter Selection; 3.4 Evaluation Metrics; 3.4.1 Efficiency; 3.4.2 Effectiveness; 3.4.3 Robustness; 3.5 Experimental Design; 3.6 Framework and Tools; Chapter 4 Empirical Results; 4.1 Typical Evaluation Approach; 4.1.1 Typical Efficiency Comparison Results; 4.1.1.1 Typical Efficiency Robustness Comparison Results; 4.1.2 Typical Effectiveness Comparison Results; 4.1.2.1 Typical Effectiveness Robustness Comparison Results; 4.1.3 Overall Comparison of the Typical Evaluation Approach; 4.2 Probabilistic Recommendation Approach; 4.2.1 Probabilistic Efficiency Comparison Results; 4.2.1.1 Probabilistic Efficiency Robustness Comparison Results; 4.2.2 Probabilistic Effectiveness Comparison Results; 4.2.2.1 Probabilistic Effectiveness Robustness Comparison Results; 4.2.3 Overall Comparison of the Probabilistic Approach; Chapter 5 Discussion and Conclusion; 5.1 Research Findings; 5.2 Research Contributions; 5.3 Limitations and Future Work; References; Appendix A: Best Model Parameters from Grid Search; A.1 Typical evaluation approach, early-stage efficiency parameters (T=20); A.2 Typical evaluation approach, long-term effectiveness parameters (T=500); A.3 Probabilistic recommendation approach, early-stage efficiency parameters (T=20); A.4 Probabilistic recommendation approach, long-term effectiveness parameters (T=500)	-
dc.language.iso	zh_TW	-
dc.subject	強化學習	zh_TW
dc.subject	推薦系統	zh_TW
dc.subject	多臂賭博機	zh_TW
dc.subject	Recommendation systems	en
dc.subject	Reinforcement Learning	en
dc.subject	Multi-armed bandit	en
dc.title	各種多臂賭博機演算法在個人化推薦之效率、效能與穩健性比較	zh_TW
dc.title	A Comparison of Efficiency, Effectiveness, and Robustness of Multi-Armed Bandit Algorithms for Personalized Recommendations	en
dc.type	Thesis	-
dc.date.schoolyear	111-2	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	陳弘軒;蔡政安	zh_TW
dc.contributor.oralexamcommittee	Hung-Hsuan Chen;Chen-An Tsai	en
dc.subject.keyword	推薦系統, 強化學習, 多臂賭博機	zh_TW
dc.subject.keyword	Recommendation systems, Reinforcement Learning, Multi-armed bandit	en
dc.relation.page	45	-
dc.identifier.doi	10.6342/NTU202301110	-
dc.rights.note	Not authorized	-
dc.date.accepted	2023-07-26	-
dc.contributor.author-college	Center for General Education	-
dc.contributor.author-dept	Master's Program in Statistics	-
Appears in Collections:	Master's Program in Statistics

Files in this item:

File	Size	Format
ntu-111-2.pdf (not authorized for public access)	2.25 MB	Adobe PDF


Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.