Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17252
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林軒田(Hsuan-Tien Lin) | |
dc.contributor.author | Ya-Hsuan Chang | en |
dc.contributor.author | 張雅軒 | zh_TW |
dc.date.accessioned | 2021-06-08T00:03:09Z | - |
dc.date.copyright | 2013-08-20 | |
dc.date.issued | 2013 | |
dc.date.submitted | 2013-08-14 | |
dc.identifier.citation | Bibliography
[1] P. Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3:397–422, 2003.
[2] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.
[3] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
[4] A. Beygelzimer, J. Langford, L. Li, L. Reyzin, and R. E. Schapire. Contextual bandit algorithms with supervised learning guarantees. arXiv preprint arXiv:1002.4058, 2010.
[5] U. Brefeld and T. Scheffer. AUC maximizing support vector learning. In Proceedings of the ICML 2005 Workshop on ROC Analysis in Machine Learning, 2005.
[6] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[7] K.-C. Chou and H.-T. Lin. Balancing between estimated reward and uncertainty during news article recommendation for ICML 2012 exploration and exploitation challenge. 2012.
[8] W. Chu, L. Li, L. Reyzin, and R. E. Schapire. Contextual bandits with linear payoff functions. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
[9] M. Dudik, D. Hsu, S. Kale, N. Karampatziakis, J. Langford, L. Reyzin, and T. Zhang. Efficient optimal learning for contextual bandits. arXiv preprint arXiv:1106.2369, 2011.
[10] C. Gentile and F. Orabona. On multilabel classification and ranking with partial feedback. arXiv preprint arXiv:1207.0166, 2012.
[11] T.-K. Jan, D.-W. Wang, C.-H. Lin, and H.-T. Lin. A simple methodology for soft cost-sensitive classification. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 141–149. ACM, 2012.
[12] S. Kale, L. Reyzin, and R. Schapire. Non-stochastic bandit slate problems. In Advances in Neural Information Processing Systems (NIPS), pages 1054–1062, 2010.
[13] L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661–670. ACM, 2010.
[14] L. Li, W. Chu, J. Langford, and X. Wang. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pages 297–306. ACM, 2011.
[15] E. L. Mencia and J. Furnkranz. Pairwise learning of multilabel classifications with perceptrons. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IJCNN-08), pages 2899–2906. IEEE, 2008.
[16] S. Pandey, D. Agarwal, D. Chakrabarti, and V. Josifovski. Bandits for taxonomies: A model-based approach. In Proceedings of the SIAM International Conference on Data Mining, 2007.
[17] H. Robbins. Some aspects of the sequential design of experiments. In Herbert Robbins Selected Papers, pages 169–177. Springer, 1985.
[18] D. Sculley. Combined regression and ranking. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 979–988. ACM, 2010.
[19] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[20] C.-C. Wang, S. R. Kulkarni, and H. V. Poor. Bandit problems with side observations. IEEE Transactions on Automatic Control, 50(3):338–355, 2005. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17252 | - |
dc.description.abstract | The contextual bandit problem is often used to model online applications such as article recommendation systems. However, we observe that these online applications have characteristics that the traditional contextual bandit problem cannot capture, such as taking multiple actions in a single round. We therefore propose a new problem, the Contextual Bandit with Multiple Actions, to model this characteristic. We adapt several existing methods to the new problem, and we also propose Pairwise Regression with Upper Confidence Bound, which targets the properties of the new problem. Experimental results show that our proposed method outperforms the existing ones. | zh_TW |
dc.description.abstract | The contextual bandit problem is usually used to model online applications like article recommendation. However, the problem cannot fully meet some needs of these applications, such as making multiple actions at the same time. We propose a new Contextual Bandit Problem with Multiple Actions (CBMA), which is an extension of the traditional contextual bandit problem and fits the online applications better. We adapt some existing contextual bandit algorithms for our CBMA problem, and propose a new Pairwise Regression with Upper Confidence Bound (PairUCB) algorithm which utilizes the new properties of the CBMA problem. The experimental results demonstrate that PairUCB outperforms other algorithms. | en |
dc.description.provenance | Made available in DSpace on 2021-06-08T00:03:09Z (GMT). No. of bitstreams: 1 ntu-102-R00922044-1.pdf: 625395 bytes, checksum: 70e588808175e58230120161f2bd3cc7 (MD5) Previous issue date: 2013 | en |
dc.description.tableofcontents | Contents
Thesis Committee Certification iii
Acknowledgements v
Chinese Abstract vii
Abstract ix
1 Introduction 1
2 Preliminary 5
2.1 Problem Setup 5
2.2 Related Work 6
3 Approaches 9
3.1 General Algorithm Framework 9
3.2 Baseline Approach 10
3.2.1 Greedy Algorithm 10
3.2.2 Stochastic Algorithms 12
3.2.3 Upper Confidence Bound Algorithm 13
3.3 Proposed Approach 15
3.3.1 Pairwise Regression with Upper Confidence Bound 15
3.3.2 Mixed Pairwise and Pointwise Regression with Upper Confidence Bound 18
4 Experiment 21
4.1 Dataset 21
4.2 Setup 23
4.3 Performance 25
5 Conclusion 33
Bibliography 35 | |
dc.language.iso | en | |
dc.title | 多動作情境式拉霸問題之研究 | zh_TW |
dc.title | Study on Contextual Bandit Problem with Multiple Actions | en |
dc.type | Thesis | |
dc.date.schoolyear | 101-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 林守德(Shou-de Lin),李育杰(Yuh-Jye Lee) | |
dc.subject.keyword | Machine Learning, Contextual Bandit Problem, Upper Confidence Bound | zh_TW |
dc.subject.keyword | Machine Learning, Contextual Bandit Problem, Upper Confidence Bound | en |
dc.relation.page | 37 | |
dc.rights.note | Not authorized for public access | |
dc.date.accepted | 2013-08-15 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
Appears in Collections: | Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-102-1.pdf (currently not authorized for public access) | 610.74 kB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated by their specific licensing terms.
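The abstract above describes choosing multiple actions per round under upper-confidence-bound exploration. As a rough illustration of that setting, the sketch below extends a standard LinUCB-style learner to pick several arms each round. This is a minimal sketch under assumptions: the class name, parameters, and simulated linear-reward environment are illustrative inventions, not the thesis's PairUCB algorithm.

```python
# Illustrative sketch only: a LinUCB-style learner that selects several
# arms per round, loosely in the spirit of the CBMA setting described in
# the abstract. Not the thesis's PairUCB method.
import numpy as np

class MultiActionLinUCB:
    def __init__(self, n_arms, dim, n_actions, alpha=1.0):
        self.n_arms = n_arms
        self.n_actions = n_actions                       # actions chosen per round
        self.alpha = alpha                               # exploration width
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm ridge Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward-weighted sums

    def select(self, x):
        """Return the n_actions arms with the highest UCB score for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate of arm weights
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return np.argsort(scores)[-self.n_actions:][::-1]

    def update(self, arm, x, reward):
        """Rank-one update of the chosen arm's regression statistics."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

rng = np.random.default_rng(0)
true_w = rng.normal(size=(5, 8))                         # hidden linear reward weights
learner = MultiActionLinUCB(n_arms=5, dim=8, n_actions=2)
for t in range(200):
    x = rng.normal(size=8)
    for arm in learner.select(x):                        # play multiple actions per round
        r = true_w[arm] @ x + 0.1 * rng.normal()         # noisy linear reward
        learner.update(arm, x, r)
```

The key difference from the single-action contextual bandit is only in `select`: instead of the single arg-max arm, the top `n_actions` arms by UCB score are played, and each played arm receives its own feedback.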