Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17252
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林軒田(Hsuan-Tien Lin) | |
dc.contributor.author | Ya-Hsuan Chang | en |
dc.contributor.author | 張雅軒 | zh_TW |
dc.date.accessioned | 2021-06-08T00:03:09Z | - |
dc.date.copyright | 2013-08-20 | |
dc.date.issued | 2013 | |
dc.date.submitted | 2013-08-14 | |
dc.identifier.citation | Bibliography
[1] P. Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3:397–422, 2003.
[2] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.
[3] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
[4] A. Beygelzimer, J. Langford, L. Li, L. Reyzin, and R. E. Schapire. Contextual bandit algorithms with supervised learning guarantees. arXiv preprint arXiv:1002.4058, 2010.
[5] U. Brefeld and T. Scheffer. AUC maximizing support vector learning. In Proceedings of the ICML 2005 Workshop on ROC Analysis in Machine Learning, 2005.
[6] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[7] K.-C. Chou and H.-T. Lin. Balancing between estimated reward and uncertainty during news article recommendation for ICML 2012 exploration and exploitation challenge. 2012.
[8] W. Chu, L. Li, L. Reyzin, and R. E. Schapire. Contextual bandits with linear payoff functions. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
[9] M. Dudik, D. Hsu, S. Kale, N. Karampatziakis, J. Langford, L. Reyzin, and T. Zhang. Efficient optimal learning for contextual bandits. arXiv preprint arXiv:1106.2369, 2011.
[10] C. Gentile and F. Orabona. On multilabel classification and ranking with partial feedback. arXiv preprint arXiv:1207.0166, 2012.
[11] T.-K. Jan, D.-W. Wang, C.-H. Lin, and H.-T. Lin. A simple methodology for soft cost-sensitive classification. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 141–149. ACM, 2012.
[12] S. Kale, L. Reyzin, and R. Schapire. Non-stochastic bandit slate problems. In Advances in Neural Information Processing Systems (NIPS), pages 1054–1062, 2010.
[13] L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661–670. ACM, 2010.
[14] L. Li, W. Chu, J. Langford, and X. Wang. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pages 297–306. ACM, 2011.
[15] E. L. Mencia and J. Furnkranz. Pairwise learning of multilabel classifications with perceptrons. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IJCNN-08), pages 2899–2906. IEEE, 2008.
[16] S. Pandey, D. Agarwal, D. Chakrabarti, and V. Josifovski. Bandits for taxonomies: A model-based approach. In Proceedings of the SIAM International Conference on Data Mining, 2007.
[17] H. Robbins. Some aspects of the sequential design of experiments. In Herbert Robbins Selected Papers, pages 169–177. Springer, 1985.
[18] D. Sculley. Combined regression and ranking. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 979–988. ACM, 2010.
[19] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[20] C.-C. Wang, S. R. Kulkarni, and H. V. Poor. Bandit problems with side observations. IEEE Transactions on Automatic Control, 50(3):338–355, 2005. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17252 | - |
dc.description.abstract | The contextual bandit problem is often used to model online applications such as article recommendation systems. However, we observe that these online applications have characteristics that the traditional contextual bandit problem cannot capture, such as taking multiple actions in a single round. We therefore propose a new problem, the Contextual Bandit with Multiple Actions, to model this characteristic. We adapt several existing methods to the new problem, and we also propose Pairwise Regression with Upper Confidence Bound, which targets the properties of the new problem. Experimental results show that our proposed method outperforms the existing ones. | zh_TW |
dc.description.abstract | The contextual bandit problem is usually used to model online applications like article recommendation. However, the problem cannot fully meet some needs of these applications, such as making multiple actions at the same time. We propose a new Contextual Bandit Problem with Multiple Actions (CBMA), which is an extension of the traditional contextual bandit problem and fits the online applications better. We adapt some existing contextual bandit algorithms for our CBMA problem, and propose a new Pairwise Regression with Upper Confidence Bound (PairUCB) algorithm which utilizes the new properties of the CBMA problem. The experimental results demonstrate that PairUCB outperforms other algorithms. | en |
dc.description.provenance | Made available in DSpace on 2021-06-08T00:03:09Z (GMT). No. of bitstreams: 1 ntu-102-R00922044-1.pdf: 625395 bytes, checksum: 70e588808175e58230120161f2bd3cc7 (MD5) Previous issue date: 2013 | en |
dc.description.tableofcontents | Contents
Thesis Committee Certification iii
Acknowledgements v
Chinese Abstract vii
Abstract ix
1 Introduction 1
2 Preliminary 5
2.1 Problem Setup 5
2.2 Related Work 6
3 Approaches 9
3.1 General Algorithm Framework 9
3.2 Baseline Approach 10
3.2.1 Greedy Algorithm 10
3.2.2 Stochastic Algorithms 12
3.2.3 Upper Confidence Bound Algorithm 13
3.3 Proposed Approach 15
3.3.1 Pairwise Regression with Upper Confidence Bound 15
3.3.2 Mixed Pairwise and Pointwise Regression with Upper Confidence Bound 18
4 Experiment 21
4.1 Dataset 21
4.2 Setup 23
4.3 Performance 25
5 Conclusion 33
Bibliography 35 | |
dc.language.iso | en | |
dc.title | 多動作情境式拉霸問題之研究 | zh_TW |
dc.title | Study on Contextual Bandit Problem with Multiple Actions | en |
dc.type | Thesis | |
dc.date.schoolyear | 101-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 林守德(Shou-de Lin),李育杰(Yuh-Jye Lee) | |
dc.subject.keyword | Machine Learning, Contextual Bandit Problem, Upper Confidence Bound | zh_TW |
dc.subject.keyword | Machine Learning, Contextual Bandit Problem, Upper Confidence Bound | en |
dc.relation.page | 37 | |
dc.rights.note | Not authorized for public access | |
dc.date.accepted | 2013-08-15 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
Appears in Collections: | Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-102-1.pdf (currently not authorized for public access) | 610.74 kB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated by their specific licensing terms.
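The abstract above describes choosing multiple actions per round under upper-confidence-bound exploration. As a rough illustration of that setting, the sketch below extends a standard LinUCB-style learner to pick several arms each round. This is a minimal sketch under assumptions: the class name, parameters, and simulated linear-reward environment are illustrative inventions, not the thesis's PairUCB algorithm.

```python
# Illustrative sketch only: a LinUCB-style learner that selects several
# arms per round, loosely in the spirit of the CBMA setting described in
# the abstract. Not the thesis's PairUCB method.
import numpy as np

class MultiActionLinUCB:
    def __init__(self, n_arms, dim, n_actions, alpha=1.0):
        self.n_arms = n_arms
        self.n_actions = n_actions                       # actions chosen per round
        self.alpha = alpha                               # exploration width
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm ridge Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward-weighted sums

    def select(self, x):
        """Return the n_actions arms with the highest UCB score for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate of arm weights
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return np.argsort(scores)[-self.n_actions:][::-1]

    def update(self, arm, x, reward):
        """Rank-one update of the chosen arm's regression statistics."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

rng = np.random.default_rng(0)
true_w = rng.normal(size=(5, 8))                         # hidden linear reward weights
learner = MultiActionLinUCB(n_arms=5, dim=8, n_actions=2)
for t in range(200):
    x = rng.normal(size=8)
    for arm in learner.select(x):                        # play multiple actions per round
        r = true_w[arm] @ x + 0.1 * rng.normal()         # noisy linear reward
        learner.update(arm, x, r)
```

The key difference from the single-action contextual bandit is only in `select`: instead of the single arg-max arm, the top `n_actions` arms by UCB score are played, and each played arm receives its own feedback.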