Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17368

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 林軒田(Hsuan-Tien Lin) | |
| dc.contributor.author | Ku-Chun Chou | en |
| dc.contributor.author | 周谷駿 | zh_TW |
| dc.date.accessioned | 2021-06-08T00:09:08Z | - |
| dc.date.copyright | 2013-08-23 | |
| dc.date.issued | 2013 | |
| dc.date.submitted | 2013-08-09 | |
| dc.identifier.citation | Bibliography
[1] P. Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3:397–422, 2002.
[2] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47:235–256, 2002.
[3] O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In NIPS, pages 2249–2257, 2011.
[4] K.-C. Chou, C.-K. Chiang, H.-T. Lin, and C.-J. Lu. Pseudo-reward for linear contextual bandits. Submitted and under review, 2013.
[5] K.-C. Chou and H.-T. Lin. Balancing between estimated reward and uncertainty during news article recommendation for ICML 2012 Exploration and Exploitation Challenge. Technical report, 2012.
[6] W. Chu, L. Li, L. Reyzin, and R. E. Schapire. Contextual bandits with linear payoff functions. Journal of Machine Learning Research - Proceedings Track, 15:208–214, 2011.
[7] C.-C. Wang, S. R. Kulkarni, and H. V. Poor. Bandit problems with side observations. Computing Research Repository, abs/cs/050, 2005.
[8] A. Garivier and O. Cappé. The KL-UCB algorithm for bounded stochastic bandits and beyond. Journal of Machine Learning Research - Proceedings Track, 19:359–376, 2011.
[9] D. Glowacka, T. Ruotsalo, K. Konuyshkova, K. Athukorala, S. Kaski, and G. Jacucci. Directing exploratory search: reinforcement learning from user interactions with keywords. In Proceedings of the 2013 International Conference on Intelligent User Interfaces, IUI '13, pages 117–128, New York, NY, USA, 2013. ACM.
[10] T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4–22, 1985.
[11] J. Langford and T. Zhang. The epoch-greedy algorithm for multi-armed bandits with side information. In NIPS, 2007.
[12] L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In WWW, pages 661–670, 2010.
[13] L. Li, W. Chu, J. Langford, and X. Wang. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In WSDM, pages 297–306, 2011.
[14] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[15] A. Tikhonov. On the stability of inverse problems. Doklady Akademii Nauk SSSR, 39(5):195–198, 1943.
[16] C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, UK, May 1989.
[17] Y. Yue, S. A. Hong, and C. Guestrin. Hierarchical exploration for accelerating contextual bandits. In ICML, 2012. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17368 | - |
| dc.description.abstract | 在此論文中我們研究網路線上廣告播放、推薦系統中衍生的情境式拉霸問題(Contextual Bandit Problem)。線性上信賴界(Linear Upper Confidence Bound)是目前解決這類問題領先的演算法之一,它使用線性回歸向環境的片面回饋學習並且更新其內部的模型。因為拉霸問題的特性,線性上信賴界只使用片面回饋作為學習的資料,但這導致它可能會很緩慢地達到最佳的效能。我們在此提出以虛擬回饋(Pseudo-reward)來改善線性上信賴界,並且發現只要使用適當的虛擬回饋以及搭配恰當的遺忘機制,就可以快速地達到較佳效能,並且同時兼顧線性上信賴界於理論上的保證。除此之外,因為使用了虛擬回饋,我們還可以設計出可以快速做出預測的演算法。這種快速的預測能力在許多網路應用都具有價值,因為預測速度和網頁載入速度息息相關。最後我們使用了人造資料以及雅虎新聞提供的巨型真實世界資料來驗證我們所提出的演算法之優越性。 | zh_TW |
| dc.description.abstract | We study the contextual bandit problem that arises in many real-world applications such as advertising, recommendations, and other Web applications. One leading algorithm for the contextual bandit problem is the linear upper confidence bound (LINUCB) approach, which is based on updating internal linear regression models with the partial feedback from the environment. Because it updates with only the partial feedback, LINUCB can be slow in converging to the optimal performance. In this work, we study techniques that improve LINUCB by updating the linear regression models with some additional feedback called the pseudo-reward. By choosing a proper pseudo-reward formula and implementing a forgetting mechanism to avoid being overly biased by the pseudo-rewards, we propose an improved algorithm that matches the regret guarantee of LINUCB in theory. Furthermore, we design a variant of the proposed algorithm that can be significantly more efficient than LINUCB during action selection, which directly implies faster response times in many applications. Extensive experimental results on both artificial data and the benchmark Yahoo! News recommendation data show that the proposed algorithm enjoys better performance than LINUCB and other contextual bandit algorithms. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-08T00:09:08Z (GMT). No. of bitstreams: 1 ntu-102-R99922095-1.pdf: 510824 bytes, checksum: 7bec39883c8eefcfe4caa468b0d74c30 (MD5) Previous issue date: 2013 | en |
| dc.description.tableofcontents | 中文摘要 iii
Abstract v
1 Introduction 1
2 Problem Settings 5
3 Preliminaries and Related Works 7
3.1 Baseline Algorithms 8
3.2 LinUCB 9
4 Pseudo-reward Algorithms 13
4.1 Introducing LINPRUCB 13
5 Experiments 17
5.1 Simulations 17
5.2 Real World Benchmark Data 19
6 Conclusions 23
Bibliography 25 | |
| dc.language.iso | en | |
| dc.title | 以虛擬回饋演算法解決線性情境式拉霸問題 | zh_TW |
| dc.title | Pseudo-reward Algorithms for Linear Contextual Bandit Problems | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 101-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 林智仁(Chih-Jen Lin),李育杰(Yuh-Jye Lee),呂及人(Chi-Jen Lu) | |
| dc.subject.keyword | 情境式拉霸問題,線性上信賴界,線性回歸, | zh_TW |
| dc.subject.keyword | contextual bandit problem,linear upper confidence bound,linear regression, | en |
| dc.relation.page | 26 | |
| dc.rights.note | 未授權 | |
| dc.date.accepted | 2013-08-09 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| Appears in Collections: | 資訊工程學系 | |
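The abstract in the record above describes LINUCB-style learning (per-arm ridge-regression estimates plus an upper-confidence-bound term) augmented with pseudo-rewards and a forgetting mechanism. As a rough, non-authoritative illustration of that idea only, the Python sketch below uses each unchosen arm's own current estimate as its pseudo-reward and a constant down-weighting factor as a stand-in for forgetting; these two choices, along with the names `LinUCBArm`, `play_round`, and `pseudo_weight`, are assumptions made for illustration and are not the thesis's actual LINPRUCB formulas.

```python
import numpy as np

class LinUCBArm:
    """Per-arm ridge regression with an upper-confidence-bound score."""

    def __init__(self, d, alpha=1.0):
        self.A = np.eye(d)        # regularized covariance matrix X^T X + I
        self.b = np.zeros(d)      # response vector X^T y
        self.alpha = alpha        # exploration parameter

    def estimate(self, x):
        theta = np.linalg.solve(self.A, self.b)       # ridge estimate of the weights
        return float(x @ theta)

    def ucb(self, x):
        width = np.sqrt(float(x @ np.linalg.solve(self.A, x)))
        return self.estimate(x) + self.alpha * width  # estimated reward + confidence width

    def update(self, x, reward, weight=1.0):
        # weight < 1 down-weights pseudo-rewards so they can be "forgotten"
        self.A += weight * np.outer(x, x)
        self.b += weight * reward * x


def play_round(arms, contexts, env_reward, pseudo_weight=0.3):
    """Pick the arm with the highest UCB, learn from its real reward only, and
    feed down-weighted pseudo-rewards (here: each arm's own estimate) to the rest."""
    chosen = max(range(len(arms)), key=lambda i: arms[i].ucb(contexts[i]))
    arms[chosen].update(contexts[chosen], env_reward(chosen))   # real partial feedback
    for i, arm in enumerate(arms):
        if i != chosen:
            pseudo = arm.estimate(contexts[i])                  # placeholder pseudo-reward formula
            arm.update(contexts[i], pseudo, weight=pseudo_weight)
    return chosen
```

The sketch only conveys the general mechanism the abstract mentions: the chosen arm learns from the real partial feedback, while the remaining arms receive additional, weaker feedback so their models do not stay cold; the exact pseudo-reward formula and forgetting schedule used by LINPRUCB are specified in the thesis itself.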
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-102-1.pdf (restricted access) | 498.85 kB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
