請用此 Handle URI 來引用此文件:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17368| 標題: | 以虛擬回饋演算法解決線性情境式拉霸問題 Pseudo-reward Algorithms for Linear Contextual Bandit Problems |
| 作者: | Ku-Chun Chou 周谷駿 |
| 指導教授: | 林軒田(Hsuan-Tien Lin) |
| 關鍵字: | 情境式拉霸問題,線性上信賴界,線性回歸, contextual bandit problem,linear upper confidence bound,linear regression, |
| 出版年 : | 2013 |
| 學位: | 碩士 |
| 摘要: | 在此論文中我們研究網路線上廣告播放、推薦系統中衍生的情
境式拉霸問題(Contextual Bandit Problem)。線性上信賴界(Linear Upper Confidence Bound)是目前解決這等問題領先的演算法之一, 它 使用線性回歸向環境的片面回饋學習並且更新其內部的模型。因為 拉霸問題的特性線性上信賴界只使用片面回饋作為學習的資料,但這 導致它可能會很緩慢的達到最佳的效能。我們在此提出以虛擬回饋 (Pseudo-reward)來改善線性上信賴界。並且發現只要使用適當的虛 擬回饋以及搭配恰當的遺忘機制就可以快速的達到較佳效能並且同時 兼顧線性上信賴界於理論上的保證。除此之外,因為使用了虛擬回饋 我們還可以設計出可以快速做出預測的演算法。這種快速的預測能 力在許多網路應用都具有價值,因為預測速度和網頁載入速度息息相 關。最後我們使用了人造資料以及雅虎新聞提供的巨型真實世界資料 來驗證我們所提出的演算法之優越性。 We study the contextual bandit problem that arises in many real world applications such as advertising, recommendations, and otherWeb applications. One leading algorithm for contextual bandit is the linear upper confidence bound (LINUCB) approach, which is based on updating internal linear regression models with the partial feedback from the environment. Because of updating with only the partial feedback, LINUCB can be slow in converging to the optimal performance. In this work, we study techniques that improve LINUCB by updating the linear regressionmodels with some additional feedback called the pseudo-reward. By choosing a proper pseudo-reward formula and implementing a forgetting mechanism to avoid being overly biased by the pseudo-rewards, we propose an improved algorithm that matches the regret guarantee of LINUCB in theory. Furthermore, we design a variant of the proposed algorithm that can be significantly more efficient than LINUCB during action selection, which directly implies faster response time in many applications. Extensive experimental results from both artificial data and the benchmark Yahoo! News recommendation data show that the proposed algorithm enjoys better performance than LINUCB and other contextual bandit algorithms. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17368 |
| 全文授權: | 未授權 |
| 顯示於系所單位: | 資訊工程學系 |
文件中的檔案:
| 檔案 | 大小 | 格式 | |
|---|---|---|---|
| ntu-102-1.pdf 未授權公開取用 | 498.85 kB | Adobe PDF |
系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。
