DSpace

The institutional repository system DSpace is dedicated to preserving digital material of all kinds (such as text, images, and PDFs) and making it easy to access.

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17252
Full metadata record (DC field: value [language])
dc.contributor.advisor: 林軒田 (Hsuan-Tien Lin)
dc.contributor.author: Ya-Hsuan Chang [en]
dc.contributor.author: 張雅軒 [zh_TW]
dc.date.accessioned: 2021-06-08T00:03:09Z
dc.date.copyright: 2013-08-20
dc.date.issued: 2013
dc.date.submitted: 2013-08-14
dc.identifier.citation: Bibliography
[1] P. Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3:397–422, 2003.
[2] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.
[3] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
[4] A. Beygelzimer, J. Langford, L. Li, L. Reyzin, and R. E. Schapire. Contextual bandit algorithms with supervised learning guarantees. arXiv preprint arXiv:1002.4058, 2010.
[5] U. Brefeld and T. Scheffer. AUC maximizing support vector learning. In Proceedings of the ICML 2005 Workshop on ROC Analysis in Machine Learning, 2005.
[6] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[7] K.-C. Chou and H.-T. Lin. Balancing between estimated reward and uncertainty during news article recommendation for ICML 2012 exploration and exploitation challenge. 2012.
[8] W. Chu, L. Li, L. Reyzin, and R. E. Schapire. Contextual bandits with linear payoff functions. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
[9] M. Dudik, D. Hsu, S. Kale, N. Karampatziakis, J. Langford, L. Reyzin, and T. Zhang. Efficient optimal learning for contextual bandits. arXiv preprint arXiv:1106.2369, 2011.
[10] C. Gentile and F. Orabona. On multilabel classification and ranking with partial feedback. arXiv preprint arXiv:1207.0166, 2012.
[11] T.-K. Jan, D.-W. Wang, C.-H. Lin, and H.-T. Lin. A simple methodology for soft cost-sensitive classification. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 141–149. ACM, 2012.
[12] S. Kale, L. Reyzin, and R. Schapire. Non-stochastic bandit slate problems. Advances in Neural Information Processing Systems (NIPS), pages 1054–1062, 2010.
[13] L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661–670. ACM, 2010.
[14] L. Li, W. Chu, J. Langford, and X. Wang. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pages 297–306. ACM, 2011.
[15] E. L. Mencía and J. Fürnkranz. Pairwise learning of multilabel classifications with perceptrons. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IJCNN-08), pages 2899–2906. IEEE, 2008.
[16] S. Pandey, D. Agarwal, D. Chakrabarti, and V. Josifovski. Bandits for taxonomies: A model-based approach. In Proceedings of the SIAM International Conference on Data Mining, 2007.
[17] H. Robbins. Some aspects of the sequential design of experiments. In Herbert Robbins Selected Papers, pages 169–177. Springer, 1985.
[18] D. Sculley. Combined regression and ranking. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 979–988. ACM, 2010.
[19] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[20] C.-C. Wang, S. R. Kulkarni, and H. V. Poor. Bandit problems with side observations. IEEE Transactions on Automatic Control, 50(3):338–355, 2005.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17252
dc.description.abstract: The contextual bandit problem is often used to model online applications such as article recommendation systems. However, we observe that these applications have characteristics that the traditional contextual bandit problem cannot model, such as taking multiple actions in a single round. We therefore propose a new problem, the contextual bandit with multiple actions, to capture this characteristic. We adapt several existing methods to the new problem, and, targeting its properties, we also propose Pairwise Regression with Upper Confidence Bound. Experimental results show that the proposed method outperforms the existing ones. [zh_TW]
dc.description.abstract: The contextual bandit problem is usually used to model online applications like article recommendation. However, the traditional problem cannot fully meet some needs of these applications, such as making multiple actions at the same time. We propose a new Contextual Bandit Problem with Multiple Actions (CBMA), which is an extension of the traditional contextual bandit problem and fits these online applications better. We adapt some existing contextual bandit algorithms for our CBMA problem, and propose a new Pairwise Regression with Upper Confidence Bound (PairUCB) algorithm that utilizes the new properties of the CBMA problem. The experimental results demonstrate that PairUCB outperforms the other algorithms. [en]
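
To make the multiple-actions setting in the abstract concrete, below is a minimal Python sketch of a generic LinUCB-style learner that selects the top k arms per round instead of a single arm. It is an illustrative assumption, not the thesis's PairUCB algorithm: the class name TopKLinUCB, the scoring rule, and every parameter here are hypothetical stand-ins.

import numpy as np

# Illustrative sketch only: a generic LinUCB-style learner extended to pick
# the top-k arms per round, mimicking the "multiple actions" setting the
# abstract describes. This is NOT the thesis's PairUCB algorithm; all names
# and parameters are hypothetical.
class TopKLinUCB:
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        # One ridge-regression model (A = X^T X + I, b = X^T r) per arm.
        self.A = [np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, x, k):
        """Return the k arms with the highest upper confidence bounds."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # estimated weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # exploration bonus
            scores.append(theta @ x + bonus)
        return np.argsort(scores)[-k:][::-1]             # top-k by UCB score

    def update(self, arm, x, reward):
        # Standard rank-one update of the per-arm ridge regression.
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy usage: 10 arms, 5-dimensional contexts, 3 actions per round.
rng = np.random.default_rng(0)
true_theta = rng.normal(size=(10, 5))   # hidden linear reward model
learner = TopKLinUCB(n_arms=10, dim=5)
for _ in range(100):
    x = rng.normal(size=5)
    for arm in learner.select(x, k=3):
        reward = true_theta[arm] @ x + rng.normal(scale=0.1)
        learner.update(arm, x, reward)

Per the abstract, PairUCB would replace the pointwise ridge-regression estimate above with a pairwise regression; that detail lives in the thesis PDF and is not reproduced here.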
dc.description.provenance: Made available in DSpace on 2021-06-08T00:03:09Z (GMT). No. of bitstreams: 1; ntu-102-R00922044-1.pdf: 625395 bytes, checksum: 70e588808175e58230120161f2bd3cc7 (MD5). Previous issue date: 2013 [en]
dc.description.tableofcontents: Contents
Thesis committee certification iii
Acknowledgements v
Abstract (in Chinese) vii
Abstract (in English) ix
1 Introduction 1
2 Preliminary 5
2.1 Problem Setup 5
2.2 Related Work 6
3 Approaches 9
3.1 General Algorithm Framework 9
3.2 Baseline Approach 10
3.2.1 Greedy Algorithm 10
3.2.2 Stochastic Algorithms 12
3.2.3 Upper Confidence Bound Algorithm 13
3.3 Proposed Approach 15
3.3.1 Pairwise Regression with Upper Confidence Bound 15
3.3.2 Mixed Pairwise and Pointwise Regression with Upper Confidence Bound 18
4 Experiment 21
4.1 Dataset 21
4.2 Setup 23
4.3 Performance 25
5 Conclusion 33
Bibliography 35
dc.language.iso: en
dc.title: 多動作情境式拉霸問題之研究 (Study on Contextual Bandit Problem with Multiple Actions) [zh_TW]
dc.title: Study on Contextual Bandit Problem with Multiple Actions [en]
dc.type: Thesis
dc.date.schoolyear: 101-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 林守德 (Shou-de Lin), 李育杰 (Yuh-Jye Lee)
dc.subject.keyword: 機器學習 (machine learning), 情境式拉霸問題 (contextual bandit problem), 信心值上界 (upper confidence bound) [zh_TW]
dc.subject.keyword: Machine Learning, Contextual Bandit Problem, Upper Confidence Bound [en]
dc.relation.page: 37
dc.rights.note: 未授權 (not authorized for public release)
dc.date.accepted: 2013-08-15
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File: ntu-102-1.pdf (not authorized for public access)
Size: 610.74 kB
Format: Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.

Contact information
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
© NTU Library All Rights Reserved