Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/29151

Full metadata record

DC Field / Value / Language
dc.contributor.advisor: 林守德 (Shou-de Lin)
dc.contributor.author: Jung-Jung Yeh [en]
dc.contributor.author: 葉蓉蓉 [zh_TW]
dc.date.accessioned: 2021-06-13T00:43:11Z
dc.date.available: 2012-08-10
dc.date.copyright: 2011-08-10
dc.date.issued: 2011
dc.date.submitted: 2011-08-04
dc.identifier.citation:
[1] A. Damodaran. Investment Philosophies: Successful Investment Philosophies and the Greatest Investors Who Made Them Work. Wiley, 2003.
[2] S. J. Russell and A. Zimdars. Q-decomposition for reinforcement learning agents. In Proceedings of the Twentieth International Conference on Machine Learning, pages 656-663, 2003.
[3] C. Acerbi and D. Tasche. Expected Shortfall: a natural coherent alternative to Value at Risk, 2001.
[4] E. J. Elton and M. J. Gruber. Modern Portfolio Theory and Investment Analysis. John Wiley & Sons, New York, 1995.
[5] E. Maskin and J. Riley. Optimal auctions with risk averse buyers. Econometrica, 52(1), The Econometric Society, 1984.
[6] H. Kashima. Risk-sensitive learning via minimization of empirical conditional value-at-risk. Transactions on Information and Systems, pages 2043-2052, Oxford University Press, 2007.
[7] J. A. Filar, L. C. M. Kallenberg, and H.-M. Lee. Variance-penalized Markov decision processes. Mathematics of Operations Research, 14(1):147-161, INFORMS, 1989.
[8] J. Li and L. Chan. Reward adjustment reinforcement learning for risk-averse asset allocation. In Proceedings of the International Joint Conference on Neural Networks, pages 534-541, 2006.
[9] M. Sato and S. Kobayashi. Variance-penalized reinforcement learning for risk-averse asset allocation. In Proceedings of the Second International Conference on Intelligent Data Engineering and Automated Learning (Data Mining, Financial Engineering, and Intelligent Agents), pages 244-249, 2000.
[10] M. Heger. Consideration of risk in reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 105-111, 1994.
[11] O. Mihatsch and R. Neuneier. Risk-sensitive reinforcement learning. Machine Learning, 49(2-3):267-290, 2002.
[12] P. Geibel and F. Wysotzki. Risk-sensitive reinforcement learning applied to control under constraints. Journal of Artificial Intelligence Research, 24(1):81-108, 2005.
[13] P. Jorion. Value at Risk: The New Benchmark for Managing Financial Risk, 3rd ed. McGraw-Hill, 2006. ISBN 978-0071464956.
[14] R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
[15] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. The MIT Press, 1998.
[16] S. Koenig and R. G. Simmons. Risk-sensitive planning with probabilistic decision graphs. In Proceedings of the Fourth International Conference on Principles of Knowledge Representation and Reasoning, pages 363-373, 1994.
[17] W. F. Sharpe. The Sharpe ratio. Journal of Portfolio Management, pages 49-58, 1994.
[18] Y. Liu, R. Goodwin, and S. Koenig. Risk-averse auction agents. In Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, 2003.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/29151
dc.description.abstract: Traditional reinforcement learning algorithms aim to maximize the cumulative discounted reward and ignore the distribution of the return, so the return can be highly unstable and outcomes with very large losses can occur. This thesis defines risk as the expected loss and accordingly designs a risk-avoiding reinforcement learning algorithm that improves on traditional reinforcement learning. Experiments show that a reinforcement learning algorithm that avoids expected loss also reduces other kinds of risk, such as the standard deviation of the return, the maximal loss, the probability of loss, and the shortfall of the realized return below the expected return. Avoiding expected loss can also reduce the default risk faced by companies and financial institutions and the danger of forced liquidation (margin calls) in the stock market, a notion of risk that the existing literature has not handled effectively.
We design a decomposable reinforcement learning algorithm that reduces the expected loss effectively. The framework consists of two subagents and one arbiter. The subagents learn the expected loss and the expected return respectively, and the arbiter evaluates the risk and return of each candidate action and takes the best decision.
The experiments cover two settings: a grid world and simulated trading on the Taiwan electronics stock index. In the grid world we show the expected return and risk obtained by agents with different degrees of risk aversion, and how a sensitive-loss coefficient helps the agent raise its expected return under a given level of risk. In the simulated electronics-stock trading we compare against variance-penalized and risk-sensitive reinforcement learning algorithms; the results show that, for a given rate of return, the agent that avoids expected loss effectively reduces the other risk measures. [zh_TW]
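The risk measures named in the abstract (expected loss, standard deviation of the return, maximal loss, probability of loss) can be made concrete with a small sketch. The definitions below, in particular reading the expected loss as the mean of the negative part of the return, are plain-reading assumptions for illustration, not necessarily the thesis's exact formulas.

```python
import statistics

def risk_metrics(returns):
    """Summarize a list of per-episode returns with the risk measures above."""
    losses = [min(r, 0.0) for r in returns]
    return {
        "expected_return": statistics.mean(returns),
        "expected_loss": statistics.mean(losses),       # mean of the negative parts only (assumed reading)
        "return_stdev": statistics.stdev(returns),      # volatility of the return
        "maximal_loss": min(returns),                    # worst single outcome
        "loss_probability": sum(r < 0 for r in returns) / len(returns),
    }

# Hypothetical returns; a risk-avoiding agent should push expected_loss and
# loss_probability toward zero while keeping a comparable expected_return.
print(risk_metrics([0.04, -0.02, 0.03, 0.05, -0.10, 0.01]))
```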
dc.description.abstract: Traditional reinforcement learning agents focus on maximizing the expected cumulative reward and ignore the distribution of the return. However, for some tasks people prefer actions that may yield less return but are more likely to avoid disaster. This thesis proposes to define risk as the expected loss and accordingly designs a risk-avoiding reinforcement learning agent. Our experiments show that such an agent can reduce several types of risk, such as the variance of the return, the maximal loss, and the probability of fatal errors. Risk defined in terms of loss can reduce the credit risk borne by banks as well as the losses arising in stock margin trading, which the previous literature can hardly cope with effectively.
We design a Q-decomposed reinforcement learning system to handle the tradeoff between expected loss and expected return. The framework consists of two subagents and one arbiter. The subagents learn the expected loss and the expected return individually, and the arbiter evaluates the sum of the return and loss of each action and takes the best one.
We perform two experiments: the grid world and simulated trades on the Taiwanese electronic stock index. In the grid world, we evaluate the expected return and the expected loss of agents with different levels of risk aversion. In the stock-trading experiment, we compare the risk-avoiding agent with variance-penalized and risk-sensitive agents. The results show that our risk-avoiding agent not only reduces the expected loss but also cuts down other kinds of risk. [en]
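The Q-decomposed system described in the abstract (a profit subagent, a risk subagent, and an arbiter) might be wired up roughly as in the tabular sketch below. The class name, the `risk_aversion` weight, the epsilon-greedy policy, and the update rule that trains both subagents against the arbiter's greedy action are assumptions made for this example, not the thesis's exact algorithm.

```python
import random
from collections import defaultdict

class RiskAvoidingAgent:
    """Sketch: two subagents (return and loss) plus an arbiter, tabular Q-learning."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, risk_aversion=1.0, epsilon=0.1):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.risk_aversion = risk_aversion       # weight the arbiter puts on expected loss (assumed)
        self.q_return = defaultdict(float)       # profit subagent: estimated expected return
        self.q_loss = defaultdict(float)         # risk subagent: estimated expected loss (<= 0)

    def combined(self, state, action):
        # Arbiter's score: return estimate plus weighted loss estimate.
        return (self.q_return[(state, action)]
                + self.risk_aversion * self.q_loss[(state, action)])

    def act(self, state):
        # Epsilon-greedy over the arbiter's combined value.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.combined(state, a))

    def update(self, state, action, reward, next_state):
        # The profit subagent learns from the full reward; the risk subagent
        # learns only from its negative part (the loss).
        loss = min(reward, 0.0)
        next_action = max(self.actions, key=lambda a: self.combined(next_state, a))
        for q, signal in ((self.q_return, reward), (self.q_loss, loss)):
            target = signal + self.gamma * q[(next_state, next_action)]
            q[(state, action)] += self.alpha * (target - q[(state, action)])
```

In this sketch a larger `risk_aversion` makes the arbiter favor actions with small expected loss even when their expected return is lower, which is the tradeoff the abstract describes.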
dc.description.provenance: Made available in DSpace on 2021-06-13T00:43:11Z (GMT). No. of bitstreams: 1. ntu-100-R97922014-1.pdf: 1260102 bytes, checksum: 752680c471600c4995a95e13323a52af (MD5). Previous issue date: 2011. [en]
dc.description.tableofcontents:
Committee Approval Certificate (口試委員會審定書) #
Acknowledgements (誌謝) ii
Abstract in Chinese (摘要) iii
ABSTRACT iv
CONTENTS v
LIST OF FIGURES vii
LIST OF TABLES ix
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Background 3
1.3 Methodology Outline 4
1.4 Contribution 7
Chapter 2 Related Works 10
2.1 Reinforcement Learning Framework 10
2.1.1 Markov Decision Process 10
2.1.2 Temporal Difference Learning 12
2.2 Risk-Related Reinforcement Learning 13
Chapter 3 Methodology 16
3.1 Risk Definition 16
3.2 Risk Avoiding Reinforcement Learning 19
3.2.1 System Framework 19
3.3 Learning Algorithms for Subagents 21
3.3.1 Risk Subagent 21
3.3.2 Profit Subagent 25
3.3.3 RARL System Optimality 26
Chapter 4 Experiments 27
4.1 Grid world 27
4.1.1 Sensitive-Loss Threshold at Level Zero 27
4.1.2 Sensitive Loss-Threshold at Level of Walking Cost 30
4.2 Taiwanese Electronic Stock Index 34
4.2.1 Evaluation Metrics 35
4.2.2 Competitive Algorithms 36
4.2.3 Experiment Design 37
4.2.4 Results with Fluctuant Trend 40
4.2.5 Results with Smooth Trend 46
4.2.6 Single Risk-Devalued State-Action Value Function 52
Chapter 5 Conclusion 56
REFERENCE 57
dc.language.iso: en
dc.subject: 風險 (risk) [zh_TW]
dc.subject: 人工智慧 (artificial intelligence) [zh_TW]
dc.subject: 強化學習 (reinforcement learning) [zh_TW]
dc.subject: Artificial Intelligence [en]
dc.subject: Reinforcement Learning [en]
dc.subject: Risk [en]
dc.title: 可避免風險之強化學習演算法 (Risk-Avoiding Reinforcement Learning) [zh_TW]
dc.title: Risk-Avoiding Reinforcement Learning [en]
dc.type: Thesis
dc.date.schoolyear: 99-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 林軒田 (Hsuan-Tien Lin), 黃健? (Chien-Feng Huang), 王傑智 (Chieh-Chih Wang)
dc.subject.keyword: 強化學習 (reinforcement learning), 風險 (risk), 人工智慧 (artificial intelligence) [zh_TW]
dc.subject.keyword: Reinforcement Learning, Risk, Artificial Intelligence [en]
dc.relation.page: 59
dc.rights.note: 有償授權 (paid license)
dc.date.accepted: 2011-08-04
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
File: ntu-100-1.pdf (not authorized for public access)
Size: 1.23 MB
Format: Adobe PDF