Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/29151

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 林守德(Shou-de Lin) | |
| dc.contributor.author | Jung-Jung Yeh | en |
| dc.contributor.author | 葉蓉蓉 | zh_TW |
| dc.date.accessioned | 2021-06-13T00:43:11Z | - |
| dc.date.available | 2012-08-10 | |
| dc.date.copyright | 2011-08-10 | |
| dc.date.issued | 2011 | |
| dc.date.submitted | 2011-08-04 | |
| dc.identifier.citation | [1] A. Damodaran. Investment Philosophies: Successful Investment Philosophies and the Greatest Investors Who Made Them Work. Wiley, Jan 2003. [2] S. J. Russell and A. Zimdars. Q-decomposition for reinforcement learning agents. In Machine Learning, Proceedings of the Twentieth International Conference, pages 656–663, 2003. [3] C. Acerbi and D. Tasche. Expected Shortfall: a natural coherent alternative to Value at Risk, 2001. [4] E. J. Elton and M. J. Gruber. Modern Portfolio Theory and Investment Analysis. New York: John Wiley & Sons, 1995. [5] E. Maskin and J. Riley. Optimal Auctions with Risk Averse Buyers. Econometrica, 52(1), The Econometric Society, 1984. [6] H. Kashima. Risk-Sensitive Learning via Minimization of Empirical Conditional Value-at-Risk. Transactions on Information and Systems, pages 2043–2052, Oxford University Press, 2007. [7] J. A. Filar, L. C. M. Kallenberg and H.-M. Lee. Variance-Penalized Markov Decision Processes. Mathematics of Operations Research, 14(1):147–161, INFORMS, 1989. [8] J. Li and L. Chan. Reward Adjustment Reinforcement Learning for Risk-averse Asset Allocation. In Proceedings of the International Joint Conference on Neural Networks, pages 534–541, 2006. [9] M. Sato and S. Kobayashi. Variance-penalized reinforcement learning for risk-averse asset allocation. In Proceedings of the Second International Conference on Intelligent Data Engineering and Automated Learning, Data Mining, Financial Engineering, and Intelligent Agents, pages 244–249, 2000. [10] M. Heger. Consideration of Risk in Reinforcement Learning. In Proceedings of the 11th International Conference on Machine Learning, pages 105–111, 1994. [11] O. Mihatsch and R. Neuneier. Risk-sensitive reinforcement learning. Machine Learning, 49(2-3):267–290, 2002. [12] P. Geibel and F. Wysotzki. Risk-sensitive reinforcement learning applied to control under constraints. Journal of Artificial Intelligence Research, 24(1):81–108, 2005. [13] P. Jorion. Value at Risk: The New Benchmark for Managing Financial Risk, 3rd ed. McGraw-Hill, 2006. ISBN 978-0071464956. [14] R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957. [15] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. The MIT Press, Mar 1998. [16] S. Koenig and R. G. Simmons. Risk-sensitive planning with probabilistic decision graphs. In Proceedings of the Fourth International Conference on Principles of Knowledge Representation and Reasoning, pages 363–373, 1994. [17] W. F. Sharpe. The Sharpe Ratio. Journal of Portfolio Management, pages 49–58, 1994. [18] Y. Liu, R. Goodwin and S. Koenig. Risk Averse Auction Agents. In Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, 2003. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/29151 | - |
| dc.description.abstract | 傳統的強化學習演算法的目的為最大化累計折現報酬，並忽略了報酬的分布，使得報酬可能極不穩定而且會有極大損失的結果出現。本篇論文提出以期望損失作為風險定義，並因此設計一可避免風險之強化學習演算法以改善傳統的強化學習。實驗結果顯示避免期望損失的強化學習演算法同時也可降低其他類的風險，如報酬的標準差、最大損失、損失機率，及實際報酬低於期望報酬的情形。同時避免期望損失可以降低一般公司及金融企業的違約風險及股市的斷頭危機，而這是既有文獻中尚未能有效處理的風險概念。我們設計了一個可分解的強化學習演算法以有效降低期望損失。此架構包含了兩個子代理人及一個仲裁者。子代理人各自學習期望損失及期望報酬，而仲裁者評估可能行為所得到的風險及報酬後採取最適決定。實驗分為格子世界及台灣的電子股指數模擬交易兩部分。在格子世界裡我們會展現不同程度的危險厭惡代理人所獲得的期望報酬及風險，以及另一敏感損失係數如何在給定的風險下助於代理人提升期望報酬。在電子股模擬交易則與懲罰變異數與風險敏感的強化學習演算法進行比較，結果發現在給定的投資報酬率下，避免期望損失的代理人可有效降低其他風險評估值。 | zh_TW |
| dc.description.abstract | Traditional reinforcement learning agents focus on maximizing the expected cumulative reward and ignore the distribution of the return. However, for some tasks people prefer actions that may yield less return but are more likely to avoid disaster. This thesis proposes to define risk as the expected loss and accordingly designs a risk-avoiding reinforcement learning agent. Our experiments show that such a risk-avoiding agent can also reduce other types of risk, such as the variance of return, the maximal loss, and the probability of fatal errors. Defining risk in terms of loss makes it possible to reduce the credit risk faced by banks as well as the losses arising in stock margin trading, which previous literature has hardly coped with effectively. We design a Q-decomposed reinforcement learning system to handle the trade-off between expected loss and expected return. The framework consists of two subagents and one arbiter: the subagents learn the expected loss and the expected return individually, and the arbiter evaluates the sum of the return and loss of each candidate action and takes the best one (an illustrative sketch of this decomposition appears after the metadata table). We perform two experiments: the grid world and simulated trades on the Taiwanese Electronic Stock Index. In the grid world, we evaluate the expected return and expected loss of agents with different levels of risk aversion. In the stock-trading experiment, we compare the risk-avoiding agent with variance-penalized and risk-sensitive agents. The results show that our risk-avoiding agent not only reduces the expected loss but also cuts down other kinds of risk. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-13T00:43:11Z (GMT). No. of bitstreams: 1 ntu-100-R97922014-1.pdf: 1260102 bytes, checksum: 752680c471600c4995a95e13323a52af (MD5) Previous issue date: 2011 | en |
| dc.description.tableofcontents | 口試委員會審定書 # 誌謝 ii 摘要 iii ABSTRACT iv CONTENTS v LIST OF FIGURES vii LIST OF TABLES ix Chapter 1 Introduction 1 1.1 Motivation 1 1.2 Background 3 1.3 Methodology Outline 4 1.4 Contribution 7 Chapter 2 Related Works 10 2.1 Reinforcement Learning Framework 10 2.1.1 Markov Decision Process 10 2.1.2 Temporal Difference Learning 12 2.2 Risk-Related Reinforcement Learning 13 Chapter 3 Methodology 16 3.1 Risk Definition 16 3.2 Risk Avoiding Reinforcement Learning 19 3.2.1 System Framework 19 3.3 Learning Algorithms for Subagents 21 3.3.1 Risk Subagent 21 3.3.2 Profit Subagent 25 3.3.3 RARL System Optimality 26 Chapter 4 Experiments 27 4.1 Grid World 27 4.1.1 Sensitive-Loss Threshold at Level Zero 27 4.1.2 Sensitive-Loss Threshold at Level of Walking Cost 30 4.2 Taiwanese Electronic Stock Index 34 4.2.1 Evaluation Metrics 35 4.2.2 Competitive Algorithms 36 4.2.3 Experiment Design 37 4.2.4 Results with Fluctuant Trend 40 4.2.5 Results with Smooth Trend 46 4.2.6 Single Risk-Devalued State-Action Value Function 52 Chapter 5 Conclusion 56 REFERENCES 57 | |
| dc.language.iso | en | |
| dc.subject | 風險 | zh_TW |
| dc.subject | 人工智慧 | zh_TW |
| dc.subject | 強化學習 | zh_TW |
| dc.subject | Artificial Intelligence | en |
| dc.subject | Reinforcement Learning | en |
| dc.subject | Risk | en |
| dc.title | 可避免風險之強化學習演算法 | zh_TW |
| dc.title | Risk-Avoiding Reinforcement Learning | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 99-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 林軒田(Hsuan-Tien Lin),黃健?(Chien-Feng Huang),王傑智(Chieh-Chih Wang) | |
| dc.subject.keyword | 強化學習,風險,人工智慧, | zh_TW |
| dc.subject.keyword | Reinforcement Learning,Risk,Artificial Intelligence, | en |
| dc.relation.page | 59 | |
| dc.rights.note | 有償授權 | |
| dc.date.accepted | 2011-08-04 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| Appears in Collections: | 資訊工程學系 (Department of Computer Science and Information Engineering) | |
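
The English abstract above describes a Q-decomposed system with a profit subagent (expected return), a risk subagent (expected loss), and an arbiter that combines the two estimates to choose actions. The sketch below is a minimal Python illustration of that idea, not the thesis's implementation: the tabular updates, the zero-loss threshold, the `risk_weight` trade-off, and all parameter names are assumptions introduced here for illustration.

```python
import random
from collections import defaultdict


class RiskAvoidingAgent:
    """Q-decomposed agent: a profit subagent, a risk subagent, and an arbiter."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, risk_weight=1.0, epsilon=0.1):
        self.actions = actions
        self.alpha = alpha              # learning rate (assumed)
        self.gamma = gamma              # discount factor (assumed)
        self.risk_weight = risk_weight  # arbiter's weight on expected loss (assumed)
        self.epsilon = epsilon          # exploration rate (assumed)
        self.q_return = defaultdict(float)  # profit subagent: expected-return estimates
        self.q_loss = defaultdict(float)    # risk subagent: expected-loss estimates (<= 0)

    def _score(self, state, action):
        # Arbiter's criterion: expected return plus weighted expected loss.
        return self.q_return[(state, action)] + self.risk_weight * self.q_loss[(state, action)]

    def act(self, state):
        # Epsilon-greedy policy over the arbiter's combined score.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self._score(state, a))

    def update(self, state, action, reward, next_state):
        # Both subagents bootstrap toward the action the arbiter would take next.
        next_action = max(self.actions, key=lambda a: self._score(next_state, a))
        # Profit subagent learns from the full reward signal.
        target_r = reward + self.gamma * self.q_return[(next_state, next_action)]
        self.q_return[(state, action)] += self.alpha * (target_r - self.q_return[(state, action)])
        # Risk subagent learns only from the loss part (rewards below a zero threshold).
        loss = min(reward, 0.0)
        target_l = loss + self.gamma * self.q_loss[(next_state, next_action)]
        self.q_loss[(state, action)] += self.alpha * (target_l - self.q_loss[(state, action)])


if __name__ == "__main__":
    agent = RiskAvoidingAgent(actions=["buy", "sell", "hold"])
    agent.update("s0", "buy", -1.0, "s1")  # a losing step lowers both estimates for ("s0", "buy")
    print(agent.act("s1"))
```

In this sketch each subagent learns only its own component of value (return or loss) while the arbiter owns the policy, so the degree of risk aversion is controlled entirely through the arbiter's weighting of the two estimates.
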
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-100-1.pdf (restricted; not authorized for public access) | 1.23 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated in their licensing terms.
