Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91216

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 呂育道 | zh_TW |
| dc.contributor.advisor | Yuh-Dauh Lyuu | en |
| dc.contributor.author | 林鼎鈞 | zh_TW |
| dc.contributor.author | Ding-Jun Lin | en |
| dc.date.accessioned | 2023-12-12T16:15:21Z | - |
| dc.date.available | 2023-12-13 | - |
| dc.date.copyright | 2023-12-12 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-11-15 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91216 | - |
| dc.description.abstract | 本篇論文結合風險規避強化學習以及深度值分布強化學習之技術,應用於可能存在市場摩擦之離散時間選擇權避險。具體而言,我們提出一個使用深度強化學習之選擇權避險架構,透過時序差分學習獲得損益分布函數之神經網路表徵,並從偽樣本中即時性地估計風險。此架構具多用性,因為其風險估計量以及分布之表徵形式皆為模組化。相較於其他以直接估計風險為目標的深度強化學習模型,此架構亦具有較高的獨立性與穩健性,因為它能在一個不依賴選擇權定價模型來建構報酬的選擇權避險情境中習得更好的避險策略與更準確的風險估計。另外,透過損益分布之表徵,此架構可進一步延伸為一種具有時間一致性的期望短缺最佳化方案之深度學習實現。最後,我們提出一個概念性驗證來展現此架構可從隨機的避險情境中習得可泛化的避險模型。 | zh_TW |
| dc.description.abstract | This thesis applies combined techniques from risk-averse reinforcement learning (RL) and deep distributional RL to discrete-time option hedging in the possible presence of market frictions. Specifically, we lay out a deep RL option hedging framework in which a neural network representation of the profit-and-loss distribution function is obtained through temporal-difference learning and the risk is estimated on the fly from a pseudo-sample (a minimal illustrative sketch follows the metadata record below). The framework is versatile because both the risk estimator and the distribution representation are modular. It is also more independent and robust than several deep RL models that aim to estimate risk directly, in that it learns better hedging policies and more accurate risk predictions in a hedging setting whose reward formulation does not depend on an option pricing model. Moreover, access to a representation of the profit-and-loss distribution allows the framework to be extended to a novel deep learning implementation of a time-consistent optimization scheme for expected shortfall. Finally, we present a proof of concept showing that this framework can learn a generalizable hedging model from randomized hedging instances. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-12-12T16:15:21Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-12-12T16:15:21Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements i
Chinese Abstract ii
Abstract iii
Contents iv
List of Figures vi
List of Tables vii
Chapter 1 Introduction 1
Chapter 2 Risk-neutral Reinforcement Learning 3
2.1 Markov Decision Process 3
2.2 Classic Reinforcement Learning 4
2.3 Deep Reinforcement Learning 8
Chapter 3 Risk-averse Reinforcement Learning for Option Hedging 11
3.1 Option Hedging as an MDP 11
3.2 Risk-averse Reinforcement Learning 14
3.3 Risk-averse Deep Reinforcement Learning for Option Hedging 18
Chapter 4 Risk-averse Deep Distributional Reinforcement Learning for Option Hedging 20
4.1 Deep Distributional Reinforcement Learning 20
4.2 Risk-averse Distributional DQN and DDPG for Option Hedging 24
4.3 Addressing Time Inconsistency of Expected Shortfall 30
Chapter 5 Numerical Results 35
5.1 The General Setup of the Experiments 35
5.2 Comparison of Models 40
5.2.1 Deep Hedging of Mean-variance Risk 41
5.2.2 Deep Hedging of Entropic Risk 46
5.2.3 Deep Hedging of Expected Shortfall 51
5.2.4 Section Summary 53
5.3 Time Consistency in Deep Hedging of Expected Shortfall 55
5.4 Generalization: A Proof of Concept 59
Chapter 6 Conclusion 63
References 65 | - |
| dc.language.iso | en | - |
| dc.subject | 市場摩擦 | zh_TW |
| dc.subject | 風險規避強化學習 | zh_TW |
| dc.subject | 值分布強化學習 | zh_TW |
| dc.subject | 選擇權避險 | zh_TW |
| dc.subject | 時間一致性 | zh_TW |
| dc.subject | 期望短缺 | zh_TW |
| dc.subject | 演員–評論家演算法 | zh_TW |
| dc.subject | time consistency | en |
| dc.subject | option hedging | en |
| dc.subject | market friction | en |
| dc.subject | expected shortfall | en |
| dc.subject | risk-averse reinforcement learning | en |
| dc.subject | distributional reinforcement learning | en |
| dc.subject | actor-critic method | en |
| dc.title | 以風險規避之深度值分布強化學習進行市場摩擦下之選擇權避險 | zh_TW |
| dc.title | Risk-averse Deep Distributional Reinforcement Learning for Option Hedging under Market Frictions | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-1 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 張經略;陸裕豪 | zh_TW |
| dc.contributor.oralexamcommittee | Ching-Lueh Chang;U-Hou Lok | en |
| dc.subject.keyword | 風險規避強化學習,值分布強化學習,選擇權避險,市場摩擦,時間一致性,期望短缺,演員–評論家演算法 | zh_TW |
| dc.subject.keyword | risk-averse reinforcement learning,distributional reinforcement learning,option hedging,market friction,time consistency,expected shortfall,actor-critic method | en |
| dc.relation.page | 72 | - |
| dc.identifier.doi | 10.6342/NTU202304116 | - |
| dc.rights.note | Authorization granted (open access worldwide) | - |
| dc.date.accepted | 2023-11-16 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊工程學系 | - |
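The abstract above describes a risk estimate computed on the fly from a pseudo-sample of the learned profit-and-loss distribution, with the risk estimator kept modular. As a rough illustration only, and not the thesis's actual implementation, the sketch below assumes a quantile-based distributional critic (as in quantile distributional RL) whose output atoms serve as the pseudo-sample, and shows one simple way an expected shortfall estimator could be plugged in; the function name, the number of atoms, and the synthetic pseudo-sample are all illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): estimate expected shortfall of a
# hedging P&L distribution that is represented by N equally weighted quantile
# atoms, as produced by a hypothetical quantile-based distributional critic.
import numpy as np

def expected_shortfall_from_quantiles(quantile_values: np.ndarray, alpha: float = 0.05) -> float:
    """Estimate expected shortfall at level `alpha` from a pseudo-sample of
    equally weighted quantile values of the P&L.

    The lowest floor(alpha * N) atoms approximate the worst alpha-tail of the
    P&L, so their average is a simple plug-in estimate of its mean.
    """
    q = np.sort(np.asarray(quantile_values, dtype=float))
    k = max(1, int(np.floor(alpha * q.size)))  # number of tail atoms to average
    return float(q[:k].mean())

# Usage: a hypothetical critic output of 32 quantile atoms for one state-action
# pair, replaced here by synthetic Gaussian values for demonstration.
rng = np.random.default_rng(0)
pseudo_sample = rng.normal(loc=0.0, scale=1.0, size=32)
print(expected_shortfall_from_quantiles(pseudo_sample, alpha=0.1))
```

Because the estimator only consumes the pseudo-sample, it could in principle be swapped for another risk measure (for example, a mean-variance or entropic risk estimate) without changing the distribution representation, which is the modularity the abstract refers to.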
Appears in Collections: 資訊工程學系
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-112-1.pdf | 2.3 MB | Adobe PDF | View/Open |
All items in the system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
