Investigations on data-driven stochastic optimal control and approximate-inference-based reinforcement learning methods

  • Park, Jooyoung (Department of Control and Instrumentation Engineering, College of Science and Technology, Korea University) ;
  • Ji, Seunghyun (Department of Control and Instrumentation Engineering, College of Science and Technology, Korea University) ;
  • Sung, Keehoon (Department of Control and Instrumentation Engineering, College of Science and Technology, Korea University) ;
  • Heo, Seongman (Department of Control and Instrumentation Engineering, College of Science and Technology, Korea University) ;
  • Park, Kyungwook (School of Business Administration, College of Business and Economics, Korea University)
  • Received : 2015.03.22
  • Accepted : 2015.06.04
  • Published : 2015.08.25

Abstract

Recently, in the fields of stochastic optimal control (SOC) and reinforcement learning (RL), a great deal of research effort has gone into the problem of finding data-based sub-optimal control policies. The conventional theory for finding optimal controllers via value-function-based dynamic programming was established on a solid theoretical foundation for solving stochastic optimal control problems. However, it can be applied successfully only to extremely simple cases. Hence, the modern data-based approach, which seeks sub-optimal solutions using relevant data such as state-transition and reward signals instead of rigorous mathematical analysis, is particularly attractive for practical applications. In this paper, we consider a couple of methods that combine modern SOC strategies with approximate inference and machine-learning-based data treatment. We then apply the resulting methods to a variety of application domains, including financial engineering, and observe their performance.
