Comparison of value-based Reinforcement Learning Algorithms in Cart-Pole Environment

  • Byeong-Chan Han (Dept. of Electronic Engineering, Jeju National University) ;
  • Ho-Chan Kim (Dept. of Electrical Engineering, Jeju National University) ;
  • Min-Jae Kang (Dept. of Electronic Engineering, Jeju National University)
  • Received : 2023.07.02
  • Accepted : 2023.07.11
  • Published : 2023.08.31

Abstract

Reinforcement learning can be applied to a wide variety of problems. However, its fundamental limitation is that real-world problems are often too complex for a solution to be found within a given time. With the development of neural network technology, deep reinforcement learning, which combines deep learning with reinforcement learning, has therefore attracted considerable attention. In this paper, two types of neural networks are combined with reinforcement learning, and their characteristics are compared and analyzed against existing value-based reinforcement learning algorithms. The two types of neural networks are an FNN and a CNN, and the existing value-based algorithms are SARSA and Q-learning.
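The distinction between the two value-based algorithms compared here is their update target: Q-learning is off-policy and bootstraps from the greedy action's value, while SARSA is on-policy and bootstraps from the action actually taken next. A minimal tabular sketch of both update rules (an illustration only, not the paper's implementation; the table layout and hyperparameters are assumptions):

```python
# Hedged sketch of the two value-based update rules compared in the paper.
# Q is assumed to be a mapping from state -> list of action values.

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy: target uses the greedy (max) action value in s_next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy: target uses the value of the action actually taken in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
```

In the deep variants discussed in the paper, the table `Q` is replaced by a neural network (an FNN on the state vector, or a CNN on rendered frames) trained toward the same targets.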

Keywords

Acknowledgements

This research was supported by the 2023 scientific promotion program funded by Jeju National University.
