On the Reward Function of Latent SAC Reinforcement Learning to Improve Longitudinal Driving Performance

  • Jo, Sung-Bean (Dept. of Electrical Engineering, Pusan National University)
  • Jeong, Han-You (Dept. of Electrical Engineering, Pusan National University)
  • Received : 2021.12.10
  • Accepted : 2021.12.29
  • Published : 2021.12.31

Abstract

In recent years, there has been strong interest in end-to-end autonomous driving based on deep reinforcement learning. In this paper, we present a reward function for latent SAC deep reinforcement learning to improve the longitudinal driving performance of an agent vehicle. While the existing reward function significantly degrades driving safety and efficiency, the proposed reward function is shown to maintain an appropriate headway distance while avoiding collisions with the front vehicle.
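The abstract does not spell out the reward terms, but a longitudinal reward of the kind it describes is typically built from a collision penalty, an efficiency (speed) term, and a headway-tracking term. The Python sketch below is a hypothetical illustration under those assumptions; the function `longitudinal_reward`, its parameters, and all constants are invented here for illustration and are not taken from the paper.

```python
import math

# Illustrative constants -- assumptions, not the paper's actual design.
DESIRED_HEADWAY_S = 2.0     # assumed target time headway to the front vehicle (s)
SPEED_LIMIT_MPS = 15.0      # assumed speed at which the efficiency reward saturates (m/s)
COLLISION_PENALTY = -100.0  # assumed penalty for hitting the front vehicle

def longitudinal_reward(gap_m: float, ego_speed_mps: float, collided: bool) -> float:
    """Per-step reward trading off efficiency (speed) against safety (headway)."""
    if collided:
        # Safety: a collision dominates every other term.
        return COLLISION_PENALTY

    # Efficiency: reward higher speed, saturating at the assumed speed limit.
    r_speed = min(ego_speed_mps, SPEED_LIMIT_MPS) / SPEED_LIMIT_MPS

    # Safety: penalize deviation of the time headway from the target,
    # guarding against division by zero at standstill.
    headway_s = gap_m / max(ego_speed_mps, 0.1)
    r_headway = math.exp(-abs(headway_s - DESIRED_HEADWAY_S))

    return 0.5 * r_speed + 0.5 * r_headway
```

The equal weighting of the speed and headway terms is likewise an assumption; in practice such weights would be tuned so that the agent neither tailgates for speed reward nor stalls to keep a large gap.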

Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2019R1I1A3A01060890).
