Hwang, Gyu-Young;Kim, Ju-Bong;Heo, Joo-Seong;Han, Youn-Hee
- KIPS Transactions on Computer and Communication Systems / v.10 no.5 / pp.155-162 / 2021
n-step TD learning combines the Monte Carlo method and one-step TD learning. With an appropriately chosen n, n-step TD learning is known to outperform both the Monte Carlo method and one-step TD learning, but the best value of n is difficult to select. To address this difficulty, this paper proposes a new learning target composed of the maximum and the mean of all k-step returns for 1 ≤ k ≤ n. The proposal relies on two properties: overestimation of Q can improve performance in the early stage of learning, and all n-step returns take similar values when Q ≈ Q^*. Finally, in OpenAI Gym's Atari game environments, we compare the proposed algorithm with n-step TD learning and show experimentally that it outperforms n-step TD learning.
https://doi.org/10.3745/KTCCS.2021.10.5.155
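
To make the idea of a target built from the maximum and the mean of all k-step returns more concrete, here is a minimal Python sketch. The abstract does not say how the two quantities are combined, so the convex mix controlled by `beta` is purely an assumption for illustration, and the function names `k_step_returns` and `max_mean_target` are hypothetical, not taken from the paper. The k-step return is assumed to use the usual Q-learning bootstrap, i.e. the discounted rewards plus a discounted estimate of max_a Q at the state reached after k steps.

```python
from typing import List, Sequence


def k_step_returns(
    rewards: Sequence[float],
    bootstrap_values: Sequence[float],
    gamma: float,
) -> List[float]:
    """Compute the k-step returns G^(1), ..., G^(n).

    G^(k) = sum_{i<k} gamma^i * rewards[i] + gamma^k * bootstrap_values[k-1]

    bootstrap_values[k-1] stands for max_a Q(s_{t+k}, a); pass 0.0 for a
    terminal state. Both sequences must have the same length n.
    """
    if len(rewards) != len(bootstrap_values):
        raise ValueError("rewards and bootstrap_values must have equal length")
    returns = []
    discounted_sum = 0.0  # running sum of gamma^i * r_{t+i}
    discount = 1.0        # gamma^k for the current k
    for reward, value in zip(rewards, bootstrap_values):
        discounted_sum += discount * reward
        discount *= gamma
        returns.append(discounted_sum + discount * value)
    return returns


def max_mean_target(returns: Sequence[float], beta: float = 0.5) -> float:
    """Mix the maximum and the mean of the k-step returns.

    ASSUMPTION: the convex combination and the weight `beta` are illustrative
    only; the abstract does not give the exact formula.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    mean_return = sum(returns) / len(returns)
    return beta * max(returns) + (1.0 - beta) * mean_return


# Usage: a 3-step trajectory with hypothetical rewards and Q estimates.
example_returns = k_step_returns(
    rewards=[1.0, 0.0, 2.0],
    bootstrap_values=[0.5, 1.0, 0.0],
    gamma=0.5,
)
example_target = max_mean_target(example_returns, beta=0.5)
```

Under this sketch, when all k-step returns agree (the Q ≈ Q^* case mentioned in the abstract) the maximum and the mean coincide, so the target reduces to that common value regardless of `beta`. Early in training, when the returns differ, the maximum pulls the target upward, which corresponds to the overestimation effect the abstract describes.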