References
- R. S. Sutton, "Learning to Predict by the Methods of Temporal Differences," Machine Learning, vol. 3, no. 1, pp. 9-44, 1988.
- T. Jaakkola, M. I. Jordan, and S. P. Singh, "On the Convergence of Stochastic Iterative Dynamic Programming Algorithms," Neural Computation, vol. 6, no. 6, pp. 1185-1201, 1994.
- C. J. C. H. Watkins and P. Dayan, "Technical Note: Q-Learning," Machine Learning, vol. 8, pp. 279-292, 1992.
- Y. Kashimura, A. Ueno, and S. Tatsumi, "A Continuous Action Space Representation by Particle Filter for Reinforcement Learning," Proc. of the 22nd Annual Conference of the Japanese Society for Artificial Intelligence (JSAI2008), pp. 118-121, 2008.
- A. Notsu, H. Honda, H. Ichihashi, and H. Wada, "Contraction Algorithm in State and Action Space for Q-learning," Proc. of SCIS & ISIS 2009, pp. 93-96, 2009.
- A. Notsu, H. Wada, H. Honda, and H. Ichihashi, "Cell Division Approach for Search Space in Reinforcement Learning," International Journal of Computer Science and Network Security, vol. 8, no. 6, 2008.
- A. Ito and M. Kanabuchi, "Speeding up Multi-Agent Reinforcement Learning by Coarse-Graining of Perception: Hunter Game as an Example," IEICE Trans. D-I, vol. J84-D-I, no. 3, pp. 285-293, 2001.
- M. Nagayoshi, H. Murao, and H. Tamaki, "Switching Reinforcement Learning to Mimic an Infant's Motor Development: Application to Two-dimensional Continuous Action Space," Proc. of SICE Annual Conference 2010 (SICE 2010), pp. 243-246, 2010.