
Efficient Approximation of State Space for Reinforcement Learning Using Complex Network Models  

Yi, Seung-Joon (School of Electrical Engineering and Computer Science, Seoul National University)
Eom, Jae-Hong (School of Electrical Engineering and Computer Science, Seoul National University)
Zhang, Byoung-Tak (School of Electrical Engineering and Computer Science, Seoul National University)
Abstract
A number of temporal abstraction approaches have been suggested to handle the high computational complexity of Markov decision problems (MDPs). Although the structure of a temporal abstraction can significantly affect the efficiency of solving the MDP, to our knowledge none of the current temporal abstraction approaches explicitly considers the relationship between topology and efficiency. In this paper, we first show that a topological measure from the complex network literature, the mean geodesic distance, reflects the efficiency of solving an MDP. Based on this observation, we propose an incremental method that systematically constructs temporal abstractions using a network model that guarantees a small mean geodesic distance. We test our algorithm in a realistic 3D game environment, and experimental results show that our model exhibits subpolynomial growth of the mean geodesic distance with respect to problem size, which enables the resulting MDP to be solved efficiently.
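The core intuition, that adding temporally extended actions as long-range edges shrinks the mean geodesic distance of the state-transition graph, can be illustrated directly. The following is a minimal sketch, not the authors' implementation: it assumes the networkx library, uses an undirected grid world as a stand-in for the state space, and uses randomly placed shortcut edges as a stand-in for the systematically chosen options of the paper. It prints the mean geodesic distance before and after the shortcuts are added.

    # Illustrative only: a grid world stands in for the state space and
    # random shortcut edges stand in for temporally abstract actions
    # (options); the paper builds these systematically, not randomly.
    import random
    import networkx as nx

    # States as nodes, one-step (primitive-action) transitions as edges.
    G = nx.grid_2d_graph(20, 20)
    print('primitive actions only:', nx.average_shortest_path_length(G))

    # Each option is a long-range edge; as in a small-world network,
    # a few such shortcuts sharply reduce the mean geodesic distance.
    random.seed(0)
    states = list(G.nodes())
    for _ in range(40):
        s, t = random.sample(states, 2)
        G.add_edge(s, t)
    print('with option shortcuts:', nx.average_shortest_path_length(G))

The second printed value is markedly smaller than the first, mirroring the small-world shortcut effect (Watts and Strogatz, reference 10) that the proposed network model exploits.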
Keywords
Reinforcement learning; Temporal abstraction; Measurement of efficiency; Topological property; Complex network model; Mean geodesic distance
References
1 Barto, A. G., Mahadevan, S., 'Recent advances in hierarchical reinforcement learning,' Discrete Event Dynamic Systems, Vol.13, pp. 41-77, 2003
2 Fritzke, B., 'A growing neural gas network learns topologies,' In Advances in Neural Information Processing Systems 7, pp. 625-632, 1995
3 Littman, M. L., Dean, T. L., Kaelbling, L. P., 'On the complexity of solving Markov decision problems,' In Proc. of the 11th Conference on Uncertainty in Artificial Intelligence, pp. 394-402, 1995
4 Dietterich, T. G., 'Hierarchical reinforcement learning with the MAXQ value function decomposition,' Journal of Artificial Intelligence Research, Vol.13, pp. 227-303, 2000
5 McGovern, A., Barto, A. G., 'Subgoal discovery for hierarchical reinforcement learning using learned policies,' In Proc. of the International Conference on Machine Learning, pp. 361-368, 2001
6 Jong, N. K., Stone, P., 'State abstraction discovery from irrelevant state variables,' In Proc. of the 19th International Joint Conference on Artificial Intelligence, pp. 752-757, 2005
7 da F. Costa, L., Rodrigues, F. A., Travieso, G., Boas, P. R. V., 'Characterization of complex networks: A survey of measurements,' arXiv preprint, 2005
8 Watkins, C. J., Dayan, P., 'Q-learning,' Machine Learning, Vol.8, pp. 279-292, 1992
9 Erdos, P., Renyi, A., 'On random graphs,' Publicationes Mathematicae (Debrecen), Vol.6, pp. 290-297, 1959
10 Watts, D. J., Strogatz, S. H., 'Collective dynamics of 'small-world' networks,' Nature, Vol.393, pp. 440-442, 1998
11 Beleznay, F., Grobler, T., Szepesvari, C., 'Comparing value-function estimation algorithms in undiscounted problems,' 1999
12 Digney, B., 'Learning hierarchical control structure for multiple tasks and changing environments,' In Proc. of the 5th Conference on the Simulation of Adaptive Behavior, 1998
13 Kleinberg, J., 'The Small-World Phenomenon: An Algorithmic Perspective,' In Proc. of the 32nd ACM Symposium on Theory of Computing, pp. 163-170, 2000
14 Millan, J. del R., Posenato, D., Dedieu, E., 'Continuous-action Q-learning,' Machine Learning, Vol.49, pp. 241-265, 2002
15 Barabasi, A. L., Albert, R., 'Emergence of scaling in random networks,' Science, Vol.286, pp. 509-512, 1999
16 Pickett, M., Barto, A. G., 'PolicyBlocks: An algorithm for creating useful macro-actions in reinforcement learning,' In Proc. of the 19th International Conference on Machine Learning, pp. 506-513, 2002
17 Simsek, O., Wolfe, A. P., Barto, A. G., 'Identifying useful subgoals in reinforcement learning by local graph partitioning,' In Proc. of the 22nd International Conference on Machine Learning, pp. 816-823, 2005
18 Sutton, R. S., Precup, D., Singh, S. P., 'Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning,' Artificial Intelligence, Vol.112, pp. 181-211, 1999
19 Kaelbling, L. P., Littman, M. L., Moore, A. W., 'Reinforcement learning: A survey,' Journal of Artificial Intelligence Research, Vol.4, pp. 237-285, 1996
20 Adamic, L. A., Lukose, R. M., Puniyani, A. R., Huberman, B. A., 'Search in power-law networks,' Phys. Rev. E, Vol.64, 046135, 2001