
Efficient Approximation of State Space for Reinforcement Learning Using Complex Network Models  

Yi, Seung-Joon (School of Electrical Engineering and Computer Science, Seoul National University)
Eom, Jae-Hong (School of Electrical Engineering and Computer Science, Seoul National University)
Zhang, Byoung-Tak (School of Electrical Engineering and Computer Science, Seoul National University)
Abstract
A number of temporal abstraction approaches have been suggested to handle the high computational complexity of Markov decision problems (MDPs). Although the structure of a temporal abstraction can significantly affect the efficiency of solving the MDP, to our knowledge none of the current temporal abstraction approaches explicitly considers the relationship between topology and efficiency. In this paper, we first show that a topological measure from the complex network literature, the mean geodesic distance, reflects the efficiency of solving an MDP. Based on this observation, we propose an incremental method that systematically constructs temporal abstractions using a network model that guarantees a small mean geodesic distance. We test our algorithm in a realistic 3D game environment, and experimental results show that our model exhibits subpolynomial growth of the mean geodesic distance with respect to problem size, which enables the resulting MDP to be solved efficiently.
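The core intuition, that adding temporally extended actions as long-range edges shrinks the mean geodesic distance of the state-transition graph, can be illustrated directly. The following is a minimal sketch, not the authors' implementation: it assumes the networkx library, uses an undirected grid world as a stand-in for the state space, and uses randomly placed shortcut edges as a stand-in for the systematically chosen options of the paper. It prints the mean geodesic distance before and after the shortcuts are added.

    # Illustrative only: a grid world stands in for the state space and
    # random shortcut edges stand in for temporally abstract actions
    # (options); the paper builds these systematically, not randomly.
    import random
    import networkx as nx

    # States as nodes, one-step (primitive-action) transitions as edges.
    G = nx.grid_2d_graph(20, 20)
    print('primitive actions only:', nx.average_shortest_path_length(G))

    # Each option is a long-range edge; as in a small-world network,
    # a few such shortcuts sharply reduce the mean geodesic distance.
    random.seed(0)
    states = list(G.nodes())
    for _ in range(40):
        s, t = random.sample(states, 2)
        G.add_edge(s, t)
    print('with option shortcuts:', nx.average_shortest_path_length(G))

The second printed value is markedly smaller than the first, mirroring the small-world shortcut effect (Watts and Strogatz, reference 10) that the proposed network model exploits.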
Keywords
Reinforcement learning; Temporal abstraction; Measurement of efficiency; Topological property; Complex network model; Mean geodesic distance
References
1 Barto, A. G., Mahadevan, S., 'Recent advances in hierarchical reinforcement learning,' Discrete Event Dynamic Systems, Vol.13, pp. 41-77, 2003
2 Fritzke, B., 'A growing neural gas network learns topologies,' In Advances in Neural Information Processing Systems 7, pp. 625-632, 1995
3 Littman, M. L., Dean, T. L., Kaelbling, L. P., 'On the complexity of solving Markov decision problems,' In Proc. of the 11th Conference on Uncertainty in Artificial Intelligence, pp. 394-402, 1995
4 Dietterich, T. G., 'Hierarchical reinforcement learning with the MAXQ value function decomposition,' Journal of Artificial Intelligence Research, Vol.13, pp. 227-303, 2000
5 McGovern, A., Barto, A. G., 'Subgoal discovery for hierarchical reinforcement learning using learned policies,' In Proc. of the International Conference on Machine Learning, pp. 361-368, 2001
6 Jong, N. K., Stone, P., 'State abstraction discovery from irrelevant state variables,' In Proc. of the 19th International Joint Conference on Artificial Intelligence, pp. 752-757, 2005
7 da F. Costa, L., Rodrigues, F. A., Travieso, G., Boas, P. R. V., 'Characterization of complex networks: A survey of measurements,' arXiv preprint, 2005
8 Watkins, C. J., Dayan, P., 'Q-learning,' Machine Learning, Vol.8, pp. 279-292, 1992
9 Erdos, P., Renyi, A., 'On random graphs,' Publicationes Mathematicae (Debrecen), Vol.6, pp. 290-297, 1959
10 Watts, D. J., Strogatz, S. H., 'Collective dynamics of 'small-world' networks,' Nature, Vol.393, pp. 440-442, 1998
11 Beleznay, F., Grobler, T., Szepesvari, C., 'Comparing value-function estimation algorithms in undiscounted problems,' 1999
12 Digney, B., 'Learning hierarchical control structure for multiple tasks and changing environments,' In Proc. of the 5th Conference on the Simulation of Adaptive Behavior, 1998
13 Kleinberg, J., 'The Small-World Phenomenon: An Algorithmic Perspective,' In Proc. of the 32nd ACM Symposium on Theory of Computing, pp. 163-170, 2000
14 Millan, J. del R., Posenato, D., Dedieu, E., 'Continuous-action Q-learning,' Machine Learning, Vol.49, pp. 241-265, 2002
15 Barabasi, A. L., Albert, R., 'Emergence of scaling in random networks,' Science, Vol.286, pp. 509-512, 1999
16 Pickett, M., Barto, A. G., 'PolicyBlocks: An algorithm for creating useful macro-actions in reinforcement learning,' In Proc. of the 19th International Conference on Machine Learning, pp. 506-513, 2002
17 Simsek, O., Wolfe, A. P., Barto, A. G., 'Identifying useful subgoals in reinforcement learning by local graph partitioning,' In Proc. of the 22nd International Conference on Machine Learning, pp. 816-823, 2005
18 Sutton, R. S., Precup, D., Singh, S. P., 'Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning,' Artificial Intelligence, Vol.112, pp. 181-211, 1999
19 Kaelbling, L. P., Littman, M. L., Moore, A. W., 'Reinforcement learning: A survey,' Journal of Artificial Intelligence Research, Vol.4, pp. 237-285, 1996
20 Adamic, L. A., Lukose, R. M., Puniyani, A. R., Huberman, B. A., 'Search in power-law networks,' Phys. Rev. E, Vol.64, 046135, 2001