Efficient Approximation of State Space for Reinforcement Learning Using Complex Network Models

Yi, Seung-Joon; Eom, Jae-Hong; Zhang, Byoung-Tak

Journal of KIISE:Software and Applications (한국정보과학회논문지:소프트웨어및응용)

Volume 36, Issue 6, pp. 479-490, 2009, pISSN 1229-6848

Korean Institute of Information Scientists and Engineers (한국정보과학회)

복잡계망 모델을 사용한 강화 학습 상태 공간의 효율적인 근사

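Yi, Seung-Joon (School of Electrical Engineering and Computer Science, Seoul National University);
Eom, Jae-Hong (School of Electrical Engineering and Computer Science, Seoul National University);
Zhang, Byoung-Tak (School of Electrical Engineering and Computer Science, Seoul National University)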

Published: 2009.06.15

Abstract

A number of temporal abstraction approaches have been proposed to handle the high computational complexity of Markov decision problems (MDPs). Although the structure of a temporal abstraction can significantly affect the efficiency of solving the MDP, to our knowledge none of the current temporal abstraction approaches explicitly considers the relationship between topology and efficiency. In this paper, we first show that a topological measurement from the complex network literature, mean geodesic distance, reflects the efficiency of solving an MDP. Based on this, we develop an incremental method that systematically builds temporal abstractions using a network model that guarantees a small mean geodesic distance. We test our algorithm on a realistic 3D game environment, and the experimental results show that the mean geodesic distance of our model grows subpolynomially with problem size, which enables efficient solving of the resulting MDP.

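Many real-world problems can be formulated and solved as Markov decision problems (MDPs), but the high computational complexity of the solution process makes it difficult to handle such problems directly. Many temporal abstraction methods have been proposed to address this, and various ways of automating them have also been studied; however, these methods lack an explicit efficiency measure and therefore cannot offer theoretical performance guarantees. In this work, we propose an automatic method for constructing temporal abstractions that guarantees good performance even as the problem size grows. To this end, we use network measurements to analyze the relationship between the efficiency of solving an MDP and the topological properties of its state trajectory graph, and we find that, among these measurements, mean geodesic distance is closely related to MDP solving performance. Based on this finding, we propose an algorithm that builds temporal abstractions using a complex network model that guarantees a low mean geodesic distance. The proposed algorithm was tested on several problems, including a realistic 3D game environment, and showed efficient solving performance despite increases in problem size.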
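
The central quantity in the abstract, mean geodesic distance, is the average shortest-path length between pairs of nodes in a graph. The following is a minimal sketch, not the authors' algorithm: it assumes the state trajectory graph is undirected and unweighted, and it averages over ordered pairs of distinct, mutually reachable nodes. The helper names (`mean_geodesic_distance`, `ring`, `add_edge`) and the 12-state ring example are hypothetical and chosen only for illustration.

```python
from collections import deque


def mean_geodesic_distance(adj):
    """Average shortest-path length over ordered pairs of distinct,
    reachable nodes. `adj` maps each node to a set of neighbours
    (assumed undirected and unweighted)."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives shortest hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


def ring(n):
    """Hypothetical toy state graph: n states arranged in a cycle."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def add_edge(adj, u, v):
    """Add an undirected edge, standing in for one temporal abstraction
    that lets the agent move between two distant states in one step."""
    adj[u].add(v)
    adj[v].add(u)


lattice = ring(12)
with_shortcut = ring(12)
add_edge(with_shortcut, 0, 6)

print("ring only:      ", mean_geodesic_distance(lattice))
print("ring + shortcut:", mean_geodesic_distance(with_shortcut))
```

In this toy setting, a single long-range edge lowers the mean geodesic distance, which is the property the abstract associates with more efficient solving of the MDP.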
