Implementation of Intelligent Agents Using Partially Observable Markov Decision Processes

  • Published: 2011.02.28

Acknowledgement

Supported by : 한국연구재단 (National Research Foundation of Korea)

References

  1. M. L. Puterman, "Markov Decision Processes", John Wiley & Sons, 1994.
  2. L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, "Planning and acting in partially observable stochastic domains", Artificial Intelligence, 101:99-134, 1998. https://doi.org/10.1016/S0004-3702(98)00023-X
  3. A. R. Cassandra, L. P. Kaelbling, and J. A. Kurien, "Acting under uncertainty: Discrete Bayesian models for mobile robot navigation", In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 1996.
  4. C. Papadimitriou, and J. N. Tsitsiklis, "The complexity of Markov decision processes", Mathematics of Operations Research, 12(3):441-450, 1987. https://doi.org/10.1287/moor.12.3.441
  5. E. J. Sondik, "The optimal control of partially observable Markov processes", PhD thesis, Stanford University, 1971.
  6. G. E. Monahan, "A survey of partially observable Markov decision processes: Theory, models and algorithms", Management Science, 28(1):1-16, 1982. https://doi.org/10.1287/mnsc.28.1.1
  7. W. Zhang, "Algorithms for partially observable Markov decision processes", PhD thesis, University of British Columbia, 1988.
  8. A. R. Cassandra, L. P. Kaelbling, and M. L. Littman, "Acting optimally in partially observable stochastic domains", In Proceedings of the 12th National Conference on Artificial Intelligence, 1994.
  9. N. L. Zhang, and W. Liu, "Planning in stochastic domains: Problem characteristics and approximation", Technical Report HKUST-CS96-31, Hong Kong University of Science and Technology, 1996.
  10. J. Pineau, G. Gordon, and S. Thrun, "Point-based value iteration: an anytime algorithm for POMDPs", In Proceedings of IJCAI, 2003.
  11. J. Pineau, G. Gordon, and S. Thrun, "Anytime point-based approximations for large POMDPs", Journal of Artificial Intelligence Research, 27:335-380, 2006.
  12. M. T. J. Spaan and N. Vlassis, "Perseus: Randomized point-based value iteration for POMDPs", Journal of Artificial Intelligence Research, 24:195-220, 2005.
  13. T. Smith, and R. Simmons, "Heuristic search value iteration for POMDPs", In Proceedings of UAI, 2004.
  14. T. Smith, and R. Simmons, "Point-based POMDP algorithms: improved analysis and implementation", In Proceedings of UAI, 2005.
  15. J. D. Williams and S. Young, "Partially observable Markov decision processes for spoken dialog systems", Computer Speech and Language 21(2):393-422, 2007. https://doi.org/10.1016/j.csl.2006.06.008
  16. T. Lane, "A Decision Theoretic, Semi-Supervised Model for Intrusion Detection", In M. Maloof, ed., Machine learning and data mining for computer security: Methods and applications, Springer-Verlag, 2006.
  17. Q. Zhao, L. Tong, A. Swami, and Y. Chen, "Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework", IEEE Journal on Selected Areas in Communications, 25(3):589-600, 2007. https://doi.org/10.1109/JSAC.2007.070409
  18. J. Park, K.-E. Kim, and S. Jo, "A POMDP approach to P300-based brain-computer interfaces", In Proceedings of ACM International Conference on Intelligent User Interfaces (IUI), 2010.
  19. N. Fraser, "Assessment of Interactive Systems", In Handbook of Standards and Resources for Spoken Language Systems, pages 564-614. Mouton de Gruyter, 1997.
  20. C. Boutilier and D. Poole, "Computing optimal policies for partially observable decision processes using compact representations", In Proceedings of AAAI, 1996.
  21. J. D. Williams and S. Young, "Scaling POMDPs for spoken dialog management", IEEE Transactions on Audio, Speech, and Language Processing, 15(7):2116-2129, 2007. https://doi.org/10.1109/TASL.2007.902050
  22. H. S. Sim, K.-E. Kim, J. H. Kim, D.-S. Chang, and M.-W. Koo, "Symbolic heuristic search value iteration for factored POMDPs", In Proceedings of AAAI, 2008.
  23. D. Kim, H. S. Sim, K.-E. Kim, J. H. Kim, H. Kim, and J. W. Sung, "Effects of user modeling on POMDP-based dialogue systems", In Proceedings of Interspeech, 2008.
  24. D. Kim, J. H. Kim, and K.-E. Kim, "Robust evaluation of POMDP-based dialogue systems", IEEE Transactions on Audio, Speech, and Language Processing, to be published.
  25. J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, "Brain-computer interfaces for communication and control", Clinical Neurophysiology, 113, 2002.
  26. R. Fazel-Rezai, "Human error in P300 speller paradigm for brain-computer interface", In Proceedings of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), pp. 2516-2519, 2007.
  27. D. J. Krusienski, E. W. Sellers, D. J. McFarland, T. M. Vaughan, and J. R. Wolpaw, "Toward enhanced P300 speller performance", Journal of Neuroscience Methods, 167:15-21, 2008. https://doi.org/10.1016/j.jneumeth.2007.07.017
  28. J. Pineau, G. Gordon, and S. Thrun, "Policy-contingent abstraction for robust robot control", In Proceedings of UAI, 2003.
  29. A. P. Wolfe, "POMDP homomorphisms", In Proceedings of NIPS, 2006.
  30. K.-E. Kim, "Exploiting symmetries in POMDPs for point-based algorithms", In Proceedings of AAAI, 2008.
  31. S. Sanner and C. Boutilier, "Practical solution techniques for first-order MDPs", Artificial Intelligence, 173:748-788, 2009. https://doi.org/10.1016/j.artint.2008.11.003
  32. S. Sanner and K. Kersting, "Symbolic dynamic programming for first-order POMDPs", In Proceedings of AAAI, 2010.
  33. Y. Virin, G. Shani, S. E. Shimony, and R. I. Brafman, "Scaling up: Solving POMDPs through value based clustering", In Proceedings of AAAI, 2007.
  34. G. Shani, R. I. Brafman, and S. E. Shimony, "Forward search value iteration for POMDPs", In Proceedings of IJCAI, 2007.
  35. T. Jaakkola, S. P. Singh, and M. I. Jordan, "Reinforcement learning algorithm for partially observable Markov decision problems", In Proceedings of NIPS, 1995.
  36. J. Baxter and P. L. Bartlett, "Reinforcement learning in POMDPs via direct gradient ascent", In Proceedings of ICML, 2000.
  37. R. Jaulmes, J. Pineau, and D. Precup, "Active learning in partially observable Markov decision processes", In Proceedings of ECML, 2005.
  38. G. Shani, R. I. Brafman, and S. E. Shimony, "Model-based online learning of POMDPs", In Proceedings of ECML, 2005.
  39. D. Wierstra, and M. Wiering, "Utile distinction hidden Markov models", In Proceedings of ICML, 2004.
  40. S. Ross, J. Pineau, and B. Chaib-draa, "Bayes-adaptive POMDPs", In Proceedings of NIPS, 2007.
  41. C. Cai, X. Liao, and L. Carin, "Learning to explore and exploit in POMDPs", In Proceedings of NIPS, 2009.
  42. S. Russell, "Learning agents for uncertain environments", In Proceedings of COLT, 1998.
  43. A. Y. Ng and S. Russell, "Algorithms for inverse reinforcement learning", In Proceedings of ICML, 2000.
  44. D. Ramachandran and E. Amir, "Bayesian inverse reinforcement learning", In Proceedings of IJCAI, 2007.
  45. J. Choi and K.-E. Kim, "Inverse reinforcement learning in partially observable environments", In Proceedings of IJCAI, 2009.
  46. E. A. Hansen, "Finite-memory control of partially observable systems", PhD thesis, University of Massachusetts at Amherst, 1998.