Funding
This work was supported by the National Research Foundation of Korea.
References
- M. L. Puterman, "Markov Decision Processes", John Wiley & Sons, 1994.
- L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, "Planning and acting in partially observable stochastic domains", Artificial Intelligence, 101:99-134, 1998. https://doi.org/10.1016/S0004-3702(98)00023-X
- A. R. Cassandra, L. P. Kaelbling, and J. A. Kurien, "Acting under uncertainty: Discrete Bayesian models for mobile robot navigation", In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 1996.
- C. Papadimitriou, and J. N. Tsitsiklis, "The complexity of Markov decision processes", Mathematics of Operations Research, 12(3):441-450, 1987. https://doi.org/10.1287/moor.12.3.441
- E. J. Sondik, "The optimal control of partially observable Markov processes", PhD thesis, Stanford University, 1971.
- G. E. Monahan, "A survey of partially observable Markov decision processes: Theory, models and algorithms", Management Science, 28(1):1-16, 1982. https://doi.org/10.1287/mnsc.28.1.1
- W. Zhang, "Algorithms for partially observable Markov decision processes", PhD thesis, University of British Columbia, 1988.
- A. R. Cassandra, L. P. Kaelbling, and M. L. Littman, "Acting optimally in partially observable stochastic domains", In Proceedings of the 12th National Conference on Artificial Intelligence, 1994.
- N. L. Zhang, and W. Liu, "Planning in stochastic domains: Problem characteristics and approximation", Technical Report HKUST-CS96-31, Hong Kong University of Science and Technology, 1996.
- J. Pineau, G. Gordon, and S. Thrun, "Point-based value iteration: an anytime algorithm for POMDPs", In Proceedings of IJCAI, 2003.
- J. Pineau, G. Gordon, and S. Thrun, "Anytime point-based approximations for large POMDPs", Journal of Artificial Intelligence Research, 27:335-380, 2006.
- M. T. J. Spaan and N. Vlassis, "Perseus: Randomized point-based value iteration for POMDPs", Journal of Artificial Intelligence Research, 24:195-220, 2005.
- T. Smith, and R. Simmons, "Heuristic search value iteration for POMDPs", In Proceedings of UAI, 2004.
- T. Smith, and R. Simmons, "Point-based POMDP algorithms: improved analysis and implementation", In Proceedings of UAI, 2005.
- J. D. Williams and S. Young, "Partially observable Markov decision processes for spoken dialog systems", Computer Speech and Language, 21(2):393-422, 2007. https://doi.org/10.1016/j.csl.2006.06.008
- T. Lane, "A Decision Theoretic, Semi-Supervised Model for Intrusion Detection", In M. Maloof, ed., Machine learning and data mining for computer security: Methods and applications, Springer-Verlag, 2006.
- Q. Zhao, L. Tong, A. Swami, and Y. Chen, "Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework", IEEE Journal on Selected Areas in Communications, 25(3):589-600, 2007. https://doi.org/10.1109/JSAC.2007.070409
- J. Park, K.-E. Kim, and S. Jo, "A POMDP approach to P300-based brain-computer interfaces", In Proceedings of the ACM International Conference on Intelligent User Interfaces (IUI), 2010.
- N. Fraser, "Assessment of Interactive Systems", In Handbook of Standards and Resources for Spoken Language Systems, pages 564-614. Mouton de Gruyter, 1997.
- C. Boutilier and D. Poole, "Computing optimal policies for partially observable decision processes using compact representations", In Proceedings of AAAI, 1996.
- J. D. Williams and S. Young, "Scaling POMDPs for spoken dialog management", IEEE Transactions on Audio, Speech, and Language Processing, 15(7):2116-2129, 2007. https://doi.org/10.1109/TASL.2007.902050
- H. S. Sim, K.-E. Kim, J. H. Kim, D.-S. Chang, and M.-W. Koo, "Symbolic heuristic search value iteration for factored POMDPs", In Proceedings of AAAI, 2008.
- D. Kim, H. S. Sim, K.-E. Kim, J. H. Kim, H. Kim, and J. W. Sung, "Effects of user modeling on POMDP-based dialogue systems", In Proceedings of Interspeech, 2008.
- D. Kim, J. H. Kim, and K.-E. Kim, "Robust evaluation of POMDP-based dialogue systems", IEEE Transactions on Audio, Speech, and Language Processing, to be published.
- J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, "Brain-computer interfaces for communication and control", Clinical Neurophysiology, 113, 2002.
- R. Fazel-Rezai, "Human error in P300 speller paradigm for brain-computer interface", In Proceedings of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), pp. 2516-2519, 2007.
- D. J. Krusienski, E. W. Sellers, D. J. McFarland, T. M. Vaughan, and J. R. Wolpaw, "Toward enhanced P300 speller performance", Journal of Neuroscience Methods, 167:15-21, 2008. https://doi.org/10.1016/j.jneumeth.2007.07.017
- J. Pineau, G. Gordon, and S. Thrun, "Policy-contingent abstraction for robust robot control", In Proceedings of UAI, 2003.
- A. P. Wolfe, "POMDP homomorphisms", In Proceedings of NIPS, 2006.
- K.-E. Kim, "Exploiting symmetries in POMDPs for point-based algorithms", In Proceedings of AAAI, 2008.
- S. Sanner and C. Boutilier, "Practical solution techniques for first-order MDPs", Artificial Intelligence, 173:748-788, 2009. https://doi.org/10.1016/j.artint.2008.11.003
- S. Sanner and K. Kersting, "Symbolic dynamic programming for first-order POMDPs", In Proceedings of AAAI, 2010.
- Y. Virin, G. Shani, S. E. Shimony, and R. I. Brafman, "Scaling up: Solving POMDPs through value based clustering", In Proceedings of AAAI, 2007.
- G. Shani, R. I. Brafman, and S. E. Shimony, "Forward search value iteration for POMDPs", In Proceedings of IJCAI, 2007.
- T. Jaakkola, S. P. Singh, and M. I. Jordan, "Reinforcement learning algorithm for partially observable Markov decision problems", In Proceedings of NIPS, 1995.
- J. Baxter and P. L. Bartlett, "Reinforcement learning in POMDPs via direct gradient ascent", In Proceedings of ICML, 2000.
- R. Jaulmes, J. Pineau, and D. Precup, "Active learning in partially observable Markov decision processes", In Proceedings of ECML, 2005.
- G. Shani, R. I. Brafman, and S. E. Shimony, "Model-based online learning of POMDPs", In Proceedings of ECML, 2005.
- D. Wierstra, and M. Wiering, "Utile distinction hidden Markov models", In Proceedings of ICML, 2004.
- S. Ross, J. Pineau, and B. Chaib-draa, "Bayes-adaptive POMDPs", In Proceedings of NIPS, 2007.
- C. Cai, X. Liao, and L. Carin, "Learning to explore and exploit in POMDPs", In Proceedings of NIPS, 2009.
- S. Russell, "Learning agents for uncertain environments", In Proceedings of COLT, 1998.
- A. Y. Ng and S. Russell, "Algorithms for inverse reinforcement learning", In Proceedings of ICML, 2000.
- D. Ramachandran and E. Amir, "Bayesian inverse reinforcement learning", In Proceedings of IJCAI, 2007.
- J. Choi and K.-E. Kim, "Inverse reinforcement learning in partially observable environments", In Proceedings of IJCAI, 2009.
- E. A. Hansen, "Finite-memory control of partially observable systems", PhD thesis, University of Massachusetts at Amherst, 1998.