Implementation of Intelligent Agents Using Partially Observable Markov Decision Processes

  • Published: 2011.02.28

Acknowledgement

Supported by : 한국연구재단 (National Research Foundation of Korea)

References

  1. M. L. Puterman, "Markov Decision Processes", John Wiley & Sons, 1994.
  2. L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, "Planning and acting in partially observable stochastic domains", Artificial Intelligence, 101:99-134, 1998. https://doi.org/10.1016/S0004-3702(98)00023-X
  3. A. R. Cassandra, L. P. Kaelbling, and J. A. Kurien, "Acting under uncertainty: Discrete Bayesian models for mobile robot navigation", In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 1996.
  4. C. Papadimitriou, and J. N. Tsitsiklis, "The complexity of Markov decision processes", Mathematics of Operations Research, 12(3):441-450, 1987. https://doi.org/10.1287/moor.12.3.441
  5. E. J. Sondik, "The optimal control of partially observable Markov processes", PhD thesis, Stanford University, 1971.
  6. G. E. Monahan, "A survey of partially observable Markov decision processes: Theory, models and algorithms", Management Science, 28(1):1-16, 1982. https://doi.org/10.1287/mnsc.28.1.1
  7. W. Zhang, "Algorithms for partially observable Markov decision processes", PhD thesis, University of British Columbia, 1988.
  8. A. R. Cassandra, L. P. Kaelbling, and M. L. Littman, "Acting optimally in partially observable stochastic domains", In Proceedings of the 12th National Conference on Artificial Intelligence, 1994.
  9. N. L. Zhang, and W. Liu, "Planning in stochastic domains: Problem characteristics and approximation", Technical Report HKUST-CS96-31, Hong Kong University of Science and Technology, 1996.
  10. J. Pineau, G. Gordon, and S. Thrun, "Point-based value iteration: an anytime algorithm for POMDPs", In Proceedings of IJCAI, 2003.
  11. J. Pineau, G. Gordon, and S. Thrun, "Anytime point-based approximations for large POMDPs", Journal of Artificial Intelligence Research, 27:335-380, 2006.
  12. M. T. J. Spaan and N. Vlassis, "Perseus: Randomized point-based value iteration for POMDPs", Journal of Artificial Intelligence Research, 24:195-220, 2005.
  13. T. Smith, and R. Simmons, "Heuristic search value iteration for POMDPs", In Proceedings of UAI, 2004.
  14. T. Smith, and R. Simmons, "Point-based POMDP algorithms: improved analysis and implementation", In Proceedings of UAI, 2005.
  15. J. D. Williams and S. Young, "Partially observable Markov decision processes for spoken dialog systems", Computer Speech and Language 21(2):393-422, 2007. https://doi.org/10.1016/j.csl.2006.06.008
  16. T. Lane, "A Decision Theoretic, Semi-Supervised Model for Intrusion Detection", In M. Maloof, ed., Machine learning and data mining for computer security: Methods and applications, Springer-Verlag, 2006.
  17. Q. Zhao, L. Tong, A. Swami, and Y. Chen, "Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework", IEEE Journal on Selected Areas in Communications, 25(3):589-600, 2007. https://doi.org/10.1109/JSAC.2007.070409
  18. J. Park, K.-E. Kim, and S. Jo, "A POMDP approach to P300-based brain-computer interfaces", In Proceedings of ACM International Conference on Intelligent User Interfaces (IUI), 2010.
  19. N. Fraser, "Assessment of Interactive Systems", In Handbook of Standards and Resources for Spoken Language Systems, pages 564-614. Mouton de Gruyter, 1997.
  20. C. Boutilier and D. Poole, "Computing optimal policies for partially observable decision processes using compact representations", In Proceedings of AAAI, 1996.
  21. J. D. Williams and S. Young, "Scaling POMDPs for spoken dialog management", IEEE Transactions on Audio, Speech, and Language Processing, 15(7):2116-2129, 2007. https://doi.org/10.1109/TASL.2007.902050
  22. H. S. Sim, K.-E. Kim, J. H. Kim, D.-S. Chang, and M.-W. Koo, "Symbolic heuristic search value iteration for factored POMDPs", In Proceedings of AAAI, 2008.
  23. D. Kim, H. S. Sim, K.-E. Kim, J. H. Kim, H. Kim, and J. W. Sung, "Effects of user modeling on POMDP-based dialogue systems", In Proceedings of Interspeech, 2008.
  24. D. Kim, J. H. Kim, and K.-E. Kim, "Robust evaluation of POMDP-based dialogue systems", IEEE Transactions on Audio, Speech, and Language Processing, to be published.
  25. J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, "Brain-computer interfaces for communication and control", Clinical Neurophysiology, 113, 2002.
  26. R. Fazel-Rezai, "Human error in P300 speller paradigm for brain-computer interface", In Proceedings of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), pp. 2516-2519, 2007.
  27. D. J. Krusienski, E. W. Sellers, D. J. McFarland, T. M. Vaughan, and J. R. Wolpaw, "Toward enhanced P300 speller performance", Journal of Neuroscience Methods, 167:15-21, 2008. https://doi.org/10.1016/j.jneumeth.2007.07.017
  28. J. Pineau, G. Gordon, and S. Thrun, "Policy-contingent abstraction for robust robot control", In Proceedings of UAI, 2003.
  29. A. P. Wolfe, "POMDP homomorphisms", In Proceedings of NIPS, 2006.
  30. K.-E. Kim, "Exploiting symmetries in POMDPs for point-based algorithms", In Proceedings of AAAI, 2008.
  31. S. Sanner and C. Boutilier, "Practical solution techniques for first-order MDPs", Artificial Intelligence, 173:748-788, 2009. https://doi.org/10.1016/j.artint.2008.11.003
  32. S. Sanner and K. Kersting, "Symbolic dynamic programming for first-order POMDPs", In Proceedings of AAAI, 2010.
  33. Y. Virin, G. Shani, S. E. Shimony, and R. I. Brafman, "Scaling up: Solving POMDPs through value based clustering", In Proceedings of AAAI, 2007.
  34. G. Shani, R. I. Brafman, and S. E. Shimony, "Forward search value iteration for POMDPs", In Proceedings of IJCAI, 2007.
  35. T. Jaakkola, S. P. Singh, and M. I. Jordan, "Reinforcement learning algorithm for partially observable Markov decision problems", In Proceedings of NIPS, 1995.
  36. J. Baxter and P. L. Bartlett, "Reinforcement learning in POMDPs via direct gradient ascent", In Proceedings of ICML, 2000.
  37. R. Jaulmes, J. Pineau, and D. Precup, "Active learning in partially observable Markov decision processes", In Proceedings of ECML, 2005.
  38. G. Shani, R. I. Brafman, and S. E. Shimony, "Model-based online learning of POMDPs", In Proceedings of ECML, 2005.
  39. D. Wierstra, and M. Wiering, "Utile distinction hidden Markov models", In Proceedings of ICML, 2004.
  40. S. Ross, J. Pineau, and B. Chaib-draa, "Bayes-adaptive POMDPs", In Proceedings of NIPS, 2007.
  41. C. Cai, X. Liao, and L. Carin, "Learning to explore and exploit in POMDPs", In Proceedings of NIPS, 2009.
  42. S. Russell, "Learning agents for uncertain environments", In Proceedings of COLT, 1998.
  43. A. Y. Ng and S. Russell, "Algorithms for inverse reinforcement learning", In Proceedings of ICML, 2000.
  44. D. Ramachandran and E. Amir, "Bayesian inverse reinforcement learning", In Proceedings of IJCAI, 2007.
  45. J. Choi and K.-E. Kim, "Inverse reinforcement learning in partially observable environments", In Proceedings of IJCAI, 2009.
  46. E. A. Hansen, "Finite-memory control of partially observable systems", PhD thesis, University of Massachusetts at Amherst, 1998.