Cognitive Computing II: Machine Vision-Language Learning - Real-Life Vision-Language Learning

  • Published: 2012.01.18

Acknowledgement

Supported by: National Research Foundation of Korea (NRF)

References

  1. Friederici, A. D., Towards a neural basis of auditory sentence processing, Trends in Cognitive Sciences, 6:78-84, 2002. https://doi.org/10.1016/S1364-6613(00)01839-8
  2. Gazzaniga, M. S., Ivry, R. B., & Mangun, G. R., Cognitive Neuroscience: The Biology of the Mind (3rd Ed.), Norton, 2009.
  3. 장병탁, 여무송, Cognitive Computing I: Multisensory Perceptual Intelligence - Real-World Perception-Action Intelligence, Communications of the Korean Institute of Information Scientists and Engineers, 30(1):75-87, 2012.
  4. 장병탁, 김현수, Cognitive Computing III: Deep Dynamic Prediction - Real-Time Predictive Decision Inference, Communications of the Korean Institute of Information Scientists and Engineers, 30(1):101-111, 2012.
  5. Ernst, M. O. & Banks, M. S., Humans integrate visual and haptic information in a statistically optimal fashion, Nature, 415(6870):429-433, 2002.
  6. Mitchell, T. M., Shinkareva, S. V., Carlson, A., Chang, K. M., Malave, V. L., Mason, R. A., & Just, M. A., Predicting human brain activity associated with the meanings of nouns, Science, 320:1191, 2008. https://doi.org/10.1126/science.1152876
  7. Trommershaeuser, J., Koerding, K., & Landy, M. S. (Eds.), Sensory Cue Integration, Oxford University Press, 2011.
  8. Marr, D., Vision, W. H. Freeman and Company, 1982.
  9. Jain, R., Kasturi, R., & Schunck, B. G., Machine Vision, McGraw-Hill, 1995.
  10. Poggio, T. & Shelton, C., Machine learning, machine vision and the brain, AI Magazine, 20(3):37-55, 1999.
  11. Serre, T., Oliva, A., & Poggio, T., A feedforward architecture accounts for rapid categorization, Proc. Natl. Acad. Sci. USA, 104(15):6424-6429, 2007. https://doi.org/10.1073/pnas.0700622104
  12. Slocum, J., A survey of machine translation: its history, current status, and future prospects, Computational Linguistics, 11(1):1-17, 1985.
  13. Maas, H. D., The Saarbrücken automatic translation system (SUSY), Proc. of the Third European Congress on Overcoming Language Barrier, 1:586-592, 1977.
  14. King, M., EUROTRA: a European system for machine translation, Lebende Sprachen, 26(1):12-14, 1981.
  15. Winograd, T. & Flores, F., Understanding Computers and Cognition: A New Foundation for Design, Ablex Publishing Corp., 1986.
  16. Lee, K.-F., Hon, H.-W., Hwang, M.-Y., Mahajan, S., & Reddy, R., The SPHINX speech recognition system, Proc. of 1989 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-89), pp. 445-448, 1989.
  17. Carroll, D. W., Psychology of Language (5th Ed.), Wadsworth, 2008.
  18. Schnelle, H., Language in the Brain, Cambridge University Press, 2010.
  19. Pulvermüller, F., & Fadiga, L., Active perception: Sensorimotor circuits as a cortical basis for language, Nature Reviews Neuroscience, 11(5):351-360, 2010. https://doi.org/10.1038/nrn2811
  20. Rickheit, G., Weiss, S., & Eikmeyer, H.-J., Cognitive Linguistics: Theories, Models, and Methods (in German), UTB, 2010.
  21. Steels, L., Grounding symbols through evolutionary language games. In: Cangelosi, A. & Parisi, D. (Eds.) Simulating the Evolution of Language, Springer, 2001.
  22. Wachsmuth, I. & Knoblich, G. (Eds.), Modeling Communication with Robots and Virtual Humans, Springer, 2008.
  23. Kutas, M., & Hillyard, S. A., Reading senseless sentences: Brain potentials reflect semantic incongruity, Science, 207:203-208, 1980. https://doi.org/10.1126/science.7350657
  24. Lewis, J. W., Cortical networks related to human use of tools, Neuroscientist, 12(3):211-231, 2006. https://doi.org/10.1177/1073858406288327
  25. Nam, J.-S., Bergmann, K., Waltinger, U., Kopp, S., Wachsmuth, I., & Zhang, B.-T., Deciphering the communicative code in speech and gesture dialogues by autoencoding hypernetworks, Embodied & Situated Language Processing (ESLP 2011), p. 15, 2011.
  26. Yu, C., Schermerhorn, P. & Scheutz, M., Adaptive eye gaze patterns in interactions with human and artificial agents, ACM Transactions on Interactive Intelligent Systems (in press), 2011.
  27. Smith, L. & Yu, C., Infants rapidly learn word-referent mappings via cross-situational statistics, Cognition, 106(3):1558-1568, 2008. https://doi.org/10.1016/j.cognition.2007.06.010
  28. Frank, M. C., Goodman, N. D., Lai, P., & Tenenbaum, J. B., Informative communication in word production and word learning, Proc. of the 31st Annual Meeting of the Cognitive Science Society, 2009.
  29. Fei-Fei, L. & Li, L.-J., What, where and who? Telling the story of an image by activity classification, scene recognition and object categorization, Studies in Computational Intelligence: Computer Vision, Vol. 285, Springer, 2010.
  30. Gupta, S. & Mooney, R., Using closed captions as supervision for video activity recognition, Proc. Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-2010), pp. 1083-1088, 2010.
  31. Everingham, M., Zisserman, A., Williams, C. K. I., Van Gool, L., Allan, M., Bishop, C. M., Chapelle, O., Dalal, N., Deselaers, T., Dorko, G., et al., The 2005 PASCAL Visual Object Classes Challenge, LNCS 3944, pp. 117-176, 2006.
  32. Torralba, A., Russell, B. C., & Yuen, J., LabelMe: Online image annotation and applications, Proc. of the IEEE, 98(8):1467-1484, 2010. https://doi.org/10.1109/JPROC.2010.2050290
  33. Yuen, J., Russell, B., Liu, C., & Torralba, A., LabelMe video: Building a video database with human annotations, IEEE 12th International Conference on Computer Vision, pp. 1451-1458, 2009.
  34. 장병탁, SNU Videome Project: Human-level machine learning from videos (in Korean), Communications of the Korean Institute of Information Scientists and Engineers, 29(2):17-31, 2011.
  35. Plunkett, K., Theories of early language acquisition, Trends in Cognitive Sciences, 1(4):146-153, 1997. https://doi.org/10.1016/S1364-6613(97)01039-5
  36. Zhang, B.-T., Hypernetworks: A molecular evolutionary architecture for cognitive learning and memory, IEEE Computational Intelligence Magazine, 3(3):49-63, 2008.
  37. Zhang, B.-T. & Kang, M.-G., Bayesian mixture modeling of joint vision-language concepts from videos, NIPS-2011 Workshop on Integrating Language and Vision, poster, 2011.
  38. Zhang, B.-T., Lee, E.-S., Heo, M.-O., & Kang, M.-G., Modeling situated language learning in early childhood via hypernetworks, Embodied & Situated Language Processing (ESLP 2011), p. 48, 2011.
  39. Lee, C.-Y., Kim, E.-S., Kim, J.-S., & Zhang, B.-T., Interaction of language and vision memories in TV drama watching: An EEG study, Embodied & Situated Language Processing (ESLP 2011), p. 49, 2011.
  40. Ha, J.-W. & Zhang, B.-T., Text-to-image generation based on crossmodal association with hierarchical hypergraphs, 2011 NIPS Workshop on Integrating Vision and Language, poster, 2011.