A review of speech perception: The first step for convergence on speech engineering

  • Received : 2017.10.30
  • Accepted : 2017.12.20
  • Published : 2017.12.28

Abstract

People observe many events in their environment and have no difficulty perceiving them, speech included. As with the perception of biological motion, two main theoretical camps have debated how speech is perceived. The purpose of this review article is to briefly describe speech perception and to compare the two theories: the motor theory and the direct perception theory. Motor theorists claim that speech perception is special to humans because we both produce and perceive articulatory events, which are processed by innate neuromotor commands. Direct perception theorists, in contrast, claim that speech perception is no different from nonspeech perception because the perceiver only needs to detect information directly, as with all other kinds of events. Grasping the fundamental idea of how humans perceive articulatory events is important for convergence with speech engineering. This basic review of speech perception is therefore expected to be useful for AI, voice recognition technology, speech recognition systems, and related applications.
