
Comparative Analysis of Speech Recognition Open API Error Rate

  • Kim, Juyoung (Graduate School of Smart Convergence, Kwangwoon University) ;
  • Yun, Dai Yeol (Department of Plasma Bioscience and Display, Kwangwoon University) ;
  • Kwon, Oh Seok (Department of Plasma Bioscience and Display, Kwangwoon University Graduate School) ;
  • Moon, Seok-Jae (Department of Computer Science, Kwangwoon University) ;
  • Hwang, Chi-gon (Department of Computer Engineering, Institute of Information Technology, Kwangwoon University)
  • Received : 2021.04.23
  • Accepted : 2021.05.04
  • Published : 2021.06.30

Abstract

Speech recognition technology refers to technology in which a computer interprets speech spoken by a person and converts its content into text data. It has recently been combined with artificial intelligence and is used in a variety of products such as smartphones, set-top boxes, and smart TVs; examples include Google Assistant, Google Home, Samsung's Bixby, Apple's Siri, and SK's NUGU. Google and Daum Kakao, among others, offer free open APIs for speech recognition. This paper selects three APIs that ordinary users can use free of charge and compares their recognition rates across three test types: first, the recognition rate for numbers; second, the recognition rate for the Korean alphabet (Ga, Na, Da, ...); and finally, complete sentences that the author uses most frequently. All experiments use real voice input through a computer microphone. Through these three experiments and their results, we hope to help the general public identify differences in recognition rates among the services currently available and select the API best suited to a specific application purpose.
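
As a rough illustration of the setup described above, the sketch below transcribes a single microphone utterance with one free speech recognition open API and scores it against a reference transcript. It is a minimal sketch, not the authors' code: it assumes Python with the third-party speech_recognition package (which wraps Google's free Web Speech API), the language code ko-KR, and an illustrative reference sentence; the word_error_rate helper is a hypothetical name for a standard word-level edit-distance measure.

```python
# Minimal sketch of the comparison setup (assumptions: Python, the third-party
# `speech_recognition` package wrapping Google's free Web Speech API, language
# code "ko-KR", and an illustrative reference sentence -- not the authors' code).
import speech_recognition as sr


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


recognizer = sr.Recognizer()
reference = "하나 둘 셋 넷 다섯"  # example "numbers" prompt (assumed, not from the paper)

with sr.Microphone() as source:  # real voice through the computer microphone
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source)

try:
    hypothesis = recognizer.recognize_google(audio, language="ko-KR")
    print("recognized:", hypothesis)
    print("word error rate: {:.2%}".format(word_error_rate(reference, hypothesis)))
except sr.UnknownValueError:
    print("speech could not be recognized")
except sr.RequestError as err:
    print("API request failed:", err)
```

Repeating the same call with each vendor's recognizer and averaging the error over the number, Hangul, and sentence prompts would reproduce the kind of comparison described in the abstract.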

