[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.9708/jksci.2021.26.10.001

Error Correction for Korean Speech Recognition using a LSTM-based Sequence-to-Sequence Model

Jin, Hye-won (School of Software, Soongsil University)
Lee, A-Hyeon (School of Software, Soongsil University)
Chae, Ye-Jin (School of Software, Soongsil University)
Park, Su-Hyun (School of Software, Soongsil University)
Kang, Yu-Jin (School of Software, Soongsil University)
Lee, Soowon (School of Software, Soongsil University)

Publication Information

Journal of the Korea Society of Computer and Information / v.26, no.10, 2021, pp. 1-7

Abstract

Most research on correcting speech recognition errors targets English, so work on Korean speech recognition remains scarce. Korean speech recognition, however, produces more errors than English speech recognition because of linguistic characteristics of Korean such as Korean Fortis and Korean Liaison, so dedicated research is needed. Earlier work relied mainly on edit distance algorithms and syllable restoration rules, which makes Fortis and Liaison errors difficult to correct. In this paper, we propose a context-sensitive post-processing model for speech recognition that uses an LSTM-based sequence-to-sequence model with the Bahdanau attention mechanism to correct Korean speech recognition errors caused by pronunciation. In experiments, the model improved speech recognition performance from 64% to 77% for Fortis, from 74% to 90% for Liaison, and from 69% to 84% on average. These results suggest that the proposed model can be applied to real-world applications based on speech recognition.
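The attention step named in the abstract can be illustrated with a minimal sketch of Bahdanau-style (additive) attention. This is not the authors' implementation: the abstract gives no layer sizes or training details, so the dimensions, the random untrained weights, and the helper names (`matvec`, `softmax`, `additive_attention`) below are all assumptions chosen for illustration. The sketch only shows how a decoder state is scored against each encoder state and how the resulting weights form a context vector.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(decoder_state, encoder_states, W_s, W_h, v):
    """Additive attention: score_i = v . tanh(W_s s + W_h h_i).

    Returns the attention weights over the encoder states and the
    context vector (the weighted sum of the encoder states).
    """
    projected_state = matvec(W_s, decoder_state)
    scores = []
    for h in encoder_states:
        projected_h = matvec(W_h, h)
        hidden = [math.tanh(a + b) for a, b in zip(projected_state, projected_h)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context


# Toy, randomly initialised parameters (hypothetical sizes, untrained).
rng = random.Random(0)
hidden_dim, attn_dim, source_len = 4, 3, 5
W_s = [[rng.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(attn_dim)]
W_h = [[rng.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(attn_dim)]
v = [rng.uniform(-1, 1) for _ in range(attn_dim)]
encoder_states = [[rng.uniform(-1, 1) for _ in range(hidden_dim)]
                  for _ in range(source_len)]
decoder_state = [rng.uniform(-1, 1) for _ in range(hidden_dim)]

weights, context = additive_attention(decoder_state, encoder_states, W_s, W_h, v)
print("attention weights:", [round(w, 3) for w in weights])
print("context vector:   ", [round(c, 3) for c in context])
```

In a post-processing setting such as the one described, the encoder states would come from an LSTM reading the recognized (possibly erroneous) text and the decoder state from an LSTM generating the corrected text; here both are random placeholders.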

Keywords

Speech Recognition; Error Correction; Korean; LSTM; Sequence-to-Sequence; Bahdanau Attention
