[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.9728/dcs.2017.18.1.93

LSTM RNN-based Korean Speech Recognition System Using CTC

Lee, Donghyun (Department of Computer Science and Engineering, Sogang University)
Lim, Minkyu (Department of Computer Science and Engineering, Sogang University)
Park, Hosung (Department of Computer Science and Engineering, Sogang University)
Kim, Ji-Hwan (Department of Computer Science and Engineering, Sogang University)

Publication Information

Journal of Digital Contents Society / v.18, no.1, 2017, pp. 93-99

Abstract

Hybrid approaches using Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have shown great improvement in speech recognition accuracy. Training an acoustic model with the hybrid approach requires a forced alignment of the HMM state sequence, obtained from a Gaussian Mixture Model (GMM)-Hidden Markov Model (HMM) system. However, training the GMM-HMM takes considerable computation time. This paper proposes an end-to-end approach to LSTM RNN-based Korean speech recognition that improves learning speed. The approach is implemented with the Connectionist Temporal Classification (CTC) algorithm. The proposed method achieved an almost equal recognition rate while learning 1.27 times faster.
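
To illustrate why CTC removes the need for a frame-level forced alignment, the following is a minimal sketch of the CTC forward computation in plain Python. It is not the authors' implementation: the function names (collapse, ctc_forward, ctc_brute_force), the choice of index 0 as the blank symbol, and the toy probabilities are all assumptions made for illustration. The sketch sums the probability of every frame-level path that collapses to the target label sequence, so the network can be trained from the label sequence alone.

```python
import math
from itertools import product

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def collapse(path):
    """Map a frame-level path to labels: merge repeats, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return out


def ctc_forward(probs, labels):
    """Return p(labels | probs) with the CTC forward (alpha) recursion.

    probs[t][k] is the per-frame softmax output for symbol k at frame t.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]
    S, T = len(ext), len(probs)

    # A path may start with the leading blank or with the first label
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # The blank between two labels may be skipped only if they differ
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A valid path ends in the last label or in the trailing blank
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def ctc_brute_force(probs, labels):
    """Reference check: enumerate every path and sum those that collapse to labels."""
    T, K = len(probs), len(probs[0])
    total = 0.0
    for path in product(range(K), repeat=T):
        if collapse(path) == list(labels):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total


if __name__ == "__main__":
    # Toy example: 3 frames, 3 symbols (blank, label 1, label 2); values are made up
    toy_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
    p = ctc_forward(toy_probs, [1, 2])
    print("p(labels) =", p, " CTC loss =", -math.log(p))
```

The brute-force function is included only as a sanity check for short inputs. The recursion computes the same quantity in time proportional to the number of frames times the label length, which is what makes alignment-free training practical.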

Keywords

Connectionist temporal classification; Long short-term memory; Recurrent neural network; Acoustic model; Speech recognition

