LSTM RNN-based Korean Speech Recognition System Using CTC |
Lee, Donghyun (Department of Computer Science and Engineering, Sogang University)
Lim, Minkyu (Department of Computer Science and Engineering, Sogang University)
Park, Hosung (Department of Computer Science and Engineering, Sogang University)
Kim, Ji-Hwan (Department of Computer Science and Engineering, Sogang University) |
1 | A. Acero et al., "Live search for mobile: web services by voice on the cellphone," in Proceedings of Interspeech, Brisbane, Australia, pp. 5256-5259, 2008. |
2 | J. Jiang et al., "Automatic online evaluation of intelligent assistants," in Opportunities and Challenges for Next-Generation Applied Intelligence, Berlin, Germany: Springer, pp. 285-290, 2009. |
3 | S. Kim and J. Ahn, "Speech Recognition System in Car Noise Environment," The Journal of Digital Contents Society, Vol. 10, No. 1, pp. 121-127, Mar. 2009. |
4 | L. Rabiner and B. Juang, Fundamentals of Speech Recognition, 1st ed. Englewood Cliffs, NJ: Prentice Hall, 1993. |
5 | D. Su, X. Wu, and L. Xu, "GMM-HMM acoustic model training by a two level procedure with Gaussian components determined by automatic model selection," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Dallas, TX, pp. 4890-4893, 2010. |
6 | H. Hermansky, D. Ellis, and S. Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Istanbul, Turkey, pp. 1635-1638, 2000. |
7 | T. Mikolov and G. Zweig, "Context dependent recurrent neural network language model," Microsoft Research, Redmond, WA, Technical Report MSR-TR-2012-92, 2012. |
8 | A. Graves et al., "Hybrid speech recognition with deep bidirectional LSTM," in Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, Olomouc, Czech Republic, pp. 273-278, 2013. |
9 | G. Hinton et al., "Deep Neural Networks for Acoustic Modeling in Speech Recognition," The IEEE Signal Processing Magazine, Vol. 29, No. 6, pp. 82-97, Oct. 2012. |
10 | L. Deng, G. Hinton, and B. Kingsbury, "New types of deep neural network learning for speech recognition and related applications: An overview," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada, pp. 8599-8603, May 2013. |
11 | A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," in Proceedings of the 31st International Conference on Machine Learning, Beijing, China, pp. 1764-1772, 2014. |
12 | A. Graves, Supervised sequence labelling with recurrent neural networks, Ph.D. dissertation, Technische Universität München, München, Germany, 2008. |
13 | A. Graves et al., "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks," in Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, pp. 369-376, 2006. |
14 | S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, Vol. 9, No. 8, pp. 1735-1780, Nov. 1997. |
15 | Y. Rao, A. Senior, and H. Sak, "Flat start training of CD-CTC-sMBR LSTM RNN acoustic models," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Shanghai, China, pp. 5405-5409, 2016. |
16 | H. Sak, A. Senior, and F. Beaufays, "Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition," arXiv:1402.1128, pp. 1-5, Feb. 2014. |
17 | M. Liwicki, A. Graves, H. Bunke, and J. Schmidhuber, "A novel approach to on-line handwriting recognition based on bidirectional long short-term memory networks," in Proceedings of the 9th International Conference on Document Analysis and Recognition, Curitiba, Brazil, pp. 367-371, 2007. |
18 | Y. Miao et al., "EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding," in Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, Scottsdale, AZ, pp. 167-174, 2015. |