http://dx.doi.org/10.13064/KSSS.2017.9.2.095

DNN-based acoustic modeling for speech recognition of native and foreign speakers  

Kang, Byung Ok (Speech Intelligence Research Group, Electronics and Telecommunications Research Institute)
Kwon, Oh-Wook (Chungbuk National University)
Publication Information
Phonetics and Speech Sciences / v.9, no.2, 2017, pp. 95-101
Abstract
This paper proposes a new method for training Deep Neural Network (DNN)-based acoustic models for speech recognition of native and foreign speakers. The proposed method consists of determining multi-set state clusters with different acoustic properties, training a DNN-based acoustic model on them, and recognizing speech with that model. In the proposed method, the hidden layers of the DNN are shared, but the output nodes are separated to accommodate the different acoustic properties of native and foreign speech. In an English speech recognition task covering both native English speakers and Korean speakers of English, the proposed method slightly improves recognition accuracy compared to the conventional multi-condition training method.
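The architecture the abstract describes, shared hidden layers with a separate output layer per speaker condition, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual model: the feature dimension, hidden width, state counts, and the ReLU/softmax choices are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM_IN = 40          # input feature dimension (assumed, e.g. filterbank features)
DIM_HID = 128        # shared hidden-layer width (assumed)
N_NATIVE_STATES = 6  # toy sizes of the per-condition tied-state sets
N_FOREIGN_STATES = 4 # (the "multi-set state clusters" in the abstract)

# Hidden-layer weights shared by both conditions
W_hid = rng.standard_normal((DIM_IN, DIM_HID)) * 0.01
b_hid = np.zeros(DIM_HID)

# Separate output heads, one softmax layer per acoustic condition
W_out = {
    "native":  rng.standard_normal((DIM_HID, N_NATIVE_STATES)) * 0.01,
    "foreign": rng.standard_normal((DIM_HID, N_FOREIGN_STATES)) * 0.01,
}
b_out = {
    "native": np.zeros(N_NATIVE_STATES),
    "foreign": np.zeros(N_FOREIGN_STATES),
}

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, condition):
    """State posteriors for one feature frame under the given condition."""
    h = np.maximum(0.0, x @ W_hid + b_hid)              # shared ReLU hidden layer
    return softmax(h @ W_out[condition] + b_out[condition])  # condition-specific head

frame = rng.standard_normal(DIM_IN)
p_native = forward(frame, "native")    # posteriors over native tied states
p_foreign = forward(frame, "foreign")  # posteriors over foreign tied states
print(p_native.shape, p_foreign.shape)  # (6,) (4,)
```

During training, both conditions' data would update the shared hidden weights while each output head sees only its own condition, which is what distinguishes this setup from plain multi-condition training with a single pooled output layer.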
Keywords
automatic speech recognition; Deep Neural Network (DNN); acoustic model;
Citations & Related Records
Times Cited By KSCI: 1
1 Abdel-Hamid, O., Mohamed, A., Jiang, H., Deng, L., Penn, G., & Yu, D. (2014). Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(10), 1533-1545.
2 Sak, H., Senior, A., & Beaufays, F. (2014). Long short-term recurrent neural network architectures for large scale acoustic modeling. Proceedings of INTERSPEECH-2014 (pp. 338-342).
3 Young, S. J., Odell, J. J., & Woodland, P. C. (1994). Tree-based state tying for high accuracy acoustic modelling. Proceedings of ARPA Human Language Technology Workshop (pp. 307-312).
4 Chen, X., & Cheng, J. (2012). Acoustic modeling for native and non-native Mandarin speech recognition. Proceedings of International Symposium on Chinese Spoken Language Processing.
5 Lee, S., Kang, B., Chung, H., & Park, J. (2015). A useful feature-engineering approach for an LVCSR system based on the CD-DNN-HMM algorithm. Proceedings of the 2015 European Signal Processing Conference (pp. 1436-1440). September, 2015.
6 Kang, B., Jung, H., & Kwon, O. (2013). Noise robust spontaneous speech recognition using multi-space GMM. Proceedings of INTERNOISE-2013. Innsbruck, Austria. September, 2013.
7 Kang, B., & Kwon, O. (2016). Combining multiple acoustic models in GMM spaces for robust speech recognition. IEICE Transactions on Information and Systems, 99(3), 724-730.
8 Lee, S., Kang, B., Chung, H., & Lee, Y. (2014). Intra- and inter-frame features for automatic speech recognition. ETRI Journal, 36(3), 514-517.
9 Mohamed, A. R., Hinton, G., & Penn, G. (2012). Understanding how deep belief networks perform acoustic modelling. Proceedings of the International Conference on Acoustics, Speech and Signal Processing (pp. 4274-4276).
10 Carnegie Mellon University. Carnegie Mellon Pronunciation Dictionary. Retrieved March 2, 2015, from http://www.speech.cs.cmu.edu/cgi-bin/cmudict
11 Kwon, O., Lee, K., Roh, Y., Huang, J., Choi, S., Kim, Y., Jeon, H., Oh, Y., Lee, Y., Kang, B., Chung, E., Park, J., & Lee, Y. (2015). GenieTutor: A computer assisted second language learning system based on spoken language understanding. Proceedings of the International Workshop on Spoken Dialog System (IWSDS 2015). Busan, South Korea. January, 2015.
12 Paul, D. B., & Baker, J. M. (1992). The design for the Wall Street Journal-based CSR corpus. Proceedings of ICSLP-1992 (pp. 899-902). October, 1992.
13 Chung, H., Park, J., Jeon, H., & Lee, Y. (2009). Fast speech recognition for voice destination entry in a car navigation system. Proceedings of INTERSPEECH-2009 (pp. 975-978). Brighton, UK. 2009.