DNN-based acoustic modeling for speech recognition of native and foreign speakers

Kang, Byung Ok; Kwon, Oh-Wook

doi:10.13064/KSSS.2017.9.2.095

Phonetics and Speech Sciences (말소리와 음성과학)

Volume 9, Issue 2, pp. 95-101, 2017; pISSN 2005-8063; eISSN 2586-5854

Korean Society of Speech Sciences (한국음성학회)


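Byung Ok Kang (Speech Intelligence Research Group, Electronics and Telecommunications Research Institute);
Oh-Wook Kwon (Chungbuk National University)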

Received : 2017.05.05
Accepted : 2017.06.13
Published : 2017.06.30


Abstract

This paper proposes a new method for training deep neural network (DNN)-based acoustic models for speech recognition of native and foreign speakers. The method comprises three steps: determining multi-set state clusters with differing acoustic properties, training a DNN-based acoustic model, and recognizing speech with that model. The hidden nodes of the DNN are shared, while the output nodes are kept separate to accommodate the different acoustic properties of native and foreign speech. In an English speech recognition task with native Korean and native English speakers, the proposed method slightly improves recognition accuracy over the conventional multi-condition training method.
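The shared-hidden, separate-output arrangement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `SharedHiddenDNN`, the layer sizes, the ReLU activation, the random initialisation, and the choice of one softmax per state set are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def affine(weights, bias, x):
    """Compute weights @ x + bias for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_layer(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class SharedHiddenDNN:
    """Hypothetical model: shared hidden layers, one output block per state set."""

    def __init__(self, n_in, hidden_sizes, states_per_set, seed=0):
        rng = random.Random(seed)
        self.hidden = []
        prev = n_in
        for size in hidden_sizes:
            self.hidden.append(random_layer(rng, size, prev))
            prev = size
        # Assumed layout: a dict mapping a set name to its number of tied states.
        self.outputs = {name: random_layer(rng, n_states, prev)
                        for name, n_states in states_per_set.items()}

    def forward(self, features):
        h = features
        for weights, bias in self.hidden:
            # ReLU is an assumption; the abstract does not name the activation.
            h = [max(0.0, z) for z in affine(weights, bias, h)]
        # Assumption: each state set is normalised by its own softmax.
        return {name: softmax(affine(w, b, h))
                for name, (w, b) in self.outputs.items()}


model = SharedHiddenDNN(n_in=4, hidden_sizes=[8, 8],
                        states_per_set={"native": 5, "foreign": 3})
posteriors = model.forward([0.2, -0.1, 0.4, 0.0])
```

Here `posteriors["native"]` and `posteriors["foreign"]` are separate distributions computed from the same hidden representation, which is the sense in which hidden nodes are shared and output nodes are separated. A single softmax over the concatenated output nodes would be an equally plausible reading of the abstract.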


