[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7776/ASK.2019.38.6.703

A study on recognition improvement of velopharyngeal insufficiency patient's speech using various types of deep neural network

Kim, Min-seok (Department of Computer Science and Engineering, Incheon National University)
Jung, Jae-hee (Department of Computer Science and Engineering, Incheon National University)
Jung, Bo-kyung (Department of Computer Science and Engineering, Incheon National University)
Yoon, Ki-mu (Department of Computer Science and Engineering, Incheon National University)
Bae, Ara (Department of Computer Science and Engineering, Incheon National University)
Kim, Wooil (Department of Computer Science and Engineering, Incheon National University)

Publication Information

The Journal of the Acoustical Society of Korea / v.38, no.6, 2019, pp. 703-709

Abstract

This paper proposes speech recognition systems that combine Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) structures with a Hidden Markov Model (HMM) to effectively recognize the speech of VeloPharyngeal Insufficiency (VPI) patients, and compares their recognition performance with that of Gaussian Mixture Model (GMM-HMM) and fully-connected Deep Neural Network (DNN-HMM) based speech recognition systems. The initial model is trained on normal speakers' speech, and simulated VPI speech is used to generate a prior model for speaker adaptation. For VPI speaker adaptation, selected layers are trained in the CNN-HMM based model, and the dropout regularization technique is applied in the LSTM-HMM based model, showing a 3.68% improvement in recognition accuracy. The experimental results demonstrate that the proposed LSTM-HMM based speech recognition system is effective for VPI speech when only a small amount of speech data is available, compared to the conventional GMM-HMM and fully-connected DNN-HMM systems.
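The abstract names two adaptation devices, training only selected layers and dropout regularization, without giving implementation details. The following is a minimal pure-Python sketch of both ideas, not the authors' implementation: the layer names, weights, gradients, learning rate, and the helper functions `adapt_selected_layers` and `inverted_dropout` are all hypothetical, and a real system would apply these steps inside a neural network toolkit.

```python
import random


def adapt_selected_layers(params, grads, trainable, lr=0.01):
    """One SGD step that updates only the layers named in `trainable`.

    All other layers keep their prior-model weights (they are frozen).
    """
    updated = {}
    for name, weights in params.items():
        if name in trainable:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            updated[name] = list(weights)  # frozen layer: copied unchanged
    return updated


def inverted_dropout(activations, drop_prob, rng):
    """Zero each activation with probability `drop_prob` and rescale the
    survivors by 1 / (1 - drop_prob) so the expected value is preserved."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical prior model: layer names and values are made up for illustration.
prior = {"conv1": [0.5, -0.2], "conv2": [0.1, 0.3], "output": [1.0, -1.0]}
grads = {"conv1": [1.0, 1.0], "conv2": [1.0, 1.0], "output": [2.0, -2.0]}

# Adapt only the output layer; the convolutional layers stay at prior values.
adapted = adapt_selected_layers(prior, grads, trainable={"output"}, lr=0.1)

# Apply dropout to a toy activation vector with a seeded generator.
activations = [1.0, 2.0, 3.0, 4.0]
dropped = inverted_dropout(activations, drop_prob=0.5, rng=random.Random(0))
```

Freezing most layers limits the number of parameters that a small adaptation set has to estimate, and dropout serves a similar purpose by discouraging the network from overfitting to the few adaptation utterances.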

Keywords

VeloPharyngeal Insufficiency (VPI); Speech recognition; Convolutional neural network; Long short-term memory; Deep neural network

