http://dx.doi.org/10.7776/ASK.2019.38.6.703

A study on recognition improvement of velopharyngeal insufficiency patient's speech using various types of deep neural network  

Kim, Min-seok (Department of Computer Science and Engineering, Incheon National University)
Jung, Jae-hee (Department of Computer Science and Engineering, Incheon National University)
Jung, Bo-kyung (Department of Computer Science and Engineering, Incheon National University)
Yoon, Ki-mu (Department of Computer Science and Engineering, Incheon National University)
Bae, Ara (Department of Computer Science and Engineering, Incheon National University)
Kim, Wooil (Department of Computer Science and Engineering, Incheon National University)
Abstract
This paper proposes speech recognition systems that combine Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) structures with a Hidden Markov Model (HMM) to effectively recognize the speech of VeloPharyngeal Insufficiency (VPI) patients, and compares their recognition performance with that of Gaussian Mixture Model (GMM-HMM) and fully-connected Deep Neural Network (DNN-HMM) based speech recognition systems. The initial model is trained on normal speakers' speech, and simulated VPI speech is used to generate a prior model for speaker adaptation. For VPI speaker adaptation, selected layers are trained in the CNN-HMM based model, and the dropout regularization technique is applied in the LSTM-HMM based model, showing a 3.68 % improvement in recognition accuracy. The experimental results demonstrate that the proposed LSTM-HMM based speech recognition system is effective for VPI speech with small-sized speech data, compared to the conventional GMM-HMM and fully-connected DNN-HMM systems.
Keywords
VeloPharyngeal Insufficiency (VPI); Speech recognition; Convolutional neural network; Long short term memory; Deep neural network