A study on recognition improvement of velopharyngeal insufficiency patient's speech using various types of deep neural network
Kim, Min-seok (Department of Computer Science and Engineering, Incheon National University)
Jung, Jae-hee (Department of Computer Science and Engineering, Incheon National University)
Jung, Bo-kyung (Department of Computer Science and Engineering, Incheon National University)
Yoon, Ki-mu (Department of Computer Science and Engineering, Incheon National University)
Bae, Ara (Department of Computer Science and Engineering, Incheon National University)
Kim, Wooil (Department of Computer Science and Engineering, Incheon National University)
1 | S. G. Fletcher, "Theory and instrumentation for quantitative measurement of nasality," J. Cleft Palate. 7, 601-609 (1970). |
2 | J. -E. Lee, W. -E. Kim, K. H. Kim, M. -W. Sung, and T. -K. Kwon, "Research on construction of the Korean speech corpus in patient with velopharyngeal insufficiency" (in Korean), JKORL. 55, 498-507 (2012). |
3 | M. Y. Sung, H. Kim, T. -K. Kwon, and M. -W. Sung, "Analysis on vowel and consonants sounds of patient's speech with velopharyngeal insufficiency (VPI) and simulated speech" (in Korean), JKIICE. 18, 1740-1748 (2014). |
4 | M. Y. Sung, T. -K. Kwon, M. -W. Sung, and W. Kim, "Effective recognition of velopharyngeal insufficiency (VPI) patient's speech using simulated speech model" (in Korean), JKIICE. 19, 1243-1250 (2015). |
5 | K. Yoon and W. Kim, "Effective recognition of velopharyngeal insufficiency (VPI) patient's speech using DNN-HMM-based system" (in Korean), JKIICE. 23, 33-38 (2019). |
6 | HTK Speech Recognition Toolkit, http://htk.eng.cam.ac.uk/, (Last viewed March 11, 2015). |
7 | ETSI ES 201 108, Standard Document, v1.1.2 (2000-04), 2000. |
8 | J. L. Gauvain and C. H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. on Speech and Audio Proc. 2, 291-298 (1994). |
9 | C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density HMMs," Computer Speech and Language, 9, 171-185 (1995). |
10 | J. -T. Huang, J. Li, D. Yu, L. Deng, and Y. Gong, "Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers," Proc. IEEE ICASSP. 7304-7308 (2013). |
11 | W. Hu, Y. Qian, and F. K. Soong, "A DNN-based acoustic modeling of tonal language and its application to Mandarin pronunciation training," Proc. IEEE ICASSP. 3206-3210 (2014). |
12 | S. Park, Y. Jeong, and H. S. Kim, "Multiresolution CNN for reverberant speech recognition," Proc. 20th Conf. O-COCOSDA. 1-4 (2017). |
13 | A. Senior, H. Sak, and I. Shafran, "Context dependent phone models for LSTM RNN acoustic modeling," Proc. IEEE ICASSP. 4585-4589 (2015). |
14 | S. J. Rennie, V. Goel, and S. Thomas, "Annealed dropout training of deep networks," Proc. IEEE SLT. 159-164 (2014). |
15 | S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, 9, 1735-1780 (1997). |