Table 1. GMM-HMM-based speech recognition results with speaker adaptation (word accuracy, %).
Table 2. DNN-HMM-based speech recognition results with speaker adaptation (word accuracy, %).
Table 3. Speech recognition results of LIN-based DNN adaptation (word accuracy, %).
Table 4. Speech recognition results of LON-based DNN adaptation (word accuracy, %).
Table 5. Speech recognition results of speaker adaptation for different amounts of data (word accuracy, %).
References
- S. G. Fletcher, "Theory and instrumentation for quantitative measurement of nasality," Cleft Palate Journal, vol. 7, pp. 601-609, 1970.
- J. Lee, W. Kim, K. Kim, M. Sung, and T. Kwon, "Research on Construction of the Korean Speech Corpus in Patient with Velopharyngeal Insufficiency," Korean Journal of Otorhinolaryngology-Head and Neck Surgery, vol. 55, no. 8, pp. 498-507, 2012. https://doi.org/10.3342/kjorl-hns.2012.55.8.498
- M. Sung, H. Kim, T. Kwon, M. Sung, and W. Kim, "Analysis on Vowel and Consonants Sounds of Patient's Speech with Velopharyngeal Insufficiency (VPI) and Simulated Speech," Journal of Korea Institute of Information and Communication Engineering, vol. 18, no. 7, pp. 1740-1748, July 2014. https://doi.org/10.6109/jkiice.2014.18.7.1740
- M. Sung, T. Kwon, M. Sung, and W. Kim, "Effective Recognition of Velopharyngeal Insufficiency (VPI) Patient's Speech Using Simulated Speech Model," Journal of Korea Institute of Information and Communication Engineering, vol. 19, no. 5, pp. 1243-1250, May 2015. https://doi.org/10.6109/jkiice.2015.19.5.1243
- S. Young, HTK Book, Ver. 3.4, Cambridge, UK: Cambridge University Press, 2006.
- ETSI standard document, ETSI ES 201 108 v1.1.2 (2000-04), Feb. 2000.
- J. L. Gauvain and C. H. Lee, "Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains," IEEE Trans. on Speech and Audio Proc., vol. 2, no. 2, pp. 291-298, 1994. https://doi.org/10.1109/89.279278
- C. J. Leggetter and P. C. Woodland, "Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density HMMs," Computer Speech and Language, vol. 9, pp. 171-185, 1995. https://doi.org/10.1006/csla.1995.0010
- J.-T. Huang, J. Li, D. Yu, L. Deng, and Y. Gong, "Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-2013), pp. 7304-7308, 2013.
- W. Hu, Y. Qian, and F. K. Soong, "A DNN-based acoustic modeling of tonal language and its application to Mandarin pronunciation training," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-2014), pp. 3206-3210, 2014.
- S. Liu and K. C. Sim, "On combining DNN and GMM with unsupervised speaker adaptation for robust automatic speech recognition," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-2014), pp. 195-199, 2014.
- D. Yu and L. Deng, Automatic Speech Recognition: A Deep Learning Approach, Springer, 2015.