[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.13064/KSSS.2016.8.3.031

Development of articulatory estimation model using deep neural network

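You, Heejo (Department of Psychology, Korea University)
Yang, Hyungwon (Department of English Language and Literature, Korea University)
Kang, Jaekoo (Department of English Language and Literature, Korea University)
Cho, Youngsun (Department of English Language and Literature, Korea University)
Hwang, Sung Hah (Department of English Language and Literature, Korea University)
Hong, Yeonjung (Department of English Language and Literature, Korea University)
Cho, Yejin (Department of English Language and Literature, Korea University)
Kim, Seohyun (Department of English Language and Literature, Korea University)
Nam, Hosung (Korea University)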

Publication Information

Phonetics and Speech Sciences / v.8, no.3, 2016, pp. 31-38

Abstract

Speech inversion (acoustic-to-articulatory mapping) is an important but non-trivial problem, because the mapping is highly non-linear and non-unique. This study compared the performance of a Deep Neural Network (DNN) with that of a traditional Artificial Neural Network (ANN) on this problem. The Wisconsin X-ray Microbeam Database was used, with the acoustic signal as the model input and the articulatory pellet information as the model output. The performance of the ANN deteriorated as the number of hidden layers increased. In contrast, the DNN showed lower and more stable RMS error even with up to 10 hidden layers, suggesting that a DNN can learn the acoustic-to-articulatory inversion mapping more efficiently than an ANN.
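
The abstract compares models by their RMS, which is read here as the root-mean-square error between predicted and measured pellet positions; that reading is an assumption, as is the per-channel averaging below. The following is a minimal sketch of such a measure, not the authors' evaluation code. The function name `rms_error`, the frame layout (one list of coordinates per time frame), and the sample values are all hypothetical.

```python
import math


def rms_error(predicted, measured):
    """Return the root-mean-square error for each output channel.

    predicted, measured: lists of frames; each frame is a list of
    articulatory coordinates (hypothetical layout: one value per pellet axis).
    """
    if not predicted or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and have the same number of frames")

    n_channels = len(predicted[0])
    squared_sums = [0.0] * n_channels

    for p_frame, m_frame in zip(predicted, measured):
        if len(p_frame) != n_channels or len(m_frame) != n_channels:
            raise ValueError("every frame must have the same number of channels")
        # Accumulate the squared difference separately for each channel.
        for c in range(n_channels):
            diff = p_frame[c] - m_frame[c]
            squared_sums[c] += diff * diff

    # Mean over frames, then square root, per channel.
    return [math.sqrt(total / len(predicted)) for total in squared_sums]


# Made-up values for illustration only (two frames, two channels).
measured = [[0.0, 1.0], [2.0, 3.0]]
predicted = [[0.0, 2.0], [2.0, 4.0]]
per_channel = rms_error(predicted, measured)
```

Under this reading, a lower value in `per_channel` means the predicted trajectory lies closer to the measured one for that channel, so a model comparison would look at how these values change as hidden layers are added.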

Keywords

Wisconsin X-ray Microbeam Database; speech inversion; artificial neural network; deep neural network

