[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5370/KIEE.2014.63.6.813

Monosyllable Speech Recognition through Facial Movement Analysis

Kang, Dong-Won (Dept. of Biomedical Engineering, Konkuk University)
Seo, Jeong-Woo (Dept. of Biomedical Engineering, Konkuk University)
Choi, Jin-Seung (Dept. of Biomedical Engineering, Konkuk University)
Choi, Jae-Bong (Department of Mechanical Systems Engineering, Hansung University)
Tack, Gye-Rae (Dept. of Biomedical Engineering, BK21+ Research Institute of Biomedical Engineering, Konkuk University)

Publication Information

The Transactions of The Korean Institute of Electrical Engineers / v.63, no.6, 2014, pp. 813-819

Abstract

The purpose of this study was to extract accurate facial movement parameters using a 3-D motion capture system, for use in speech recognition through lip-reading. Instead of features obtained from conventional camera images, the 3-D motion capture system was used to obtain quantitative data on actual facial movements and to analyze 11 variables that exhibit characteristic patterns during monosyllable vocalization, such as nose, lip, jaw, and cheek movements. Fourteen subjects, all in their 20s, were asked to vocalize 11 Korean vowel monosyllables three times each with 36 reflective markers attached to their faces. The facial movement data were converted into 11 parameters and presented as patterns for each monosyllable vocalization. These parameter patterns were then used to train and test a recognizer for each monosyllable, based on a Hidden Markov Model (HMM) and the Viterbi algorithm. The recognition accuracy for the 11 monosyllables was 97.2%, which suggests that Korean speech recognition through quantitative facial movement analysis is feasible.
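The recognition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a discrete HMM, a single facial parameter quantized into three hypothetical symbols (0 = closed, 1 = half open, 2 = wide open), and two made-up models named "A" and "U" with invented probabilities. The paper itself uses 11 parameters and 11 monosyllables. Each candidate model is scored with the Viterbi algorithm, and the best-scoring model is taken as the recognized monosyllable.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(obs, start, trans, emit):
    """Return (log-probability, state path) of the most likely state sequence.

    obs   : list of observation symbol indices
    start : start[i]    = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    # delta[i]: best log-probability of any path that ends in state i
    delta = [_log(start[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    back = []  # one list of best predecessors per time step after the first
    for symbol in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + _log(trans[i][j]))
            ptr.append(best_i)
            delta.append(prev[best_i] + _log(trans[best_i][j]) + _log(emit[j][symbol]))
        back.append(ptr)
    # Trace the best path backwards from the best final state
    state = max(range(n), key=lambda i: delta[i])
    best = delta[state]
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return best, path


def classify(obs, models):
    """Return the name of the model whose Viterbi score is highest."""
    return max(models, key=lambda name: viterbi(obs, *models[name])[0])


# Hypothetical two-state, left-to-right models (all numbers are invented).
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    # "A": mouth starts closed, then opens wide
    "A": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]),
    # "U": mouth starts closed, then opens only halfway
    "U": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.7, 0.2, 0.1], [0.2, 0.7, 0.1]]),
}

if __name__ == "__main__":
    for sequence in ([0, 0, 2, 2], [0, 0, 1, 1]):
        print(sequence, "->", classify(sequence, MODELS))
```

Working in log space avoids numerical underflow on longer sequences. A recognizer closer to the one described would presumably use continuous emission densities over all 11 parameters and would estimate the model probabilities from training data, which this sketch does not attempt.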

Keywords

3-D motion capture system; Facial motion; Hidden Markov model; Speech recognition

Citations & Related Records

Times Cited By KSCI: 4

Reference

1	T. Chen, H. P. Graf, and K. Wang, "Speech-assisted video processing: Interpolation and low-bitrate coding," 28th Annual Asilomar Conference on Signals, Systems, and Computers (Asilomar '94), pp. 957-979, 1994.
2	G. Bailly, E. Vatikiotis-Bateson, and P. Perrier, Visual and audio-visual speech processing, MIT Press, 2004.
3	A. Bagai, H. Gandhi, R. Goyal, M. Kohli, and T. V. Prasad, "Lip-Reading using Neural Networks," International Journal of Computer Science and Network Security, vol. 9, no. 4, pp. 108-111, 2009.
4	H. Mehrotra, G. Agrawal, and M. C. Srivastava, "Automatic Lip Contour Tracking and Visual Character Recognition for Computerized Lip Reading," International Journal of Computer Science, vol. 4, no. 1, pp. 62-71, 2009.
5	W. J. Ma, X. Zhou, L. A. Ross, J. J. Foxe, and L. C. Parra, "Lip reading aids word recognition most in moderate noise: a Bayesian explanation using high-dimensional feature space," PLoS ONE, vol. 4, no. 3, pp. 1-14, 2009.
6	J. J. Shin, J. Lee, and D. J. Kim, "Real-time lip reading system for isolated Korean word recognition," Pattern Recognition, vol. 44, pp. 559-571, 2011.
7	M. G. Song, T. P. Thanh, J. Y. Kim, and S. T. Hwang, "A Study on Lip Detection based on Eye Localization for Visual Speech Recognition in Mobile Environment," Journal of Korean institute of intelligent systems, vol. 19, no. 4, pp. 478-484, 2009.
8	Y. T. Won, H. D. Kim, M. R. Lee, B. S. Jang, and H. S. Kwak, "A Character Speech Animation System for Language Education for Each Hearing Impaired Person," Journal of digital contents society, vol. 9, no. 3, pp. 389-398, 2008.
9	K. H. Lee, J. J. Kum, and S. B. Rhee, "Design & Implementation of Lipreading System using the Articulatory Controls Analysis of the Korean 5 Vowels," The Journal of Korean association of computer education, vol. 8, no. 4, pp. 281-288, 2007.
10	J. Ma, R. Cole, B. Pellom, W. Ward, and B. Wise, "Accurate Visible Speech Synthesis Based on Concatenating Variable Length Motion Capture Data," IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 2, pp. 266-276, 2006.
11	G. Bailly, F. Elisei, M. Odisio, D. Pele, D. Caillière, and K. Grein-Cochard, "Talking faces for MPEG-4 compliant scalable face-to-face telecommunication," Proceedings of the Smart Objects Conference (SOC '03), pp. 204-207, 2003.
12	P. Scanlon, and R. Reilly, "Feature analysis for automatic speech reading," Proc. of the IEEE Int. Conf. on Multimedia Signal Processing (MMSP '01), pp. 625-630, 2001.
13	R. Scott, "Sparking life: notes on the performance capture sessions for The Lord of the Rings: The Two Towers," ACM SIGGRAPH Computer Graphics, vol. 37, no. 4, pp. 17-21, 2003.
14	Y. Cao, P. Faloutsos, E. Kohler, and F. Pighin, "Real-time speech motion synthesis from recorded motions," In Proceedings of Eurographics/SIGGRAPH Symposium on Computer Animation (SCA '04), pp. 345-353, 2004.
15	S. Dupont, and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Transactions on Multimedia, vol. 2, pp. 141-151, 2000.
16	C. G. Lee, I. M. So, Y. U. Kim, J. R. Kim, S. K. Kang, and S. T. Jung, "Implementation of three dimension lip reading system using stereo vision," Proceedings of Korea multimedia society conference (KMMS '04), pp. 489-492, 2004.
17	H. S. Koh, S. M. Han, J. U. Chu, S. H. Park, J. B. Choi, G. W. Choi, D. S. Hwang, and I. C. Youn, "The three-dimensional lip shape tracking system using stereo camera," Proceedings of the Korean Society of Precision Engineering (KSPE '11) Conference, pp. 979-980, 2011.
18	K. H. Lee, R. Yong, and S. O. Kim, "A study on speechreading the Korean 8 vowels," Journal of the Korea society of computer and information, vol. 14, no. 3, pp. 173-182, 2009.
19	J. Y. Kim, S. H. Min, and S. H. Choi, "Robustness of Bimodal Speech Recognition on Degradation of Lip Parameter Estimation Performance," Journal of the Korean Society of Phonetic Science and Speech Technology, vol. 10, no. 2, pp. 29-33, 2003.
20	K. H. Nam, and C. S. Bae, "A study on the lip shape recognition algorithm using 3-D Model," The Journal of the Korean Institute of Maritime Information & Communication Sciences, vol. 6, no. 5, pp. 783-788, 2002.
21	G. Galatas, G. Potamianos, D. Kosmopoulos, C. McMurrough, and F. Makedon, "Bilingual Corpus for AVASR using Multiple Sensors and Depth Information," Auditory-Visual Speech Processing (AVSP '11), pp. 103-106, 2011.
22	I. S. Pandzic, and R. Forchheimer, MPEG-4 Facial Animation: The Standard, Implementation, and Applications, John Wiley and Sons, Inc., New York, 2002.
23	A. Srinivasan, "Speech Recognition Using Hidden Markov Model," Applied Mathematical Sciences, vol. 5, no. 79, pp. 3943-3948, 2011.
24	D. A. Pierre, Optimization Theory with Applications, Dover Publications, Inc., New York, 1986.
25	N. Eveno, A. Caplier, and P. Y. Coulon, "Accurate and quasi-automatic lip tracking," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 5, pp. 706-715, 2004.
26	X. D. Huang, Y. Ariki, and M. A. Jack, Hidden Markov Models for Speech Recognition, Edinburgh Univ. Press, Edinburgh, 1990.
