http://dx.doi.org/10.15701/kcgs.2020.26.3.49

Speech Animation Synthesis based on a Korean Co-articulation Model  

Jang, Minjung (KAIST, Visual Media Lab.)
Jung, Sunjin (KAIST, Visual Media Lab.)
Noh, Junyong (KAIST, Visual Media Lab.)
Abstract
In this paper, we propose a speech animation synthesis method specialized for Korean, built on a rule-based co-articulation model. Speech animation is widely used in cultural industries such as film, animation, and games, which demand natural and realistic motion. However, because audio-driven speech animation techniques have been developed mainly for English, the results for Korean-language content are often visually unnatural: a voice actor's dubbing may play with no mouth motion at all, or at best with an unsynchronized loop of a few simple mouth shapes. Language-independent speech animation models exist, but without specialization for Korean they have yet to reach the quality required for domestic content production. We therefore propose a method that synthesizes natural speech animation from input audio and text while reflecting the linguistic characteristics of Korean. Because the mouth shape in Korean is determined mostly by vowels, we define a co-articulation model that animates the lips and the tongue separately, resolving the lip distortion and occasionally dropped phoneme characteristics seen in previous approaches. The model also reflects differences in prosodic features to improve the dynamics of the resulting animation. Through user studies, we verify that the proposed model synthesizes natural speech animation.
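To make the lips-tongue separation concrete, the sketch below shows how a rule-based pass of this kind could turn time-aligned phones (as produced by a forced aligner such as the Montreal Forced Aligner) into two independent keyframe tracks: vowels drive the lip shape, consonants drive the tongue, and bilabials are the one consonant class allowed to override the lip track. This is a minimal illustrative sketch under those assumptions, not the authors' implementation; the romanized phone labels, viseme names, and weights are hypothetical.

```python
# Sketch of a rule-based Korean co-articulation pass (hypothetical viseme
# tables and weights; not the paper's implementation). Lips and tongue are
# animated on separate tracks, reflecting that Korean mouth shapes are
# determined mostly by vowels.

from dataclasses import dataclass

# Hypothetical vowel -> lip viseme table (romanized Korean vowels).
VOWEL_LIPS = {
    "a": "open", "eo": "mid_open", "yeo": "mid_open", "o": "round",
    "u": "round_narrow", "eu": "spread_narrow", "i": "spread",
    "ae": "mid_spread", "e": "mid_spread",
}

# Hypothetical consonant -> tongue viseme table.
CONSONANT_TONGUE = {
    "g": "back_raise", "n": "tip_alveolar", "d": "tip_alveolar",
    "r": "tip_flap", "s": "blade_groove", "j": "blade_palatal",
    "h": "rest", "m": "rest", "b": "rest", "ng": "back_raise",
}

# Bilabials are the exception: they must briefly close the lips.
BILABIALS = {"m", "b", "p"}

@dataclass
class Phone:
    label: str    # romanized phone from the forced aligner
    start: float  # seconds
    end: float

def synthesize_tracks(phones):
    """Return (lip_keys, tongue_keys) as lists of (time, viseme, weight)."""
    lip_keys, tongue_keys = [], []
    for ph in phones:
        mid = 0.5 * (ph.start + ph.end)
        if ph.label in VOWEL_LIPS:
            # Vowels fully determine the lip shape at their midpoint.
            lip_keys.append((mid, VOWEL_LIPS[ph.label], 1.0))
        else:
            # Consonants key only the tongue track by default.
            tongue_keys.append((mid, CONSONANT_TONGUE.get(ph.label, "rest"), 1.0))
            if ph.label in BILABIALS:
                # Bilabial closure briefly overrides the vowel-driven lips.
                lip_keys.append((mid, "closed", 0.8))
    return lip_keys, tongue_keys

if __name__ == "__main__":
    # Timings as they might come from a forced aligner for "annyeong" (안녕).
    phones = [Phone("a", 0.00, 0.12), Phone("n", 0.12, 0.20),
              Phone("n", 0.20, 0.28), Phone("yeo", 0.28, 0.45),
              Phone("ng", 0.45, 0.55)]
    lips, tongue = synthesize_tracks(phones)
    print("lips:", lips)
    print("tongue:", tongue)
```

Keeping the two tracks independent is what lets vowel-driven lip shapes persist smoothly across intervening consonants instead of being distorted by them, with bilabial closures as the only explicit lip override.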
Keywords
Speech animation; Co-articulation; Forced alignment; Prosodic features