Search | Korea Science

Kim, Hyoung-Geun;Park, Chul-Ha
- The KIPS Transactions:PartB / v.15B no.3 / pp.179-188 / 2008
In this paper, a knowledge-based text-to-facial-image-sequence system for interaction between lecturers and learners in cyber universities is studied. The system synthesizes a facial image sequence whose lip movements are synchronized with the input text, based on the grammatical characteristics of Hangul. To implement the system, three components are proposed: a method for transforming the text into phoneme codes, deformation rules for the mouth shape that depend on the phoneme codes, and a method for synthesizing the facial image sequence using these deformation rules. In the proposed method, all Hangul syllables are represented by 10 principal mouth shapes and 78 compound mouth shapes, derived from the pronunciation characteristics of the basic consonants and vowels and from the articulation rules, respectively. To synthesize the facial image sequence in real time on a PC, the 88 mouth shapes are stored in a database and retrieved, rather than synthesizing the mouth shape in each frame. To verify the validity of the proposed method, various facial image sequences were synthesized from text, and a system that runs on a PC was implemented using the proposed method.
https://doi.org/10.3745/KIPSTB.2008.15-B.3.179
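
The text-to-phoneme-code step in the abstract above can be illustrated with a minimal sketch. It assumes that a syllable is first split into its initial consonant, vowel, and optional final consonant using the standard Unicode arithmetic for precomposed Hangul syllables. This is not the paper's phoneme code, and it does not apply the articulation rules or the 88 mouth shapes, which the abstract does not specify. The names `decompose` and `text_to_jamo` are hypothetical.

```python
# Sketch only: Unicode-based decomposition of Hangul syllables into jamo.
# This is an assumed first step toward phoneme codes, not the paper's method.

HANGUL_BASE = 0xAC00      # code point of the first precomposed syllable
SYLLABLE_COUNT = 11172    # 19 initials * 21 medials * 28 finals

# Jamo tables in the order used by the Unicode composition formula.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, medial, final)."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(index, len(MEDIALS) * len(FINALS))
    medial, final = divmod(rest, len(FINALS))
    return INITIALS[initial], MEDIALS[medial], FINALS[final]


def text_to_jamo(text):
    """Decompose every Hangul syllable in text, skipping other characters."""
    result = []
    for ch in text:
        if 0 <= ord(ch) - HANGUL_BASE < SYLLABLE_COUNT:
            result.append(decompose(ch))
    return result


if __name__ == "__main__":
    print(text_to_jamo("한글"))  # [('ㅎ', 'ㅏ', 'ㄴ'), ('ㄱ', 'ㅡ', 'ㄹ')]
```

A mouth-shape lookup in the spirit of the abstract would then key a table on these jamo, but the contents of that table would have to come from the paper itself.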

Sakurai, Ryuhei;Shimba, Taiki;Yamazoe, Hirotake;Lee, Joo-Ho
- The Journal of Korea Robotics Society / v.13 no.1 / pp.16-25 / 2018
A talking head (TH) is an animation of a speaking face generated from text and voice input. In this paper, we propose a method for generating a TH with facial expression and intonation from speech input alone. Generating a TH from speech can be regarded as a regression problem from an acoustic feature sequence to a facial code sequence, where a facial code is a low-dimensional vector representation that can efficiently encode and decode a face image. This regression was modeled with a bidirectional RNN and trained on the SAVEE database, a database of frontal utterance face animations. The proposed method generates a TH with facial expression and intonation by using acoustic features such as MFCC, dynamic elements of MFCC, energy, and F0. In the experiments, the configuration that used BLSTM layers for the first and second layers of the bidirectional RNN predicted the facial code best. For evaluation, a questionnaire survey was conducted with 62 people who watched TH animations generated by the proposed method and by the previous method. As a result, 77% of the respondents answered that the TH generated by the proposed method matched the speech well.
https://doi.org/10.7746/jkros.2018.13.1.016
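
The acoustic feature vector in the abstract above (MFCC, dynamic elements of MFCC, energy, and F0) can be sketched as follows. The sketch assumes that "dynamic elements" means the common regression-based delta coefficients with edge padding. The abstract does not state the formula or window size, so both are assumptions, and the function names `delta` and `stack_features` are hypothetical. The inputs are taken to be per-frame values that were already extracted elsewhere.

```python
# Sketch only: regression-based delta features and per-frame stacking.
# Window size and formula are assumptions, not taken from the paper.

def delta(frames, window=2):
    """Return delta coefficients for a list of equal-length feature frames.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    where frames outside the sequence repeat the first or last frame.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    count = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))

    def frame_at(t):
        # Clamp the index so the edges are padded with the boundary frame.
        return frames[min(max(t, 0), count - 1)]

    deltas = []
    for t in range(count):
        row = []
        for d in range(len(frames[t])):
            num = sum(
                n * (frame_at(t + n)[d] - frame_at(t - n)[d])
                for n in range(1, window + 1)
            )
            row.append(num / denom)
        deltas.append(row)
    return deltas


def stack_features(mfcc, energy, f0, window=2):
    """Concatenate MFCC, delta MFCC, energy, and F0 into one vector per frame."""
    if not (len(mfcc) == len(energy) == len(f0)):
        raise ValueError("all feature streams must have the same frame count")
    d_mfcc = delta(mfcc, window)
    return [
        list(mfcc[t]) + d_mfcc[t] + [energy[t], f0[t]]
        for t in range(len(mfcc))
    ]
```

The stacked vectors would form the input sequence of the regression model. The facial code targets and the network itself are outside the scope of this sketch.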

Ju, Myung-Ho;Kang, Hang-Bong
- The KIPS Transactions:PartB / v.18B no.1 / pp.21-28 / 2011
A 3D face shape derived from 2D images may be useful in many applications, such as face recognition, face synthesis, and human-computer interaction. To this end, we develop a fast 3D Active Appearance Model (3D-AAM) method using depth estimation. The training images include specific 3D face poses that differ greatly from one another. The depth information of the landmarks is estimated from the training image sequence by using the approximated Jacobian matrix. It is added at the test phase to deal with the 3D pose variations of the input face. Our experimental results show that the proposed method efficiently fits the face shape, including variations in facial expression and 3D pose, better than the typical AAM, and can estimate an accurate 3D face shape from images.
https://doi.org/10.3745/KIPSTB.2011.18B.1.021