http://dx.doi.org/10.15701/kcgs.2022.28.3.23

Emotion-based Real-time Facial Expression Matching Dialogue System for Virtual Human  

Kim, Kirak (Department of Art & Technology, Sogang University)
Yeon, Heeyeon (Department of Artificial Intelligence, Sogang University)
Eun, Taeyoung (Department of Computer Science, Sogang University)
Jung, Moonryul (Department of Art & Technology, Sogang University)
Abstract
Virtual humans are implemented in virtual spaces (virtual reality, mixed reality, the metaverse, etc.) with dedicated modeling tools such as the Unity 3D engine. Various human modeling tools have been introduced to give virtual humans an appearance, voice, facial expressions, and behavior similar to real people, and virtual humans implemented with these tools can communicate with users to some extent. However, most virtual humans so far have remained unimodal, using only text or speech. As AI technologies advance, the outdated machine-centered dialogue system is giving way to human-centered, natural multi-modal systems. Using several pre-trained networks, we implemented an emotion-based multi-modal dialogue system that generates human-like utterances and displays appropriate facial expressions in real time.
Keywords
Virtual Human; Multi-Modal Dialogue; Unity; Dialogue based on Emotions; GPT-2; RoBERTa;
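The abstract and keywords describe a pipeline of pre-trained networks: a generative language model (GPT-2) produces the virtual human's utterance, a RoBERTa-based classifier labels its emotion, and the predicted emotion drives the character's facial expression in Unity. Below is a minimal sketch of such a loop, not the authors' implementation: the Hugging Face checkpoint names and the emotion-to-blendshape table are illustrative assumptions standing in for the fine-tuned models described in the paper.

```python
# Minimal sketch of an emotion-based dialogue loop: generate an utterance,
# classify its emotion, and pick blendshape weights for the virtual human.
# Checkpoint names and the emotion-to-blendshape table are illustrative
# assumptions, not the authors' fine-tuned models or values.
from transformers import pipeline

# Pre-trained networks: stock GPT-2 as a stand-in utterance generator and a
# public RoBERTa-family emotion classifier as a stand-in recognizer.
generator = pipeline("text-generation", model="gpt2")
emotion_clf = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",  # assumed substitute
)

# Hypothetical mapping from a predicted emotion label to blendshape weights
# that a Unity-side script could apply to the character's face.
EMOTION_TO_BLENDSHAPES = {
    "joy":     {"mouthSmile": 0.8, "cheekRaise": 0.5},
    "sadness": {"browInnerUp": 0.7, "mouthFrown": 0.6},
    "anger":   {"browDown": 0.8, "jawClench": 0.4},
    "neutral": {},
}


def respond(user_utterance: str):
    """Return the virtual human's reply and the facial expression to show."""
    out = generator(user_utterance, max_new_tokens=40, do_sample=True)
    reply = out[0]["generated_text"][len(user_utterance):].strip()
    emotion = emotion_clf(reply)[0]["label"]
    weights = EMOTION_TO_BLENDSHAPES.get(emotion, {})
    return reply, weights


if __name__ == "__main__":
    text, face = respond("I finally passed my exam today!")
    print(text)  # generated utterance
    print(face)  # blendshape weights to send to the Unity character
```

In a real system the classifier and generator would run in a backend process, with the emotion label and blendshape weights streamed to the Unity character each turn so the face updates as the utterance is spoken.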
References
1 Ekman, Paul, and Wallace V. Friesen. "Facial Action Coding System." Environmental Psychology & Nonverbal Behavior (1978).
2 Li, Yanran, et al. "DailyDialog: A manually labelled multi-turn dialogue dataset." arXiv preprint arXiv:1710.03957 (2017).
3 Smith, Eric Michael, et al. "Can you put it all together: Evaluating conversational agents' ability to blend skills." arXiv preprint arXiv:2004.08449 (2020).
4 Kucherenko, Taras, et al. "Gesticulator: A framework for semantically-aware speech-driven gesture generation." Proceedings of the 2020 International Conference on Multimodal Interaction (pp. 242-250). (2020).
5 Lee, Jaehyun, and Kyoungju Park. "Lip sync of an avatar in conversational virtual reality." Journal of the Korea Computer Graphics Society, 26(4), 9-15. (2020).
6 McDonnell, Rachel, et al. "Model for predicting perception of facial action unit activation using virtual humans." Computers & Graphics 100, 81-92. (2021).
7 Lee, Lik-Hang, et al. "All one needs to know about metaverse: A complete survey on technological singularity, virtual ecosystem, and research agenda." arXiv preprint arXiv:2110.05352 (2021).
8 Utrecht University, Department of Information and Computing Sciences, Virtual Worlds division. IVA 2016 Tutorial, September 20 (2016).
9 Lewis, John P., et al. "Practice and theory of blendshape facial models." Eurographics (State of the Art Reports) 1.8, 2. (2014).
10 Cohn, Jeffrey F., Zara Ambadar, and Paul Ekman. "Observer-based measurement of facial expression with the Facial Action Coding System." The Handbook of Emotion Elicitation and Assessment 1.3, 203-221. (2007).
11 Friesen, W. "EMFACS-7: Emotional Facial Action Coding System. Unpublished manual / W. Friesen, P. Ekman." (1983).
12 Liu, Yinhan, et al. "RoBERTa: A robustly optimized BERT pretraining approach." arXiv preprint arXiv:1907.11692 (2019).
13 Zhang, Saizheng, et al. "Personalizing dialogue agents: I have a dog, do you have pets too?" arXiv preprint arXiv:1801.07243 (2018).
14 Rashkin, Hannah, et al. "Towards empathetic open-domain conversation models: A new benchmark and dataset." arXiv preprint arXiv:1811.00207 (2018).
15 Wahlster, W. "Dialogue systems go multimodal: The SmartKom experience." In SmartKom: Foundations of Multimodal Dialogue Systems (pp. 3-27). Springer, Berlin, Heidelberg. (2006).
16 Zupan, Jure. "Introduction to artificial neural network (ANN) methods: what they are and how to use them." Acta Chimica Slovenica 41, 327-327. (1994).
17 Radford, Alec, et al. "Improving language understanding by generative pre-training." (2018).
18 Pham, H. X., Wang, Y., & Pavlovic, V. "End-to-end learning for 3D facial animation from speech." Proceedings of the 20th ACM International Conference on Multimodal Interaction (pp. 361-365). (2018).