http://dx.doi.org/10.7472/jksii.2022.23.2.71

Korean Emotional Speech and Facial Expression Database for Emotional Audio-Visual Speech Generation  

Baek, Ji-Young (Dept. of Electronic Engineering, Sangmyung University)
Kim, Sera (Dept. of Electronic Engineering, Sangmyung University)
Lee, Seok-Pil (Dept. of Electronic Engineering, Sangmyung University)
Publication Information
Journal of Internet Computing and Services, v.23, no.2, pp. 71-77, 2022
Abstract
In this paper, a database is collected for extending a speech synthesis model into a model that synthesizes speech according to emotion and generates facial expressions. The database is divided into male and female data and consists of emotional speech and facial expressions. Two professional actors of different genders speak sentences in Korean. The sentences are divided into four emotions: happiness, sadness, anger, and neutrality, and each actor performs about 3,300 sentences per emotion. The 26,468 sentences collected by filming do not overlap and are accompanied by facial expressions matching the corresponding emotion. Since building a high-quality database is important for the performance of future research, the database is assessed on emotional category, intensity, and genuineness. To examine accuracy according to the modality of the data, the database is evaluated as audio-video data, audio-only data, and video-only data.
Keywords
Speech Synthesis; Speech Emotion; Database; Multimodal