http://dx.doi.org/10.7472/jksii.2022.23.2.71

Korean Emotional Speech and Facial Expression Database for Emotional Audio-Visual Speech Generation  

Baek, Ji-Young (Dept. of Electronic Engineering, Sangmyung University)
Kim, Sera (Dept. of Electronic Engineering, Sangmyung University)
Lee, Seok-Pil (Dept. of Electronic Engineering, Sangmyung University)
Publication Information
Journal of Internet Computing and Services, v.23, no.2, pp. 71-77, 2022
Abstract
In this paper, a database is collected for extending a speech synthesis model into a model that synthesizes speech according to emotion and generates facial expressions. The database is divided into male and female data and consists of emotional speech and facial expressions. Two professional actors of different genders speak sentences in Korean. The sentences are divided into four emotions: happiness, sadness, anger, and neutrality, and each actor performs about 3,300 sentences per emotion. The 26,468 sentences collected by filming do not overlap and are accompanied by facial expressions matching the corresponding emotion. Since building a high-quality database is important for the performance of future research, the database is assessed on emotional category, intensity, and genuineness. To examine accuracy according to the modality of the data, the database is evaluated as audio-video data, audio-only data, and video-only data.
Keywords
Speech Synthesis; Speech Emotion; Database; Multimodal