http://dx.doi.org/10.14695/KJSOS.2021.24.1.91

Hi, KIA! Classifying Emotional States from Wake-up Words Using Machine Learning  

Kim, Taesu (Department of Industrial Design, KAIST)
Kim, Yeongwoo (Department of Industrial Design, KAIST)
Kim, Keunhyeong (Graduate School of Culture Technology, KAIST)
Kim, Chul Min (Department of Nuclear and Quantum Engineering, KAIST)
Jun, Hyung Seok (Kia Interior Design Team, Kia Design Center, Hyundai Motor Company)
Suk, Hyeon-Jeong (Department of Industrial Design, KAIST)
Publication Information
Science of Emotion and Sensibility, v.24, no.1, 2021, pp. 91-104
Abstract
This study explored users' emotional states identified from the wake-up words "Hi, KIA!" using a machine learning algorithm, with a view to the voice-user interface of passenger cars. We targeted four emotional states, namely excited, angry, desperate, and neutral, and created a total of 12 emotional scenarios in the context of car driving. Nine college students participated and recorded sentences as guided in the visualized scenarios. The wake-up words were extracted from the whole sentences, yielding two data sets. Using open-source R code, we collected acoustic features of the recorded voices with the soundgen package and performed machine-learning analysis with the svmRadial method of the caret package to determine the predictability of the modeled algorithm. For all nine participants and the four emotional categories, we compared the accuracy of the wake-up words (60.19%; 22%-81% across participants) with that of the whole sentences (41.51%). Individual differences in accuracy and sensitivity were noticeable, whereas the selected features were relatively constant. This study provides empirical evidence for the potential application of wake-up words to emotion-driven user experience in communication between users and artificial intelligence systems.
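
As a rough illustration of the pipeline described above, the sketch below shows how acoustic features could be extracted with the soundgen package and classified with caret's svmRadial method in R. It is a minimal sketch, not the study's actual code: the placeholder file name, the toy feature table, and the training settings are hypothetical stand-ins.

## Minimal R sketch of the abstract's pipeline (hypothetical data and settings)
library(soundgen)   # acoustic feature extraction via analyze()
library(caret)      # training wrapper; method = "svmRadial" uses kernlab's radial-kernel SVM

# Feature extraction from one recorded wake-up word clip (file name is a placeholder).
# Depending on the soundgen version, analyze() returns frame-level measures and/or a per-file summary.
# feat <- analyze("hi_kia_p01_angry.wav")

# Toy stand-in for a per-recording feature table with a four-level emotion label.
set.seed(1)
n <- 80
features <- data.frame(
  pitch_mean = rnorm(n, 220, 40),
  loudness   = rnorm(n, 60, 8),
  hnr        = rnorm(n, 12, 3),
  emotion    = factor(sample(c("excited", "angry", "desperate", "neutral"), n, replace = TRUE))
)

# Train a radial-kernel SVM with 10-fold cross-validation.
ctrl  <- trainControl(method = "cv", number = 10)
model <- train(emotion ~ ., data = features,
               method = "svmRadial",
               preProcess = c("center", "scale"),
               trControl = ctrl)

# Accuracy and per-class sensitivity can be read from a confusion matrix.
pred <- predict(model, newdata = features)
confusionMatrix(pred, features$emotion)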
Keywords
Voice-User Interface (VUI); Wake-Up Words; Machine Learning; Acoustic Feature; svmRadial; Emotional User Scenario
Citations & Related Records
  • Reference
1 Jones, C. M., & Jonsson, I. M. (2007). Performance analysis of acoustic emotion recognition for in-car conversational interfaces. In International Conference on Universal Access in Human-Computer Interaction (pp. 411-420). Berlin, Heidelberg. DOI: 10.1007/978-3-540-73281-5_44
2 Wiegand, G., Mai, C., Hollander, K., & Hussmann, H. (2019). InCarAR: A Design Space Towards 3D Augmented Reality Applications in Vehicles. In Proceedings of the 11th International Conference on Automotive User Interfaces and Interactive Vehicular Applications (pp. 1-13), Utrecht, Netherlands. DOI: 10.1145/3342197.3344539
3 Kim, Y., Kim, T., Kim, G., Jeon, H., & Suk, H. J. (2020). Hi Kia~, hi... kia..., HI KIA!! In Proceedings of the Fall Conference of the Korean Society for Emotion and Sensibility (pp. 21-22), Daejeon.
4 Nass, C., Jonsson, I. M., Harris, H., Reaves, B., Endo, J., Brave, S., & Takayama, L. (2005). Improving automotive safety by pairing driver emotion and car voice emotion. In CHI '05 Extended Abstracts on Human Factors in Computing Systems (pp. 1973-1976), Portland, Oregon, USA, April 2-7. DOI: 10.1145/1056808.1057070
5 Ogilvy, J. (2011). Facing the Fold: Essays on Scenario Planning (pp. 11-29). Devon: Triarchy Press.
6 Jang, K., & Kim, T. (2005). The pragmatic elements concerned with the sounds of utterance. Korean Semantics, 18, 175-196.
7 Jones, C. M., & Jonsson, I. M. (2005). Automatic recognition of affective cues in the speech of car drivers to allow appropriate responses. In Proceedings of the 17th Australia Conference on Computer-Human Interaction: Citizens Online: Considerations for Today and the Future (pp. 1-10), Narrabundah, Australia, Nov. 2005. DOI: 10.5555/1108368.1108397
8 Nordström, H., & Laukka, P. (2019). The time course of emotion recognition in speech and music. The Journal of the Acoustical Society of America, 145(5), 3058-3074. DOI: 10.1121/1.5108601
9 Alcamo, J. (2008). Chapter Six: The SAS approach: Combining qualitative and quantitative knowledge in environmental scenarios. Developments in Integrated Environmental Assessment, 2, 123-150. DOI: 10.1016/S1574-101X(08)00406-7
10 Davitz, J. R. (1964). The communication of emotional meaning. Oxford, England: McGraw Hill.
11 Swain, M., Routray, A., & Kabisatpathy, P. (2018). Databases, features and classifiers for speech emotion recognition: A review. International Journal of Speech Technology, 21(1), 93-120. DOI: 10.1007/s10772-018-9491-z
12 Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161-1178. DOI: 10.1037/h0077714
13 Voicebot.ai. (2020). In-car voice assistant consumer adoption report. Retrieved from https://voicebot.ai/wp-content/uploads/2020/02/in_car_voice_assistant_consumer_adoption_report_2020_voicebot.pdf
14 Kepuska, V. Z., & Klein, T. B. (2009). A novel wake-up-word speech recognition system, wake-up-word recognition task, technology and evaluation. Nonlinear Analysis: Theory, Methods & Applications, 71(12), e2772-e2789. DOI: 10.1016/j.na.2009.06.089
15 Park, J., Park, J., & Sohn, J. (2013). Acoustic parameters for induced emotion categorizing and dimensional approach. Science of Emotion and Sensibility, 16(1), 117-124.
16 Schuller, B., Lang, M., & Rigoll, G. (2006). Recognition of spontaneous emotions by speech within automotive environment. In Proceedings of the German Annual Conference on Acoustics, Braunschweig, Germany, March 2006.