• Title/Summary/Keyword: Accuracy of Emotion Recognition


Speech Emotion Recognition Using 2D-CNN with Mel-Frequency Cepstrum Coefficients

  • Eom, Youngsik; Bang, Junseong
    • Journal of Information and Communication Convergence Engineering, v.19 no.3, pp.148-154, 2021
  • With the advent of context-aware computing, many attempts have been made to understand emotions. Among them, Speech Emotion Recognition (SER) recognizes a speaker's emotions from speech information; SER succeeds when distinctive features are selected and classified in an appropriate way. In this paper, the performance of SER using neural network models (e.g., a fully connected network (FCN) and a convolutional neural network (CNN)) with Mel-Frequency Cepstral Coefficients (MFCC) is examined in terms of the accuracy and distribution of emotion recognition. On the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), after tuning model parameters, a two-dimensional Convolutional Neural Network (2D-CNN) with MFCC showed the best performance, with an average accuracy of 88.54% over five emotions (anger, happiness, calm, fear, and sadness) spoken by men and women. In addition, the distribution of recognition accuracies across the neural network models indicates that the 2D-CNN with MFCC can be expected to achieve an overall accuracy of 75% or more. (A minimal pipeline sketch follows.)
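
A minimal pipeline sketch of this kind of setup, in Python. The librosa/Keras calls are real, but the MFCC count, padding length, and layer sizes are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch only: MFCC "images" fed to a small 2D-CNN emotion classifier.
import numpy as np
import librosa
import tensorflow as tf

N_MFCC, MAX_FRAMES, N_CLASSES = 40, 200, 5  # 5 emotions: anger, happiness, calm, fear, sadness

def mfcc_image(path, sr=22050):
    """Load one audio clip and return a fixed-size MFCC 'image' of shape (N_MFCC, MAX_FRAMES, 1)."""
    y, sr = librosa.load(path, sr=sr)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)
    if m.shape[1] < MAX_FRAMES:                               # pad short clips with zeros
        m = np.pad(m, ((0, 0), (0, MAX_FRAMES - m.shape[1])))
    return m[:, :MAX_FRAMES, np.newaxis]

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(N_MFCC, MAX_FRAMES, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```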

Alexithymia and the Recognition of Facial Emotion in Schizophrenic Patients (정신분열병 환자에서의 감정표현불능증과 얼굴정서인식결핍)

  • Noh, Jin-Chan; Park, Sung-Hyouk; Kim, Kyung-Hee; Kim, So-Yul; Shin, Sung-Woong; Lee, Koun-Seok
    • Korean Journal of Biological Psychiatry, v.18 no.4, pp.239-244, 2011
  • Objectives: Schizophrenic patients have been shown to be impaired in both emotional self-awareness and recognition of others' facial emotions. Alexithymia refers to deficits in emotional self-awareness. The relationship between alexithymia and recognition of others' facial emotions needs to be explored to better understand the characteristics of emotional deficits in schizophrenic patients. Methods: Thirty control subjects and 31 schizophrenic patients completed the Toronto Alexithymia Scale-20-Korean version (TAS-20K) and a facial emotion recognition task. The stimuli in the facial emotion recognition task consisted of six emotions (happiness, sadness, anger, fear, disgust, and neutral). Recognition accuracy was calculated within each emotion category, and correlations between TAS-20K scores and recognition accuracy were analyzed. Results: The schizophrenic patients showed higher TAS-20K scores and lower recognition accuracy than the control subjects. Unlike the controls, the schizophrenic patients showed no significant correlations between TAS-20K and recognition accuracy. Conclusions: The data suggest that, although schizophrenia may impair both emotional self-awareness and recognition of others' facial emotions, the degree of deficit can differ between the two. This indicates that the emotional deficits in schizophrenia may have more complex features. (An illustrative analysis sketch follows.)
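
An illustrative sketch of this kind of analysis, assuming a simple tabular layout of trial data; the file names and column names are hypothetical, not taken from the study.

```python
# Sketch only: per-emotion recognition accuracy and its correlation with TAS-20K scores.
import pandas as pd
from scipy.stats import pearsonr

trials = pd.read_csv("facial_emotion_trials.csv")  # hypothetical: subject, emotion, correct (0/1)
tas = pd.read_csv("tas20k_scores.csv")             # hypothetical: subject, tas20k

# recognition accuracy within each emotion category, per subject
acc = (trials.groupby(["subject", "emotion"])["correct"]
             .mean()
             .unstack("emotion"))

merged = acc.join(tas.set_index("subject"))
for emotion in acc.columns:
    r, p = pearsonr(merged["tas20k"], merged[emotion])
    print(f"{emotion}: r = {r:.2f}, p = {p:.3f}")
```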

A study on the enhancement of emotion recognition through facial expression detection in user's tendency (사용자의 성향 기반의 얼굴 표정을 통한 감정 인식률 향상을 위한 연구)

  • Lee, Jong-Sik; Shin, Dong-Hee
    • Science of Emotion and Sensibility, v.17 no.1, pp.53-62, 2014
  • Despite the huge potential of practical applications, emotion recognition technologies remain hard to improve, mainly because emotions themselves are difficult to recognize. Although imperfectly, human emotions can be recognized from images and sounds, and emotion recognition has been studied extensively with image-based, sound-based, and combined image-and-sound approaches. Recognition through facial expression detection is especially effective because emotions are primarily expressed in the human face. However, differences in user environment and in users' familiarity with the technology can cause significant disparities and errors. To enhance the accuracy of real-time emotion recognition, it is crucial to understand and analyze the personality traits of users, which contribute to improved recognition. This study analyzes users' personality traits and applies them in the emotion recognition system to reduce errors in facial-expression-based recognition and improve the accuracy of the results. In particular, the study offers a practical solution for users with subtle facial expressions or a low degree of emotional expressiveness by providing an enhanced emotion recognition function.

Transformer-based transfer learning and multi-task learning for improving the performance of speech emotion recognition (음성감정인식 성능 향상을 위한 트랜스포머 기반 전이학습 및 다중작업학습)

  • Park, Sunchan; Kim, Hyung Soon
    • The Journal of the Acoustical Society of Korea, v.40 no.5, pp.515-522, 2021
  • It is hard to prepare sufficient training data for speech emotion recognition because emotion labeling is difficult. In this paper, we apply transfer learning from large-scale speech recognition training data to a transformer-based model to improve speech emotion recognition. In addition, we propose a method that exploits context information without decoding, through multi-task learning with speech recognition. In speech emotion recognition experiments on the IEMOCAP dataset, our model achieves a weighted accuracy of 70.6% and an unweighted accuracy of 71.6%, which shows that the proposed method is effective. (A hedged sketch of such a multi-task setup follows.)
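
A hedged sketch of a multi-task setup of this kind: a transformer speech encoder with an utterance-level emotion head and a frame-level CTC head for speech recognition. The wav2vec 2.0 backbone and the loss weighting are stand-in assumptions; the paper does not prescribe this exact model.

```python
# Sketch only: transformer encoder fine-tuned with emotion + CTC (ASR) heads.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class MultiTaskSER(nn.Module):
    def __init__(self, n_emotions=4, vocab_size=32, asr_weight=0.3):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
        hidden = self.encoder.config.hidden_size
        self.emotion_head = nn.Linear(hidden, n_emotions)  # utterance-level emotion logits
        self.ctc_head = nn.Linear(hidden, vocab_size)       # frame-level character logits
        self.asr_weight = asr_weight                         # weighting of the auxiliary ASR loss

    def forward(self, waveform):
        states = self.encoder(waveform).last_hidden_state    # (batch, frames, hidden)
        emotion_logits = self.emotion_head(states.mean(dim=1))
        ctc_logits = self.ctc_head(states)
        return emotion_logits, ctc_logits

# Training idea: total loss = emotion cross-entropy + asr_weight * CTC loss on the transcript.
```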

Multimodal Parametric Fusion for Emotion Recognition

  • Kim, Jonghwa
    • International Journal of Advanced Smart Convergence, v.9 no.1, pp.193-201, 2020
  • The main objective of this study is to investigate the impact of additional modalities on the performance of emotion recognition using speech, facial expressions, and physiological measurements. To compare different approaches, we designed a feature-based recognition system as a benchmark, which carries out linear supervised classification followed by leave-one-out cross-validation. For the classification of four emotions, bimodal fusion improved the recognition accuracy of the unimodal approach in our experiment, while the performance of trimodal fusion varied strongly across individuals. Furthermore, we observed extremely high disparity between single-class recognition rates, and no single modality performed best for all subjects. Based on these observations, we developed a novel fusion method, called parametric decision fusion (PDF), which builds emotion-specific classifiers and exploits a parametrized decision process. Using the PDF scheme, we achieved a 16% improvement in accuracy for subject-dependent recognition and 10% for subject-independent recognition compared with the best unimodal results. (A hedged decision-fusion sketch follows.)
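
A hedged sketch of decision-level fusion with emotion-specific (one-vs-rest) classifiers per modality. The paper's exact parametrization of the decision step is not reproduced here; the emotion list, modality names, and fusion weights are hypothetical.

```python
# Sketch only: per-emotion, per-modality classifiers fused by weighted scores.
import numpy as np
from sklearn.linear_model import LogisticRegression

EMOTIONS = ["joy", "anger", "sadness", "pleasure"]   # assumed four-class setup
MODALITIES = ["speech", "face", "physio"]

def train_emotion_specific(features, labels):
    """One binary (one-vs-rest) classifier per emotion and per modality."""
    labels = np.asarray(labels)
    return {m: {e: LogisticRegression(max_iter=1000).fit(features[m], labels == e)
                for e in EMOTIONS}
            for m in MODALITIES}

def fuse(classifiers, sample, weights):
    """Weighted sum of per-modality, per-emotion scores; the weights are the fusion parameters."""
    scores = {e: sum(weights[m][e] * classifiers[m][e].predict_proba(sample[m])[0, 1]
                     for m in MODALITIES)
              for e in EMOTIONS}
    return max(scores, key=scores.get)
```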

Emotion Recognition Implementation with Multimodalities of Face, Voice and EEG

  • Udurume, Miracle; Caliwag, Angela; Lim, Wansu; Kim, Gwigon
    • Journal of Information and Communication Convergence Engineering, v.20 no.3, pp.174-180, 2022
  • Emotion recognition is an essential component of complete interaction between humans and machines. Its difficulty stems from the different forms in which emotions are expressed, such as visual, sound, and physiological signals. Recent advances in the field show that combined modalities, such as visual, voice, and electroencephalography signals, lead to better results than single modalities used separately. Previous studies have explored the use of multiple modalities for accurate prediction of emotion; however, studies on real-time implementation are limited because of the difficulty of running multiple modalities simultaneously. In this study, we propose an emotion recognition system designed for real-time operation. Our model is built around a multithreading block that runs each modality in a separate thread with continuous synchronization. We first achieved emotion recognition for each modality separately before enabling the multithreaded system, and we then compared the accuracy of unimodal and multimodal emotion recognition in real time. The experimental results demonstrated real-time recognition of the user's emotion and confirmed the effectiveness of the multimodal approach: the multimodal model obtained an accuracy of 80.1%, compared with unimodal accuracies of 70.9%, 54.3%, and 63.1%. (A minimal threading sketch follows.)
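
A minimal threading sketch of the synchronization idea: each modality runs in its own thread and publishes its newest prediction, and a fusion loop periodically combines them. The per-modality predict functions are placeholders, not the paper's models.

```python
# Sketch only: one worker thread per modality, majority-vote fusion at a fixed period.
import threading
import time
from collections import Counter

latest = {}                          # newest prediction per modality
lock = threading.Lock()
stop_event = threading.Event()

def modality_worker(name, predict_fn):
    """Continuously run one modality's model and keep only its newest prediction."""
    while not stop_event.is_set():
        pred = predict_fn()          # placeholder: capture data and run the modality's model
        with lock:
            latest[name] = pred

def fusion_loop(period=0.5):
    """Periodically combine whatever each modality last reported (majority vote here)."""
    while not stop_event.is_set():
        with lock:
            snapshot = dict(latest)
        if len(snapshot) == 3:       # face, voice, and EEG have all reported
            fused, _ = Counter(snapshot.values()).most_common(1)[0]
            print("fused emotion:", fused)
        time.sleep(period)

# Usage idea (face_predict, voice_predict, eeg_predict are hypothetical callables):
# for name, fn in {"face": face_predict, "voice": voice_predict, "eeg": eeg_predict}.items():
#     threading.Thread(target=modality_worker, args=(name, fn), daemon=True).start()
# fusion_loop()
```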

Research of Real-Time Emotion Recognition Interface Using Multiple Physiological Signals of EEG and ECG (뇌파 및 심전도 복합 생체신호를 이용한 실시간 감정인식 인터페이스 연구)

  • Shin, Dong-Min; Shin, Dong-Il; Shin, Dong-Kyoo
    • Journal of Korea Game Society, v.15 no.2, pp.105-114, 2015
  • We propose a real-time user interface based on emotion recognition from physiological signals. To address the low accuracy of emotion recognition from the EEG (electroencephalogram) alone, we developed a physiological-signal-based emotion recognition system that mixes the relative power spectrum values of the theta/alpha/beta/gamma EEG bands with an autonomic nerve signal ratio from the ECG (electrocardiogram). We propose both a data map and a weight modification algorithm to recognize six emotions: happiness, fear, sadness, joy, anger, and hatred. The data map stores user-specific probability values, and the algorithm updates the weights to improve recognition accuracy for each EEG channel. Compared with single-modality data consisting of EEG alone, the combined EEG/ECG biosignal data raised accuracy by 23.77%. The proposed high-accuracy interface can serve as a useful interface for controlling game spaces and smart spaces. (A hedged feature-extraction sketch follows.)
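
A hedged feature-extraction sketch combining relative EEG band power with an ECG autonomic (LF/HF-style) ratio. The band edges, Welch parameters, and sampling rates are illustrative assumptions, not the paper's settings.

```python
# Sketch only: relative theta/alpha/beta/gamma power plus an ECG LF/HF ratio as features.
import numpy as np
from scipy.signal import welch

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def relative_band_power(eeg, fs=256):
    """Relative power of each EEG band in one channel."""
    f, psd = welch(eeg, fs=fs, nperseg=fs * 2)
    mask_total = (f >= 4) & (f <= 45)
    total = np.trapz(psd[mask_total], f[mask_total])
    return {b: np.trapz(psd[(f >= lo) & (f < hi)], f[(f >= lo) & (f < hi)]) / total
            for b, (lo, hi) in BANDS.items()}

def autonomic_ratio(rr_intervals, fs_interp=4.0):
    """Rough LF/HF ratio from an evenly resampled RR-interval series (ECG)."""
    f, psd = welch(rr_intervals, fs=fs_interp, nperseg=min(256, len(rr_intervals)))
    lf = np.trapz(psd[(f >= 0.04) & (f < 0.15)], f[(f >= 0.04) & (f < 0.15)])
    hf = np.trapz(psd[(f >= 0.15) & (f < 0.40)], f[(f >= 0.15) & (f < 0.40)])
    return lf / hf

def feature_vector(eeg_channel, rr_intervals):
    rel = relative_band_power(eeg_channel)
    return np.array([rel["theta"], rel["alpha"], rel["beta"], rel["gamma"],
                     autonomic_ratio(rr_intervals)])
```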

Emotion Recognition based on Tracking Facial Keypoints (얼굴 특징점 추적을 통한 사용자 감성 인식)

  • Lee, Yong-Hwan; Kim, Heung-Jun
    • Journal of the Semiconductor & Display Technology, v.18 no.1, pp.97-101, 2019
  • Understanding and classifying human emotions are important tasks in human-machine communication systems. This paper proposes an emotion recognition method that extracts facial keypoints with an Active Appearance Model and classifies the emotion with a proposed classification model of the facial features. The appearance model captures the variation in expression, which the proposed classification model evaluates as the facial expression changes. The method classifies four basic emotions (normal, happy, sad, and angry). To evaluate performance, we measured the success rate on common datasets and achieved a best accuracy of 93% and an average of 82.2% in facial emotion recognition. The results show that the proposed method performs well compared with existing schemes. (An illustrative keypoint-classification sketch follows.)
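
An illustrative sketch of keypoint-based emotion classification. dlib's 68-point landmark detector stands in for the Active Appearance Model used in the paper, and the normalization and SVM classifier are assumptions for illustration.

```python
# Sketch only: facial keypoint extraction followed by a simple classifier.
import numpy as np
import dlib
from sklearn.svm import SVC

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # landmark model file

def keypoint_features(gray_image):
    """Return normalized landmark coordinates for the first detected face, or None."""
    faces = detector(gray_image)
    if not faces:
        return None
    shape = predictor(gray_image, faces[0])
    pts = np.array([[p.x, p.y] for p in shape.parts()], dtype=float)
    pts -= pts.mean(axis=0)          # translation-invariant
    pts /= np.linalg.norm(pts)       # scale-invariant
    return pts.ravel()

# clf = SVC(kernel="rbf").fit(train_features, train_labels)  # labels: normal/happy/sad/angry
```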

Half-Against-Half Multi-class SVM Classify Physiological Response-based Emotion Recognition

  • Vanny, Makara; Ko, Kwang-Eun; Park, Seung-Min; Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems, v.23 no.3, pp.262-267, 2013
  • Recognizing human emotional states is one of the most important components of efficient human-human and human-computer interaction. In this paper, the classification problem covers four emotions (fear, disgust, joy, and neutral), and the experiment was designed around visual stimuli for eliciting emotions while the physiological signals of skin conductance (SC), skin temperature (SKT), and blood volume pulse (BVP) were recorded. To solve this problem, a half-against-half (HAH) multi-class support vector machine (SVM) with a Gaussian radial basis function (RBF) kernel is proposed as an effective technique for improving emotion classification accuracy. The experimental results show that the proposed method is efficient for this emotion recognition problem, with accuracy rates of 90% for neutral, 86.67% for joy, 85% for disgust, and 80% for fear. (A structural sketch of the HAH classifier follows.)
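
A structural sketch of a half-against-half classifier built from binary RBF-kernel SVMs. The specific grouping of the four emotions into halves is illustrative, not taken from the paper.

```python
# Sketch only: a two-level HAH tree of binary SVMs (halves at the root, pairs at the leaves).
import numpy as np
from sklearn.svm import SVC

class HalfAgainstHalfSVM:
    def __init__(self, classes=("fear", "disgust", "joy", "neutral")):
        self.left, self.right = list(classes[:2]), list(classes[2:])
        self.root = SVC(kernel="rbf")        # left half vs right half
        self.left_leaf = SVC(kernel="rbf")   # within the left half
        self.right_leaf = SVC(kernel="rbf")  # within the right half

    def fit(self, X, y):
        y = np.asarray(y)
        in_left = np.isin(y, self.left)
        self.root.fit(X, in_left)
        self.left_leaf.fit(X[in_left], y[in_left])
        self.right_leaf.fit(X[~in_left], y[~in_left])
        return self

    def predict(self, X):
        go_left = self.root.predict(X).astype(bool)
        out = np.empty(len(X), dtype=object)
        if go_left.any():
            out[go_left] = self.left_leaf.predict(X[go_left])
        if (~go_left).any():
            out[~go_left] = self.right_leaf.predict(X[~go_left])
        return out
```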

Classification of Three Different Emotion by Physiological Parameters

  • Jang, Eun-Hye; Park, Byoung-Jun; Kim, Sang-Hyeob; Sohn, Jin-Hun
    • Journal of the Ergonomics Society of Korea, v.31 no.2, pp.271-279, 2012
  • Objective: This study classified three different emotional states (boredom, pain, and surprise) using physiological signals. Background: Emotion recognition studies have tried to recognize human emotion from physiological signals; such recognition is important for applying emotion detection to human-computer interaction systems. Method: 122 college students participated in the experiment. Three different emotional stimuli were presented to the participants, and physiological signals, i.e., EDA (electrodermal activity), SKT (skin temperature), PPG (photoplethysmogram), and ECG (electrocardiogram), were measured for 1 minute as a baseline and for 1 to 1.5 minutes during the emotional state. The signals were analyzed over 30 seconds of the baseline and of the emotional state, and 27 features were extracted. Statistical analysis for emotion classification was done by discriminant function analysis (DFA, SPSS 15.0) using the difference values obtained by subtracting the baseline values from the emotional-state values. Results: Physiological responses during the emotional states differed significantly from baseline, and the accuracy rate of emotion classification was 84.7%. Conclusion: The study shows that emotions can be classified from various physiological signals. Future work should add signals from other modalities, such as facial expression, face temperature, or voice, to improve the classification rate and to examine the stability and reliability of this result compared with the accuracy of other classification algorithms. Application: These findings can help emotion recognition studies recognize a wider range of human emotions from physiological signals, can be applied to human-computer interaction systems, and are also useful for developing emotion theory, profiling emotion-specific physiological responses, and establishing the basis for emotion recognition systems in human-computer interaction. (A minimal analysis sketch follows.)
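
A minimal sketch of the analysis idea: subtract baseline features from emotional-state features and classify the differences with linear discriminant analysis (standing in for the DFA run in SPSS). Array shapes and the cross-validation split are assumptions.

```python
# Sketch only: baseline-difference features classified with LDA.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def classify(emotion_features, baseline_features, labels):
    """emotion_features, baseline_features: (n_samples, 27) arrays from EDA/SKT/PPG/ECG;
    labels: "boredom", "pain", or "surprise" for each sample."""
    diffs = emotion_features - baseline_features   # change from baseline
    lda = LinearDiscriminantAnalysis()
    scores = cross_val_score(lda, diffs, labels, cv=5)
    return scores.mean()
```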