• Title/Summary/Keyword: Speech Emotion Recognition

Search Result 135, Processing Time 0.021 seconds

Speech Emotion Recognition by Speech Signals on a Simulated Intelligent Robot (모의 지능로봇에서 음성신호에 의한 감정인식)

  • Jang, Kwang-Dong;Kwon, Oh-Wook
    • Proceedings of the KSPS conference
    • /
    • 2005.11a
    • /
    • pp.163-166
    • /
    • 2005
  • We propose a speech emotion recognition method for natural human-robot interface. In the proposed method, emotion is classified into 6 classes: Angry, bored, happy, neutral, sad and surprised. Features for an input utterance are extracted from statistics of phonetic and prosodic information. Phonetic information includes log energy, shimmer, formant frequencies, and Teager energy; Prosodic information includes pitch, jitter, duration, and rate of speech. Finally a patten classifier based on Gaussian support vector machines decides the emotion class of the utterance. We record speech commands and dialogs uttered at 2m away from microphones in 5different directions. Experimental results show that the proposed method yields 59% classification accuracy while human classifiers give about 50%accuracy, which confirms that the proposed method achieves performance comparable to a human.

  • PDF

Emotion Recognition based on Multiple Modalities

  • Kim, Dong-Ju;Lee, Hyeon-Gu;Hong, Kwang-Seok
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.12 no.4
    • /
    • pp.228-236
    • /
    • 2011
  • Emotion recognition plays an important role in the research area of human-computer interaction, and it allows a more natural and more human-like communication between humans and computer. Most of previous work on emotion recognition focused on extracting emotions from face, speech or EEG information separately. Therefore, a novel approach is presented in this paper, including face, speech and EEG, to recognize the human emotion. The individual matching scores obtained from face, speech, and EEG are combined using a weighted-summation operation, and the fused-score is utilized to classify the human emotion. In the experiment results, the proposed approach gives an improvement of more than 18.64% when compared to the most successful unimodal approach, and also provides better performance compared to approaches integrating two modalities each other. From these results, we confirmed that the proposed approach achieved a significant performance improvement and the proposed method was very effective.

An Emotion Recognition Technique using Speech Signals (음성신호를 이용한 감정인식)

  • Jung, Byung-Wook;Cheun, Seung-Pyo;Kim, Youn-Tae;Kim, Sung-Shin
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.4
    • /
    • pp.494-500
    • /
    • 2008
  • In the field of development of human interface technology, the interactions between human and machine are important. The research on emotion recognition helps these interactions. This paper presents an algorithm for emotion recognition based on personalized speech signals. The proposed approach is trying to extract the characteristic of speech signal for emotion recognition using PLP (perceptual linear prediction) analysis. The PLP analysis technique was originally designed to suppress speaker dependent components in features used for automatic speech recognition, but later experiments demonstrated the efficiency of their use for speaker recognition tasks. So this paper proposed an algorithm that can easily evaluate the personal emotion from speech signals in real time using personalized emotion patterns that are made by PLP analysis. The experimental results show that the maximum recognition rate for the speaker dependant system is above 90%, whereas the average recognition rate is 75%. The proposed system has a simple structure and but efficient to be used in real time.

Speech Emotion Recognition on a Simulated Intelligent Robot (모의 지능로봇에서의 음성 감정인식)

  • Jang Kwang-Dong;Kim Nam;Kwon Oh-Wook
    • MALSORI
    • /
    • no.56
    • /
    • pp.173-183
    • /
    • 2005
  • We propose a speech emotion recognition method for affective human-robot interface. In the Proposed method, emotion is classified into 6 classes: Angry, bored, happy, neutral, sad and surprised. Features for an input utterance are extracted from statistics of phonetic and prosodic information. Phonetic information includes log energy, shimmer, formant frequencies, and Teager energy; Prosodic information includes Pitch, jitter, duration, and rate of speech. Finally a pattern classifier based on Gaussian support vector machines decides the emotion class of the utterance. We record speech commands and dialogs uttered at 2m away from microphones in 5 different directions. Experimental results show that the proposed method yields $48\%$ classification accuracy while human classifiers give $71\%$ accuracy.

  • PDF

A Proposal of an Interactive Simulation Game using SER (Speech Emotion Recognition) Technology (SER 기술을 이용한 대화형 시뮬레이션 게임 제안)

  • Lee, Kang-Hee;Jeon, Seo-Hyun
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2019.07a
    • /
    • pp.445-446
    • /
    • 2019
  • 본 논문에서는 단순히 필요한 정보를 얻기 위한 수준에 그쳤던 현대의 인공지능을 SER (Speech Emotion Recognition) 기술을 이용하여 사용자와 직접적으로 대화하는 형식으로 발전시키고자 한다. 사용자의 음성 언어에서 감정을 추출하여 인공지능 분야 및 챗봇과 대화함에 있어 좀더 효과적으로 해석할 수 있도록 도움을 준다. 이것을 대화형 시뮬레이션 게임에 접목시켜 단순한 선택형 대화 방식이 아닌 구어체로 대화하며 사용자에게 높은 몰입도를 줄 수 있다.

  • PDF

Feature Vector Processing for Speech Emotion Recognition in Noisy Environments (잡음 환경에서의 음성 감정 인식을 위한 특징 벡터 처리)

  • Park, Jeong-Sik;Oh, Yung-Hwan
    • Phonetics and Speech Sciences
    • /
    • v.2 no.1
    • /
    • pp.77-85
    • /
    • 2010
  • This paper proposes an efficient feature vector processing technique to guard the Speech Emotion Recognition (SER) system against a variety of noises. In the proposed approach, emotional feature vectors are extracted from speech processed by comb filtering. Then, these extracts are used in a robust model construction based on feature vector classification. We modify conventional comb filtering by using speech presence probability to minimize drawbacks due to incorrect pitch estimation under background noise conditions. The modified comb filtering can correctly enhance the harmonics, which is an important factor used in SER. Feature vector classification technique categorizes feature vectors into either discriminative vectors or non-discriminative vectors based on a log-likelihood criterion. This method can successfully select the discriminative vectors while preserving correct emotional characteristics. Thus, robust emotion models can be constructed by only using such discriminative vectors. On SER experiment using an emotional speech corpus contaminated by various noises, our approach exhibited superior performance to the baseline system.

  • PDF

Emotion Recognition using Pitch Parameters of Speech (음성의 피치 파라메터를 사용한 감정 인식)

  • Lee, Guehyun;Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.25 no.3
    • /
    • pp.272-278
    • /
    • 2015
  • This paper studied various parameter extraction methods using pitch information of speech for the development of the emotion recognition system. For this purpose, pitch parameters were extracted from korean speech database containing various emotions using stochastical information and numerical analysis techniques. GMM based emotion recognition system were used to compare the performance of pitch parameters. Sequential feature selection method were used to select the parameters showing the best emotion recognition performance. Experimental results of recognizing four emotions showed 63.5% recognition rate using the combination of 15 parameters out of 56 pitch parameters. Experimental results of detecting the presence of emotion showed 80.3% recognition rate using the combination of 14 parameters.

An Emotion Recognition Method using Facial Expression and Speech Signal (얼굴표정과 음성을 이용한 감정인식)

  • 고현주;이대종;전명근
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.6
    • /
    • pp.799-807
    • /
    • 2004
  • In this paper, we deal with an emotion recognition method using facial images and speech signal. Six basic human emotions including happiness, sadness, anger, surprise, fear and dislike are investigated. Emotion recognition using the facial expression is performed by using a multi-resolution analysis based on the discrete wavelet transform. And then, the feature vectors are extracted from the linear discriminant analysis method. On the other hand, the emotion recognition from speech signal method has a structure of performing the recognition algorithm independently for each wavelet subband and then the final recognition is obtained from a multi-decision making scheme.

Adaptive Speech Emotion Recognition Framework Using Prompted Labeling Technique (프롬프트 레이블링을 이용한 적응형 음성기반 감정인식 프레임워크)

  • Bang, Jae Hun;Lee, Sungyoung
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.2
    • /
    • pp.160-165
    • /
    • 2015
  • Traditional speech emotion recognition techniques recognize emotions using a general training model based on the voices of various people. These techniques can not consider personalized speech character exactly. Therefore, the recognized results are very different to each person. This paper proposes an adaptive speech emotion recognition framework made from user's' immediate feedback data using a prompted labeling technique for building a personal adaptive recognition model and applying it to each user in a mobile device environment. The proposed framework can recognize emotions from the building of a personalized recognition model. The proposed framework was evaluated to be better than the traditional research techniques from three comparative experiment. The proposed framework can be applied to healthcare, emotion monitoring and personalized service.

Spontaneous Speech Emotion Recognition Based On Spectrogram With Convolutional Neural Network (CNN 기반 스펙트로그램을 이용한 자유발화 음성감정인식)

  • Guiyoung Son;Soonil Kwon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.13 no.6
    • /
    • pp.284-290
    • /
    • 2024
  • Speech emotion recognition (SER) is a technique that is used to analyze the speaker's voice patterns, including vibration, intensity, and tone, to determine their emotional state. There has been an increase in interest in artificial intelligence (AI) techniques, which are now widely used in medicine, education, industry, and the military. Nevertheless, existing researchers have attained impressive results by utilizing acted-out speech from skilled actors in a controlled environment for various scenarios. In particular, there is a mismatch between acted and spontaneous speech since acted speech includes more explicit emotional expressions than spontaneous speech. For this reason, spontaneous speech-emotion recognition remains a challenging task. This paper aims to conduct emotion recognition and improve performance using spontaneous speech data. To this end, we implement deep learning-based speech emotion recognition using the VGG (Visual Geometry Group) after converting 1-dimensional audio signals into a 2-dimensional spectrogram image. The experimental evaluations are performed on the Korean spontaneous emotional speech database from AI-Hub, consisting of 7 emotions, i.e., joy, love, anger, fear, sadness, surprise, and neutral. As a result, we achieved an average accuracy of 83.5% and 73.0% for adults and young people using a time-frequency 2-dimension spectrogram, respectively. In conclusion, our findings demonstrated that the suggested framework outperformed current state-of-the-art techniques for spontaneous speech and showed a promising performance despite the difficulty in quantifying spontaneous speech emotional expression.