• Title/Summary/Keyword: Speech emotion recognition


A Study on Robust Emotion Classification Structure Between Heterogeneous Speech Databases (이종 음성 DB 환경에 강인한 감성 분류 체계에 대한 연구)

  • Yoon, Won-Jung;Park, Kyu-Sik
    • The Journal of the Acoustical Society of Korea / v.28 no.5 / pp.477-482 / 2009
  • Emotion recognition systems deployed in commercial environments such as call centers suffer severe performance degradation and instability because the speech characteristics of the training database differ from those of the input speech of unspecified customers. To alleviate these problems, this paper extends the traditional neutral/anger emotion recognition method into a two-step hierarchical structure that exploits changes in emotional characteristics and differences between male and female speakers. The experimental results indicate that the proposed method provides stable emotion classification, with performance about 25% higher than the traditional method of emotion recognition.

Design of Intelligent Emotion Recognition Model

  • Kim, Yi-gon
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.7 / pp.611-614 / 2001
  • Voice is one of the most efficient communication media, and it carries several kinds of information about the speaker, the context, emotion, and so on. Human emotion is expressed in speech, gesture, and physiological phenomena (breathing, pulse rate, etc.). In this paper, an emotion recognition model that uses a neuro-fuzzy approach to recognize emotion from the voice signal is presented and simulated.

Speaker and Context Independent Emotion Recognition System using Gaussian Mixture Model (GMM을 이용한 화자 및 문장 독립적 감정 인식 시스템 구현)

  • 강면구;김원구
    • Proceedings of the IEEK Conference / 2003.07e / pp.2463-2466 / 2003
  • This paper studies pattern recognition algorithms and feature parameters for emotion recognition. The KNN algorithm was used as the pattern matching technique for comparison, and VQ and GMM were used for speaker and context independent recognition. The speech parameters used as features are pitch, energy, MFCC, and their first and second derivatives. Experimental results showed that an emotion recognizer using MFCC and their derivatives as features performed better than one using the pitch and energy parameters. Among the pattern recognition algorithms, the GMM based emotion recognizer was superior to the KNN and VQ based recognizers.

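The "first and second derivatives" of frame-level features mentioned in the abstract above are commonly computed as delta and delta-delta coefficients. The following is a minimal sketch only: the abstract does not say which formula the authors used, so the regression-style delta, the window `width`, the edge clamping, and the function names `delta` and `add_deltas` are all assumptions made for illustration.

```python
def delta(frames, width=2):
    """Regression-style delta over a list of per-frame feature vectors.

    Assumed formula (not taken from the paper):
        d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2)
    Frames beyond either end are replaced by the first/last frame.
    """
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, n_frames - 1)][d]  # clamp at the end
                behind = frames[max(t - n, 0)][d]            # clamp at the start
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_deltas(frames, width=2):
    """Append first and second derivatives to every frame."""
    d1 = delta(frames, width)
    d2 = delta(d1, width)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]


# Hypothetical 2-dimensional "features": a linear ramp and a constant.
frames = [[float(t), 2.0] for t in range(7)]
augmented = add_deltas(frames)
```

In the middle of the ramp the first derivative is 1 and the second derivative is 0, while the constant dimension has zero derivatives everywhere. Each augmented frame is three times as long as the original.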

Interactive Feature selection Algorithm for Emotion recognition (감정 인식을 위한 Interactive Feature Selection(IFS) 알고리즘)

  • Yang, Hyun-Chang;Kim, Ho-Duck;Park, Chang-Hyun;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.16 no.6 / pp.647-652 / 2006
  • This paper presents a novel feature selection method for emotion recognition, a task that may involve a large number of original features. Specifically, the emotion recognition considered in this paper deals with emotional speech signals. Feature selection benefits pattern recognition performance and mitigates the 'curse of dimensionality'. We therefore implemented a simulator called 'IFS' and applied its results to an emotion recognition system (ERS), which was also implemented for this research. Our feature selection method draws on reinforcement learning, and since it needs responses from a human user, it is called 'Interactive Feature Selection'. By running IFS, we obtained the 3 best features and applied them to the ERS. Compared with a randomly selected feature set, the 3 best features performed better.

Speech Emotion Recognition in People at High Risk of Dementia

  • Dongseon Kim;Bongwon Yi;Yugwon Won
    • Dementia and Neurocognitive Disorders / v.23 no.3 / pp.146-160 / 2024
  • Background and Purpose: The emotions of people at various stages of dementia need to be effectively utilized for prevention, early intervention, and care planning. With technology available for understanding and addressing the emotional needs of people, this study aims to develop speech emotion recognition (SER) technology to classify emotions for people at high risk of dementia. Methods: Speech samples from people at high risk of dementia were categorized into distinct emotions via human auditory assessment, the outcomes of which were annotated for a guided deep-learning method. The architecture incorporated a convolutional neural network, long short-term memory, attention layers, and Wav2Vec2, a novel feature extractor, to develop automated speech emotion recognition. Results: Twenty-seven kinds of emotions were found in the speech of the participants. These emotions were grouped into 6 detailed emotions (happiness, interest, sadness, frustration, anger, and neutrality) and further into 3 basic emotions (positive, negative, and neutral). To improve algorithmic performance, multiple learning approaches were applied using different data sources (voice and text) and varying the number of emotions. Ultimately, a 2-stage algorithm (initial text-based classification followed by voice-based analysis) achieved the highest accuracy, reaching 70%. Conclusions: The diverse emotions identified in this study were attributed to the characteristics of the participants and the method of data collection. The speech of people at high risk of dementia to companion robots also explains the relatively low performance of the SER algorithm. Accordingly, this study suggests the systematic and comprehensive construction of a dataset from people with dementia.

Enhancing Multimodal Emotion Recognition in Speech and Text with Integrated CNN, LSTM, and BERT Models (통합 CNN, LSTM, 및 BERT 모델 기반의 음성 및 텍스트 다중 모달 감정 인식 연구)

  • Edward Dwijayanto Cahyadi;Hans Nathaniel Hadi Soesilo;Mi-Hwa Song
    • The Journal of the Convergence on Culture Technology / v.10 no.1 / pp.617-623 / 2024
  • Identifying emotions through speech poses a significant challenge due to the complex relationship between language and emotions. Our paper takes on this challenge by employing feature engineering to identify emotions in speech through a multimodal classification task involving both speech and text data. We evaluated two classifiers, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), both integrated with a BERT-based pre-trained model. Our assessment covers various performance metrics (accuracy, F-score, precision, and recall) across different experimental setups. The findings highlight the proficiency of the two models in accurately discerning emotions from both text and speech data.

A Study on The Improvement of Emotion Recognition by Gender Discrimination (성별 구분을 통한 음성 감성인식 성능 향상에 대한 연구)

  • Cho, Youn-Ho;Park, Kyu-Sik
    • Journal of the Institute of Electronics Engineers of Korea SP / v.45 no.4 / pp.107-114 / 2008
  • In this paper, we constructed a speech emotion recognition system that classifies four emotions (neutral, happy, sad, and anger) from speech based on male/female gender discrimination. The proposed system first distinguishes between male and female speakers in the queried speech, and system performance is then improved by using separate optimized feature vectors for each gender in the emotion classification. As the emotion feature vector, this paper adopts ZCPA (Zero Crossings with Peak Amplitudes), which is well known in the speech recognition area for its noise robustness, and the features are optimized using the SFS method. For pattern classification of emotion, k-NN and SVM classifiers are compared experimentally. In computer simulations, the proposed system achieved a classification rate of about 85.3% over the four emotion states. This suggests that the proposed system could be used in various applications such as call centers, humanoid robots, and ubiquitous computing.
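The gender-first, emotion-second pipeline described in the abstract above can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' system: the three-dimensional feature vectors, the per-gender feature indices in `feature_idx` (standing in for the output of a feature selection step such as SFS), and the names `knn_predict` and `two_stage_predict` are all hypothetical, and plain k-NN is used for both stages.

```python
import math
from collections import Counter


def knn_predict(train, query, k=1):
    """train: list of (vector, label). Majority label among the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def select(vec, idx):
    """Keep only the feature dimensions listed in idx."""
    return [vec[i] for i in idx]


def two_stage_predict(x, gender_train, emotion_train, feature_idx, k=1):
    # Stage 1: decide the speaker's gender from the full feature vector.
    gender = knn_predict(gender_train, x, k)
    # Stage 2: classify emotion with that gender's own feature subset.
    idx = feature_idx[gender]
    train = [(select(v, idx), label) for v, label in emotion_train[gender]]
    return gender, knn_predict(train, select(x, idx), k)


# Hypothetical data: (feature vector, gender, emotion).
samples = [
    ([120.0, 0.2, 0.9], "male", "neutral"),
    ([130.0, 0.8, 0.1], "male", "anger"),
    ([220.0, 0.9, 0.3], "female", "neutral"),
    ([230.0, 0.1, 0.8], "female", "anger"),
]
gender_train = [(v, g) for v, g, _ in samples]
emotion_train = {
    "male": [(v, e) for v, g, e in samples if g == "male"],
    "female": [(v, e) for v, g, e in samples if g == "female"],
}
# Assumed outcome of a per-gender feature selection step.
feature_idx = {"male": [1], "female": [2]}

result_a = two_stage_predict([125.0, 0.75, 0.5], gender_train, emotion_train, feature_idx)
result_b = two_stage_predict([225.0, 0.5, 0.35], gender_train, emotion_train, feature_idx)
```

The design point is that the second stage never compares a male query against female training data, and each gender may rely on different feature dimensions.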

Emotion Recognition Based on Frequency Analysis of Speech Signal

  • Sim, Kwee-Bo;Park, Chang-Hyun;Lee, Dong-Wook;Joo, Young-Hoon
    • International Journal of Fuzzy Logic and Intelligent Systems / v.2 no.2 / pp.122-126 / 2002
  • In this study, we identify features of 3 emotions (happiness, anger, surprise) as fundamental research for emotion recognition. An emotional speech signal has several elements, such as voice quality, pitch, formants, and speech rate. Until now, most researchers have used the change of pitch, the short-time average power envelope, or Mel-based speech power coefficients. Pitch is a very efficient and informative feature, so we use it in this study. Because pitch is very sensitive to subtle emotion, it changes readily whenever a person is in a different emotional state. We can therefore determine whether the pitch changes steeply, changes with a gentle slope, or does not change. In addition, this paper extracts formant features from emotional speech signals. For each vowel, each formant occupies a similar position without large differences. Based on this fact, in the case of pleasure we extract features of laughter and use them to separate out laughing, which simplifies the task. We also find such features for anger and surprise.

Automatic Human Emotion Recognition from Speech and Face Display - A New Approach (인간의 언어와 얼굴 표정에 통하여 자동적으로 감정 인식 시스템 새로운 접근법)

  • Luong, Dinh Dong;Lee, Young-Koo;Lee, Sung-Young
    • Proceedings of the Korean Information Science Society Conference / 2011.06b / pp.231-234 / 2011
  • Audiovisual-based human emotion recognition can be considered a good approach for multimodal human-computer interaction. However, optimal multimodal information fusion remains a challenge. In order to overcome the limitations and bring robustness to the interface, we propose a framework for an automatic human emotion recognition system based on speech and face display. In this paper, we develop a new approach that fuses information at the model level, based on the relationship between speech and facial expression, to automatically detect temporal segments and perform multimodal information fusion.

RECOGNIZING SIX EMOTIONAL STATES USING SPEECH SIGNALS

  • Kang, Bong-Seok;Han, Chul-Hee;Youn, Dae-Hee;Lee, Chungyong
    • Proceedings of the Korean Society for Emotion and Sensibility Conference / 2000.04a / pp.366-369 / 2000
  • This paper examines three algorithms for recognizing a speaker's emotion from speech signals. The target emotions are happiness, sadness, anger, fear, boredom, and the neutral state. MLB (Maximum-Likelihood Bayes), NN (Nearest Neighbor), and HMM (Hidden Markov Model) algorithms are used as the pattern matching techniques. In all cases, pitch and energy are used as the features. The feature vectors for MLB and NN are composed of pitch mean, pitch standard deviation, energy mean, energy standard deviation, etc. For HMM, vectors of delta pitch with delta-delta pitch and delta energy with delta-delta energy are used. We recorded a corpus of emotional speech data and performed a subjective evaluation of the data. The subjective recognition rate was 56% and was compared with the classifiers' recognition rates. The MLB, NN, and HMM classifiers achieved recognition rates of 68.9%, 69.3%, and 89.1%, respectively, for speaker-dependent and context-independent classification.

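The utterance-level feature vector named in the abstract above (pitch mean, pitch standard deviation, energy mean, energy standard deviation) can be illustrated with a small sketch. Everything beyond those four statistics is an assumption: the abstract does not define MLB in detail, so the classifier below is one plausible reading (an independent Gaussian per feature and per class, chosen by maximum likelihood), and the contours, the variance floor, and the function names are hypothetical.

```python
import math
import statistics


def utterance_features(pitch, energy):
    """Collapse frame-level pitch and energy contours into four statistics."""
    return [statistics.mean(pitch), statistics.pstdev(pitch),
            statistics.mean(energy), statistics.pstdev(energy)]


def fit_gaussians(labelled, var_floor=1e-6):
    """labelled: list of (feature_vector, label) -> {label: (means, variances)}."""
    by_label = {}
    for vec, label in labelled:
        by_label.setdefault(label, []).append(vec)
    models = {}
    for label, vecs in by_label.items():
        cols = list(zip(*vecs))
        means = [statistics.mean(c) for c in cols]
        # Floor the variance so a near-constant feature cannot divide by zero.
        variances = [max(statistics.pvariance(c), var_floor) for c in cols]
        models[label] = (means, variances)
    return models


def log_likelihood(vec, means, variances):
    """Log density of vec under independent per-feature Gaussians."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(vec, means, variances))


def classify(vec, models):
    """Pick the class whose Gaussian model gives the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(vec, *models[label]))


# Hypothetical contours: pitch in Hz, energy in arbitrary units.
train = [
    (utterance_features([200, 260, 320, 240], [0.8, 0.9, 1.0, 0.7]), "anger"),
    (utterance_features([220, 300, 280, 250], [0.9, 1.1, 0.8, 0.9]), "anger"),
    (utterance_features([120, 125, 118, 122], [0.30, 0.35, 0.30, 0.32]), "neutral"),
    (utterance_features([110, 118, 114, 116], [0.25, 0.30, 0.28, 0.27]), "neutral"),
]
models = fit_gaussians(train)

loud_query = utterance_features([205, 270, 310, 235], [0.75, 0.95, 0.95, 0.75])
calm_query = utterance_features([118, 122, 119, 121], [0.30, 0.31, 0.29, 0.30])
loud_label = classify(loud_query, models)
calm_label = classify(calm_query, models)
```

With only two training utterances per class the variance estimates are very crude, so this shows the mechanics rather than a realistic model.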