• Title/Summary/Keyword: 음성감정인식 (Speech Emotion Recognition)

A Study on Emotion Recognition of Chunk-Based Time Series Speech (청크 기반 시계열 음성의 감정 인식 연구)

  • Hyun-Sam Shin;Jun-Ki Hong;Sung-Chan Hong
    • Journal of Internet Computing and Services / v.24 no.2 / pp.11-18 / 2023
  • Recently, many studies in the field of Speech Emotion Recognition (SER) have sought to improve accuracy through voice features and modeling. Beyond modeling work that improves the accuracy of existing speech emotion recognition, a variety of studies exploiting voice features are also under way. In this paper, noting that vocal emotion is tied to the flow of time, voice files are split into time-series chunks at fixed intervals. After splitting, we propose a model that classifies the emotions of speech data by extracting the speech features Mel spectrogram, chroma, zero-crossing rate (ZCR), root mean square (RMS) energy, and mel-frequency cepstral coefficients (MFCC), and applying them to recurrent neural network models used for sequential data processing. In the proposed method, voice features are extracted from all files with the 'librosa' library and fed to the neural network models. The experiments compare and analyze the performance of recurrent neural network (RNN), long short-term memory (LSTM), and gated recurrent unit (GRU) models on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) English dataset.
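As a rough illustration of the pipeline this abstract describes, the sketch below extracts the five named features per time chunk with librosa and defines a small GRU classifier in Keras. The chunk count, per-chunk mean pooling, layer sizes, and four-class output are illustrative assumptions, not the authors' configuration.

```python
# Sketch: chunk-based feature extraction with librosa, feeding a GRU.
# Chunk count, mean pooling, and model sizes are assumptions.
import numpy as np
import librosa
from tensorflow.keras import layers, models

def chunk_features(path, n_chunks=5):
    """Split one file into equal time chunks; extract Mel, chroma, ZCR, RMS, MFCC per chunk."""
    y, sr = librosa.load(path, sr=None)
    rows = []
    for c in np.array_split(y, n_chunks):
        mel = librosa.feature.melspectrogram(y=c, sr=sr).mean(axis=1)    # 128 dims
        chroma = librosa.feature.chroma_stft(y=c, sr=sr).mean(axis=1)    # 12 dims
        zcr = librosa.feature.zero_crossing_rate(c).mean(axis=1)         # 1 dim
        rms = librosa.feature.rms(y=c).mean(axis=1)                      # 1 dim
        mfcc = librosa.feature.mfcc(y=c, sr=sr, n_mfcc=13).mean(axis=1)  # 13 dims
        rows.append(np.concatenate([mel, chroma, zcr, rms, mfcc]))
    return np.stack(rows)  # (n_chunks, 155): one time step per chunk

# One recurrent model of the kind compared in the paper (GRU shown here).
model = models.Sequential([
    layers.Input(shape=(5, 155)),
    layers.GRU(64),
    layers.Dense(4, activation="softmax"),  # e.g. 4 IEMOCAP emotion classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```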

Toward More Reliable Emotion Recognition of Vocal Sentences by Emphasizing Information of Korean Ending Boundary Tones (한국어 문미억양 강조를 통한 향상된 음성문장 감정인식)

  • Lee Tae-Seung;Park Mikyong;Kim Tae-Soo
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.514-516 / 2005
  • An autonomous device that deals with humans must be able to perceive the emotions and attitudes carried in implicit signals in order to win a customer's voluntary cooperation. For humans, speech is the easiest and most natural means of exchanging information. Automatic systems that understand emotion and attitude have so far relied on features based on the pitch and energy of spoken sentences. The performance of such existing emotion recognition systems can be raised further by exploiting the linguistic knowledge that particular intonation segments of a sentence are related to emotion and attitude. In this paper, linguistic knowledge of Korean ending boundary tones is applied to an automatic system built from pitch-based features and a multilayer neural network, improving the emotion recognition rate. Experiments on a Korean emotional speech database confirmed a 4% improvement in recognition rate.
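To make the idea concrete, here is a minimal sketch of giving the utterance-final pitch region its own features ahead of a neural network classifier; the 20% tail fraction, the pitch search range, and the chosen statistics are assumptions, not the authors' design.

```python
# Sketch: pitch features with the sentence-final region treated separately,
# in the spirit of emphasizing Korean ending boundary tones.
import numpy as np
import librosa

def boundary_tone_features(path, tail_frac=0.2):  # tail fraction is an assumption
    y, sr = librosa.load(path, sr=None)
    f0, _, _ = librosa.pyin(y, fmin=65.0, fmax=500.0, sr=sr)  # NaN on unvoiced frames
    n_tail = max(1, int(len(f0) * tail_frac))
    head, tail = f0[:-n_tail], f0[-n_tail:]
    stats = [np.nanmean(head), np.nanstd(head),
             np.nanmean(tail), np.nanstd(tail),
             np.nanmean(tail) - np.nanmean(head)]  # direction of the final tone
    return np.nan_to_num(np.array(stats))          # input to the neural network
```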

On the Importance of Tonal Features for Speech Emotion Recognition (음성 감정인식에서의 톤 정보의 중요성 연구)

  • Lee, Jung-In;Kang, Hong-Goo
    • Journal of Broadcast Engineering / v.18 no.5 / pp.713-721 / 2013
  • This paper describes the efficiency of chroma-based tonal features for speech emotion recognition. Just as the tonality induced by major or minor keys affects the perception of musical mood, the tonality of speech affects the perception of the emotional state of spoken utterances. To justify this assertion about tonality and emotion, subjective listening tests were carried out using signals synthesized from chroma features; they show that tonality contributes especially to the perception of negative emotions such as anger and sadness. In automatic emotion recognition tests, the modified chroma-based tonal features produce a noticeable improvement in accuracy when they supplement conventional log-frequency power coefficient (LFPC)-based spectral features.
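A hedged sketch of combining chroma-based tonal statistics with spectral features follows. librosa has no LFPC extractor, so a log-power mel spectrum stands in for the LFPC features here; that substitution, and the mean/std pooling, are assumptions.

```python
# Sketch: chroma tonal statistics appended to spectral features.
# The log-mel block is a stand-in for LFPC, which librosa does not provide.
import numpy as np
import librosa

def tonal_plus_spectral(path):
    y, sr = librosa.load(path, sr=None)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)  # 12 pitch classes over time
    logmel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr))
    tonal = np.concatenate([chroma.mean(axis=1), chroma.std(axis=1)])
    spectral = np.concatenate([logmel.mean(axis=1), logmel.std(axis=1)])
    return np.concatenate([spectral, tonal])  # spectral features supplemented by tonal ones
```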

Enhancing Multimodal Emotion Recognition in Speech and Text with Integrated CNN, LSTM, and BERT Models (통합 CNN, LSTM, 및 BERT 모델 기반의 음성 및 텍스트 다중 모달 감정 인식 연구)

  • Edward Dwijayanto Cahyadi;Hans Nathaniel Hadi Soesilo;Mi-Hwa Song
    • The Journal of the Convergence on Culture Technology / v.10 no.1 / pp.617-623 / 2024
  • Identifying emotions from speech poses a significant challenge because of the complex relationship between language and emotion. Our paper takes on this challenge by employing feature engineering to identify emotions in a multimodal classification task involving both speech and text data. We evaluated two classifiers, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), each integrated with a BERT-based pre-trained model. Our assessment covers several performance metrics (accuracy, F-score, precision, and recall) across different experimental setups. The findings highlight the proficiency of both models in accurately discerning emotions from text and speech data.
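The sketch below shows one way a BERT-based text branch can be fused with audio features before a CNN or LSTM head, using Hugging Face transformers; the plain concatenation and the bert-base-uncased checkpoint are assumptions, not the authors' architecture.

```python
# Sketch: text branch via a pre-trained BERT [CLS] embedding, fused with
# audio features by concatenation. Checkpoint and fusion scheme are assumptions.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def text_embedding(sentence):
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = bert(**inputs)
    return out.last_hidden_state[:, 0, :].squeeze(0).numpy()  # 768-dim [CLS] vector

def fused_features(audio_vec, sentence):
    """Concatenate an audio feature vector with the BERT text embedding."""
    return np.concatenate([audio_vec, text_embedding(sentence)])  # input to a CNN/LSTM head
```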

GMM-based Emotion Recognition Using Speech Signal (음성 신호를 사용한 GMM기반의 감정 인식)

  • 서정태;김원구;강면구
    • The Journal of the Acoustical Society of Korea / v.23 no.3 / pp.235-241 / 2004
  • This paper studied pattern recognition algorithms and feature parameters for speaker- and context-independent emotion recognition. The KNN algorithm was used as the pattern matching technique for comparison, and VQ and GMM were used for speaker- and context-independent recognition. The speech parameters used as features are pitch, energy, MFCC, and their first and second derivatives. Experimental results showed that an emotion recognizer using MFCC and its derivatives performs better than one using the pitch and energy parameters. Among the pattern recognition algorithms, the GMM-based emotion recognizer was superior to the KNN- and VQ-based recognizers.
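As a sketch of the GMM approach, the code below fits one Gaussian mixture per emotion on MFCC + delta frames and classifies by maximum average log-likelihood; the mixture size and the 13-coefficient MFCC front end are assumptions.

```python
# Sketch: one GMM per emotion over MFCC + delta + delta-delta frames,
# classification by highest average log-likelihood. Sizes are assumptions.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def frame_features(y, sr):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    d1 = librosa.feature.delta(mfcc)
    d2 = librosa.feature.delta(mfcc, order=2)
    return np.vstack([mfcc, d1, d2]).T  # (frames, 39)

def train_gmms(data, n_components=8):
    """data: dict mapping emotion label -> list of (waveform, sample_rate) pairs."""
    gmms = {}
    for emotion, utts in data.items():
        frames = np.vstack([frame_features(y, sr) for y, sr in utts])
        gmms[emotion] = GaussianMixture(n_components=n_components).fit(frames)
    return gmms

def classify(gmms, y, sr):
    x = frame_features(y, sr)
    return max(gmms, key=lambda e: gmms[e].score(x))  # score = mean log-likelihood
```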

Comparison of feature parameters for emotion recognition using speech signal (음성 신호를 사용한 감정인식의 특징 파라메터 비교)

  • 김원구
    • Journal of the Institute of Electronics Engineers of Korea SP / v.40 no.5 / pp.371-377 / 2003
  • In this paper, feature parameters for emotion recognition using speech signals are compared. For this purpose, a corpus of emotional speech data, recorded and classified by emotion through subjective evaluation, was used to build statistical feature vectors such as the mean, standard deviation, and maximum of pitch and energy, as well as phonetic features such as MFCC parameters. To evaluate the feature parameters, a speaker- and context-independent emotion recognition system was constructed for the experiments. In the experiments, pitch and energy parameters and their derivatives were used as prosodic information, and MFCC parameters and their derivatives as phonetic information. Experimental results with a vector-quantization-based emotion recognition system showed that the recognizer using MFCC parameters and their derivatives performs better than the one using pitch and energy parameters.
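A minimal sketch of the statistical prosodic vector the abstract describes (mean, standard deviation, and maximum of pitch and energy) follows; the pitch search range and the use of RMS as the energy measure are assumptions.

```python
# Sketch: per-utterance statistics of pitch and energy as a prosodic vector.
import numpy as np
import librosa

def prosodic_stats(y, sr):
    f0, _, _ = librosa.pyin(y, fmin=65.0, fmax=500.0, sr=sr)
    f0 = f0[~np.isnan(f0)]                # keep voiced frames only
    energy = librosa.feature.rms(y=y)[0]  # RMS stands in for the energy measure
    feats = []
    for track in (f0, energy):
        feats += [track.mean(), track.std(), track.max()]
    return np.array(feats)                # 6-dim: mean/std/max of pitch and energy
```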

A Study on Dog-emotion judgment method Based on Deep Learning (딥러닝 기반의 반려견 감정 판단 기법에 관한 연구)

  • Kim, Mingu;Kim, Seha;Go, Yujeong;Lee, Hyunseo;Park, Joonho
    • Proceedings of the Korean Society of Computer Information Conference / 2022.07a / pp.449-450 / 2022
  • Behavior recognition technology for dogs analyzes and interprets motion-related information arriving from various sensors to recognize what action a dog is taking. Sound recognition technology classifies sounds by having a computer collect and analyze auditory data and compare it with trained data. This paper proposes a method that applies deep-learning-based behavior recognition and sound recognition to judge a dog's emotions. Because the method makes a dog's emotions easy to grasp, it helps owners understand their dog's behavior and feelings quickly and easily, supporting a pleasant life with their companion animal.

Robust Speech Parameters for the Emotional Speech Recognition (감정 음성 인식을 위한 강인한 음성 파라메터)

  • Lee, Guehyun;Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.6 / pp.681-686 / 2012
  • This paper studied speech parameters that are less affected by human emotion, toward the development of a robust emotional speech recognition system. To this end, the effect of emotion on a speech recognition system and robust speech parameters were studied using a speech database containing various emotions. Mel-cepstral coefficients, delta-cepstral coefficients, RASTA mel-cepstral coefficients, root-cepstral coefficients, PLP coefficients, and frequency-warped mel-cepstral coefficients under vocal tract length normalization were used as feature parameters, and CMS (Cepstral Mean Subtraction) and SBR (Signal Bias Removal) were used as signal bias removal techniques. Experimental results showed that an HMM-based speaker-independent word recognizer using frequency-warped RASTA mel-cepstral coefficients under vocal tract length normalization, their derivatives, and CMS for signal bias removal performed best.
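Of the techniques listed, CMS is simple enough to show directly: subtract the per-utterance mean of each cepstral coefficient. A minimal sketch over MFCCs (the full RASTA/VTLN front end is not reproduced here):

```python
# Cepstral Mean Subtraction (CMS): remove the per-coefficient utterance mean,
# which cancels stationary channel bias. MFCCs stand in for the full front end.
import librosa

def mfcc_with_cms(y, sr):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # (13, frames)
    return mfcc - mfcc.mean(axis=1, keepdims=True)      # zero-mean per coefficient
```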

Speech Emotion Recognition Using Confidence Level for Emotional Interaction Robot (감정 상호작용 로봇을 위한 신뢰도 평가를 이용한 화자독립 감정인식)

  • Kim, Eun-Ho
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.6 / pp.755-759 / 2009
  • The ability to recognize human emotion is one of the hallmarks of human-robot interaction. Speaker-independent emotion recognition in particular is a challenging issue for the commercial use of speech emotion recognition systems. In general, speaker-independent systems show a lower accuracy rate than speaker-dependent systems, as emotional feature values depend on the speaker and his or her gender. This paper therefore describes speaker-independent emotion recognition with rejection based on a confidence measure, which makes the emotion recognition system more homogeneous and accurate. Comparison of the proposed methods with the conventional method clearly confirmed their improvement and effectiveness.
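A minimal sketch of rejection by confidence measure: accept the top-scoring emotion only when its confidence clears a threshold. The use of class posteriors as the confidence score and the 0.6 threshold are assumptions, not the paper's exact measure.

```python
# Sketch: rejection by confidence measure for speaker-independent recognition.
import numpy as np

def classify_with_rejection(posteriors, labels, threshold=0.6):
    """posteriors: per-emotion scores summing to 1; threshold is an assumed value."""
    i = int(np.argmax(posteriors))
    if posteriors[i] < threshold:
        return None        # rejected: confidence too low to emit a label
    return labels[i]
```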

Generating Speech feature vectors for Effective Emotional Recognition (효과적인 감정인식을 위한 음성 특징 벡터 생성)

  • Sim, In-woo;Han, Eui Hwan;Cha, Hyung Tai
    • Annual Conference of KIPS / 2019.05a / pp.687-690 / 2019
  • In this paper, effective feature vectors for emotion recognition are generated. The speech dataset RAVDESS was used, specifically the speech signals expressing four emotions: neutral, calm, happy, and sad. To select effective features from among MFCC coefficients 1-13, pitch, ZCR, and peak energy, which are conventionally used for emotion recognition, the ratio of between-class to within-class variance was used. The experiments confirmed that among these feature vectors, peak energy, pitch, MFCC2, MFCC3, MFCC4, MFCC12, and MFCC13 are effective for emotion recognition.
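The selection criterion named in the abstract, the ratio of between-class to within-class variance, can be sketched per feature as below; equal weighting of classes is an assumption.

```python
# Sketch: between-class / within-class variance ratio for ranking features.
import numpy as np

def variance_ratio(X, y):
    """X: (samples, features); y: class labels. Larger ratio = more discriminative."""
    classes = np.unique(y)
    overall = X.mean(axis=0)
    between = np.mean([(X[y == c].mean(axis=0) - overall) ** 2 for c in classes], axis=0)
    within = np.mean([X[y == c].var(axis=0) for c in classes], axis=0)
    return between / (within + 1e-12)   # epsilon guards against zero variance
```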