• Title/Summary/Keyword: emotional speech


Spontaneous Speech Emotion Recognition Based On Spectrogram With Convolutional Neural Network (CNN 기반 스펙트로그램을 이용한 자유발화 음성감정인식)

  • Guiyoung Son;Soonil Kwon
    • The Transactions of the Korea Information Processing Society / v.13 no.6 / pp.284-290 / 2024
  • Speech emotion recognition (SER) is a technique used to analyze a speaker's voice patterns, including vibration, intensity, and tone, to determine their emotional state. Interest in artificial intelligence (AI) techniques has been increasing, and they are now widely used in medicine, education, industry, and the military. Nevertheless, existing researchers have attained impressive results by utilizing acted-out speech from skilled actors in a controlled environment for various scenarios. In particular, there is a mismatch between acted and spontaneous speech, since acted speech includes more explicit emotional expressions than spontaneous speech. For this reason, spontaneous speech emotion recognition remains a challenging task. This paper aims to conduct emotion recognition and improve performance using spontaneous speech data. To this end, we implement deep learning-based speech emotion recognition using VGG (Visual Geometry Group) networks after converting 1-dimensional audio signals into 2-dimensional spectrogram images. The experimental evaluations are performed on the Korean spontaneous emotional speech database from AI-Hub, consisting of 7 emotions: joy, love, anger, fear, sadness, surprise, and neutral. As a result, we achieved average accuracies of 83.5% for adults and 73.0% for young people using 2-dimensional time-frequency spectrograms. In conclusion, our findings demonstrate that the suggested framework outperformed current state-of-the-art techniques for spontaneous speech and showed promising performance despite the difficulty of quantifying spontaneous emotional expression.
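The front-end step this abstract describes, converting a 1-dimensional audio signal into a 2-dimensional time-frequency image that a CNN such as VGG can consume, can be sketched with a minimal NumPy STFT. Frame length, hop size, and the test tone below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=128):
    """Magnitude STFT in dB: a 2-D (frequency x time) image for a CNN."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))   # (frames, n_fft//2 + 1)
    return 20 * np.log10(mag.T + 1e-10)         # dB, (frequency, time)

# 1 s of a 440 Hz tone at 16 kHz stands in for an utterance
t = np.arange(16000) / 16000.0
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (257, 122): frequency bins x frames
```

The resulting 2-D array is what would be rendered (or tiled) as the spectrogram image fed to the network.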

Robust Speech Parameters for the Emotional Speech Recognition (감정 음성 인식을 위한 강인한 음성 파라메터)

  • Lee, Guehyun;Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.6 / pp.681-686 / 2012
  • This paper studied speech parameters that are less affected by human emotion, for the development of a robust emotional speech recognition system. For this purpose, the effect of emotion on a speech recognition system and robust speech parameters for it were studied using a speech database containing various emotions. In this study, mel-cepstral coefficients, delta-cepstral coefficients, RASTA mel-cepstral coefficients, root-cepstral coefficients, PLP coefficients, and frequency-warped mel-cepstral coefficients obtained with the vocal tract length normalization method were used as feature parameters, and CMS (Cepstral Mean Subtraction) and SBR (Signal Bias Removal) were used as signal bias removal techniques. Experimental results showed that an HMM-based speaker-independent word recognizer using frequency-warped RASTA mel-cepstral coefficients from the vocal tract length normalization method, their derivatives, and CMS for signal bias removal gave the best performance.
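The CMS step mentioned above amounts to subtracting each cepstral coefficient's utterance-level mean from its track, cancelling any stationary additive bias in the cepstral domain. A minimal sketch with illustrative data (not the paper's recognizer or features):

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract each coefficient's utterance-level mean from a
    (frames x coefficients) array, removing stationary channel bias."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))      # 200 frames of 13 cepstral coeffs
biased = clean + np.full(13, 0.7)       # constant channel offset per coeff
print(np.allclose(cepstral_mean_subtraction(clean),
                  cepstral_mean_subtraction(biased)))  # True
```

Because the offset is constant over the utterance, both versions normalize to the same feature stream, which is why CMS helps against channel (and some speaker) mismatch.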

A Study on Robust Speech Emotion Feature Extraction Under the Mobile Communication Environment (이동통신 환경에서 강인한 음성 감성특징 추출에 대한 연구)

  • Cho Youn-Ho;Park Kyu-Sik
    • The Journal of the Acoustical Society of Korea / v.25 no.6 / pp.269-276 / 2006
  • In this paper, we propose an emotion recognition system that can discriminate a human emotional state as neutral or anger from speech captured by a cellular phone in real time. In general, speech transmitted over the mobile network contains environmental noise and network noise, which can cause serious system performance degradation due to distortion of the emotional features of the query speech. In order to minimize the effect of this noise and thus improve system performance, we adopt a simple MA (Moving Average) filter, which has a relatively simple structure and low computational complexity, to alleviate the distortion in the emotional feature vector. An SFS (Sequential Forward Selection) feature optimization method is then implemented to further improve and stabilize system performance. Two pattern recognition methods, k-NN and SVM, are compared for emotional state classification. The experimental results indicate that the proposed method provides very stable and successful emotional classification performance of 86.5%, so that it will be very useful in application areas such as customer call centers.
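The SFS step described above greedily grows a feature subset, at each round adding the feature whose inclusion most improves a chosen criterion. A toy sketch follows; the class-separation criterion and the synthetic data are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def sequential_forward_selection(X, y, score, k):
    """Greedy SFS: repeatedly add the feature whose inclusion maximizes `score`."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best = max(remaining, key=lambda j: score(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

def separation(Xs, y):
    """Toy criterion: distance between the two class means, summed over features."""
    return float(np.abs(Xs[y == 0].mean(axis=0) - Xs[y == 1].mean(axis=0)).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.repeat([0, 1], 50)
X[y == 1, 2] += 3.0                 # make feature 2 clearly discriminative
print(sequential_forward_selection(X, y, separation, 2))  # feature 2 picked first
```

In practice the criterion would be cross-validated classifier accuracy (k-NN or SVM, as in the paper) rather than this mean-separation heuristic.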

A Convergence Study on the Relationship between Empathy Abilities and Job Satisfaction of Speech and Language Rehabilitation in Daegu and Gyeongbuk (대구·경북지역 언어재활사의 공감능력과 직무만족도 간의 관련성에 대한 융합 연구)

  • Kim, Sun-Hee
    • Journal of the Korea Convergence Society / v.12 no.6 / pp.57-63 / 2021
  • The purpose of this convergence study is to provide basic data for improving the job satisfaction of speech and language therapists by identifying the relationships between their personal characteristics, empathy ability, and job satisfaction, particularly between cognitive and emotional empathy and job satisfaction. The subjects of the study were 111 speech and language therapists working in Daegu and Gyeongbuk. For data analysis, t-tests, ANOVA, and correlation analysis were performed using the SPSS/PC 21.0 statistical program according to the purpose of the study. As a result, the relationship between empathy ability and job satisfaction was found to be strong, and there was a high correlation between job satisfaction and both cognitive and emotional empathy, the sub-factors of empathy ability. Therefore, follow-up research on the empathy and job satisfaction of speech and language therapists nationwide is needed to implement various education programs and improve empathy skills in the future.

Emotion Recognition Method from Speech Signal Using the Wavelet Transform (웨이블렛 변환을 이용한 음성에서의 감정 추출 및 인식 기법)

  • Go, Hyoun-Joo;Lee, Dae-Jong;Park, Jang-Hwan;Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.2 / pp.150-155 / 2004
  • In this paper, an emotion recognition method using speech signals is presented. Six basic human emotions, including happiness, sadness, anger, surprise, fear, and dislike, are investigated. The proposed recognizer has a codebook for each emotional state, constructed using the wavelet transform. Here, we first verify the emotional state at each filterbank, and the final recognition is then obtained from a multi-decision scheme. The database consists of 360 emotional utterances from twenty speakers, each of whom spoke a sentence three times for each of the six emotional states. The proposed method showed a more than 5% improvement in recognition rate over previous works.
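The wavelet filterbank decomposition underlying the method above can be illustrated with a one-level Haar DWT applied recursively; the per-band energies form the kind of compact subband feature on which per-emotion codebooks could be trained. This is a sketch, not the authors' wavelet family or codebook setup:

```python
import numpy as np

def haar_dwt(signal):
    """One level of the orthonormal Haar wavelet transform:
    returns (approximation, detail) half-length bands."""
    s = np.asarray(signal, dtype=float)
    if len(s) % 2:
        s = s[:-1]
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)
    return approx, detail

def subband_energies(signal, levels=3):
    """Energy in each detail band plus the final approximation band,
    usable as a compact per-frame feature vector."""
    feats, a = [], np.asarray(signal, dtype=float)
    for _ in range(levels):
        a, d = haar_dwt(a)
        feats.append(float(np.sum(d ** 2)))
    feats.append(float(np.sum(a ** 2)))
    return feats

x = np.arange(8.0)
print(subband_energies(x, levels=3))  # 3 detail-band energies + 1 approximation
```

Because the Haar transform is orthonormal, the band energies sum to the signal energy, so no information is lost in the split.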

Emotion Recognition and Expression Method using Bi-Modal Sensor Fusion Algorithm (다중 센서 융합 알고리즘을 이용한 감정인식 및 표현기법)

  • Joo, Jong-Tae;Jang, In-Hun;Yang, Hyun-Chang;Sim, Kwee-Bo
    • Journal of Institute of Control, Robotics and Systems / v.13 no.8 / pp.754-759 / 2007
  • In this paper, we propose the Bi-Modal Sensor Fusion Algorithm, an emotion recognition method that can classify 4 emotions (happy, sad, angry, surprise) by using a facial image and a speech signal together. We extract feature vectors from the speech signal using acoustic features without language features and classify the emotional pattern using a neural network. We also select mouth, eye, and eyebrow features from the facial image, and the extracted feature vectors are reduced to low-dimensional feature vectors by Principal Component Analysis (PCA). We thus propose a method in which the facial-image and speech recognition results are fused into a single emotion recognition value.
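Two steps above can be sketched briefly: PCA reduction of the facial feature vectors, and late fusion of the two modalities' per-emotion scores. The abstract does not specify the fusion rule, so the weighted average below is an assumed stand-in:

```python
import numpy as np

def pca_reduce(X, k):
    """Project row feature vectors onto the top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def fuse(p_face, p_speech, w=0.5):
    """Assumed late fusion: weighted average of per-emotion scores."""
    return w * np.asarray(p_face) + (1 - w) * np.asarray(p_speech)

rng = np.random.default_rng(1)
Z = pca_reduce(rng.normal(size=(50, 20)), 3)   # 50 facial vectors -> 3-D
print(Z.shape)                                  # (50, 3)
print(fuse([0.7, 0.1, 0.1, 0.1], [0.4, 0.3, 0.2, 0.1]))
```

The fused vector's argmax would then give the final emotion decision; per-modality weights could be tuned on held-out data.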

Analysis of Indirect Uses of Interrogative Sentences Carrying Anger

  • Min, Hye-Jin;Park, Jong-C.
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.311-320 / 2007
  • Interrogative sentences are generally used to perform the speech acts of directly asking a question or making a request, but they are also used to convey such speech acts indirectly. In utterances, such indirect uses of interrogative sentences usually carry the speaker's emotion with a negative attitude, which is close to an expression of anger. The identification of such negative emotion is known to be a difficult problem that requires relevant information in syntax, semantics, discourse, pragmatics, and speech signals. In this paper, we argue that the interrogatives used for indirect speech acts could serve as a dominant marker for identifying emotional attitudes, such as anger, as compared to other emotion-related markers, such as discourse markers, adverbial words, and syntactic markers. To support this argument, we analyze dialogues collected from Korean soap operas and examine the individual and cooperative influences of the emotion-related markers on emotional realization. The user study shows that interrogatives could be utilized as a promising device for emotion identification.


Analyzing the element of emotion recognition from speech (음성으로부터 감성인식 요소분석)

  • Sim, Kwee-Bo;Park, Chang-Hyun
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.6 / pp.510-515 / 2001
  • Generally, the elements for emotion recognition from a speech signal include (1) the words of the conversation, (2) tone, (3) pitch, (4) formant frequency, and (5) speech speed. For human beings, tone, voice quality, speed, and words are naturally easier elements than frequency for perceiving another person's feelings, so the former are important elements for classifying feelings, and previous methods have mainly used them; however, using formants is well suited to machine implementation. Thus, the final goal of this research is to implement an emotion recognition system based on pitch, formants, speech speed, etc., from the speech signal. In this paper, as a first stage, we found the specific features of anger in a speaker's words when he became angry.
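Of the elements listed above, pitch (F0) is the most mechanical to extract; a classic approach, sketched here on a synthetic tone (parameters and search range are illustrative assumptions), picks the dominant peak of the frame's autocorrelation within the plausible F0 range:

```python
import numpy as np

def pitch_autocorr(frame, sr, fmin=60, fmax=400):
    """Estimate F0 of a voiced frame from its autocorrelation peak,
    searching lags that correspond to [fmin, fmax] Hz."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

sr = 16000
t = np.arange(int(0.05 * sr)) / sr            # one 50 ms frame
f0 = pitch_autocorr(np.sin(2 * np.pi * 200 * t), sr)
print(f0)  # 200.0
```

Tracking F0 (and its range and slope) across frames gives the kind of pitch contour statistics used as emotion features.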


Study of Emotion in Speech (감정변화에 따른 음성정보 분석에 관한 연구)

  • Jang, In-Chang;Park, Mi-Kyung;Kim, Tae-Soo;Park, Myon-Woong
    • Proceedings of the Korean Society of Precision Engineering Conference / 2004.10a / pp.1123-1126 / 2004
  • Recognizing emotion in speech requires a large spoken language corpus, not only for the different emotional states but also for individual languages. In this paper, we focused on the changes in speech signals under different emotions. We compared features of the speech signal, such as formants and pitch, across 4 emotions (normal, happiness, sadness, anger). In Korean, pitch data on monophthongs changed with each emotion. Therefore, we suggested suitable analysis techniques using these features to recognize emotions in Korean.
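The formant measurements compared above are conventionally obtained from LPC analysis: fit an all-pole model to a frame and read formant candidates off the angles of the model's complex roots. A minimal sketch using the autocorrelation (Yule-Walker) method on a synthetic one-resonance signal (illustrative, not the paper's measurement procedure):

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients via the autocorrelation (Yule-Walker) method."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][: order + 1]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    return np.concatenate(([1.0], -a))          # prediction polynomial A(z)

def formants(frame, sr, order=10):
    """Formant candidates: angles of upper-half-plane LPC roots, in Hz."""
    roots = np.roots(lpc(frame, order))
    roots = roots[np.imag(roots) > 0]
    freqs = np.angle(roots) * sr / (2 * np.pi)
    return np.sort(freqs[freqs > 90.0])

# synthetic vowel-like signal: impulse through a single resonance at 700 Hz
sr, f_res, r = 8000, 700.0, 0.97
theta = 2 * np.pi * f_res / sr
y = np.zeros(400)
y[0] = 1.0
y[1] = 2 * r * np.cos(theta) * y[0]
for n in range(2, 400):
    y[n] = 2 * r * np.cos(theta) * y[n - 1] - r ** 2 * y[n - 2]
print(formants(y, sr, order=2))  # single resonance recovered near 700 Hz
```

For real vowels, an order of roughly 2 + sr/1000 and pre-emphasis are typical, and only roots with sufficiently narrow bandwidth are kept as formants.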
