Deep Learning-Based Speech Emotion Recognition Technology Using Voice Feature Filters

  • Shin Hyun Sam (Department of Information and Communication, Hanshin University)
  • Jun-Ki Hong (Department of Smart Information Technology Engineering, Kongju National University)
  • Received : 2023.11.27
  • Accepted : 2023.12.22
  • Published : 2023.12.31

Abstract

In this study, we propose a deep learning-based model that extracts and analyzes features from speech signals, generates voice feature filters, and uses these filters to recognize emotions in speech. We evaluate the emotion recognition accuracy of the proposed model. According to the simulation results, the average emotion recognition accuracies of the DNN (Deep Neural Network) and RNN (Recurrent Neural Network) models were very similar, at 84.59% and 84.52%, respectively. However, the DNN's simulation time was approximately 44.5% shorter than the RNN's, enabling faster emotion prediction.

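The pipeline described in the abstract (feature extraction → filter generation → emotion classification) can be illustrated with a minimal NumPy sketch. Here the "voice feature filter" is interpreted as a per-emotion template averaged from training feature vectors, and classification picks the nearest template. The emotion labels, synthetic feature values, and nearest-template classifier are hypothetical stand-ins for the paper's actual DNN/RNN models, shown only to make the filter idea concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
EMOTIONS = ["angry", "happy", "sad", "neutral"]  # hypothetical label set
N_FEAT = 13  # e.g. 13 MFCC coefficients, a common speech-feature default

def make_filters(features_by_emotion):
    """Generate one 'voice feature filter' per emotion: the mean feature
    vector of that emotion's training samples."""
    return {emo: feats.mean(axis=0) for emo, feats in features_by_emotion.items()}

def recognize(sample, filters):
    """Classify a feature vector by its nearest filter (Euclidean distance)."""
    return min(filters, key=lambda emo: np.linalg.norm(sample - filters[emo]))

# Synthetic training data: each emotion's features cluster around its own center.
centers = {emo: rng.normal(size=N_FEAT) * 5 for emo in EMOTIONS}
train = {emo: centers[emo] + rng.normal(scale=0.5, size=(50, N_FEAT))
         for emo in EMOTIONS}

filters = make_filters(train)

# A sample drawn near the "happy" cluster should map back to "happy".
test_sample = centers["happy"] + rng.normal(scale=0.5, size=N_FEAT)
print(recognize(test_sample, filters))  # prints "happy" for this seeded data
```

In the paper the classifier is a trained DNN or RNN rather than a distance rule, but the role of the filter (a learned summary of each emotion's feature profile applied to incoming speech) is the same.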
