• Title/Summary/Keyword: 음성감정인식 (speech emotion recognition)

Search Results: 142

A Study on the Performance of Music Retrieval Based on the Emotion Recognition (감정 인식을 통한 음악 검색 성능 분석)

  • Seo, Jin Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.34 no.3
    • /
    • pp.247-255
    • /
    • 2015
  • This paper presents a study on the performance of music search based on automatically recognized music-emotion labels. As with other media such as speech, images, and video, a song can evoke certain emotions in its listeners, and when people look for songs to listen to, the emotions evoked by those songs can be an important consideration. However, little work has been done on how well music-emotion labels support music search. In this paper, we use the three axes of human music perception (valence, activity, tension) and five basic emotion labels (happiness, sadness, tenderness, anger, fear) to measure music similarity for search. Experiments were conducted on both genre and singer datasets. The search accuracy of the proposed emotion-based music search reached up to 75 % of that of the conventional feature-based music search. By combining the proposed emotion-based method with the feature-based method, we achieved up to a 14 % improvement in search accuracy.
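To illustrate the idea in the abstract above, here is a minimal sketch (not the paper's implementation): each song is represented by an eight-dimensional emotion vector covering the three perception axes plus the five basic emotion labels, and candidates are ranked by a weighted blend of an emotion distance and a conventional feature-based distance. All vectors, distances, and the weight `alpha` are made-up values.

```python
# Minimal sketch: emotion-vector similarity blended with a feature-based distance.
import numpy as np

def emotion_distance(query_vec, candidate_vec):
    """Euclidean distance between two emotion vectors (assumed representation)."""
    return float(np.linalg.norm(query_vec - candidate_vec))

def combined_distance(emo_d, feat_d, alpha=0.5):
    """Blend emotion-based and feature-based distances; alpha is a tuning weight."""
    return alpha * emo_d + (1.0 - alpha) * feat_d

# toy example: one query song and two candidates
query = np.array([0.7, 0.6, 0.2, 0.8, 0.1, 0.3, 0.05, 0.02])  # valence, activity, tension, 5 emotions
cands = np.array([[0.6, 0.5, 0.3, 0.7, 0.2, 0.3, 0.1, 0.05],
                  [0.1, 0.2, 0.8, 0.05, 0.7, 0.1, 0.4, 0.3]])
feat_d = np.array([0.4, 0.3])  # distances from a conventional acoustic-feature search

scores = [combined_distance(emotion_distance(query, c), f) for c, f in zip(cands, feat_d)]
print(np.argsort(scores))      # candidate ranking, most similar first
```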

Context sentiment analysis based on Speech Tone (발화 음성을 기반으로 한 감정분석 시스템)

  • Jung, Jun-Hyeok;Park, Soo-Duck;Kim, Min-Seung;Park, So-Hyun;Han, Sang-Gon;Cho, Woo-Hyun
    • Annual Conference of KIPS
    • /
    • 2017.11a
    • /
    • pp.1037-1040
    • /
    • 2017
  • Although machine learning and deep learning technology is advancing rapidly and numerous AI voice assistants have been released, they only analyze the words present in the speaker's sentences and return results; because they cannot recognize non-verbal cues, their results have inherent structural limitations. In this study, non-verbal elements of human communication, such as speaking rate and changes in intonation, are converted into numerical data and used for supervised learning based on Plutchik's wheel of emotions; incoming speech data are then analyzed with a kNN algorithm against the previously trained data.
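A minimal sketch of the kNN step described above, with made-up prosodic features (speaking rate, pitch variation) and emotion labels standing in for the authors' data:

```python
# Minimal sketch: kNN over numeric prosodic features for emotion labels.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# [speaking_rate (syllables/sec), pitch_std (Hz)] -- illustrative values only
X_train = np.array([[5.5, 60.0], [5.2, 55.0], [3.0, 20.0], [2.8, 18.0], [6.5, 80.0]])
y_train = ["joy", "joy", "sadness", "sadness", "anger"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

new_utterance = np.array([[3.1, 22.0]])
print(knn.predict(new_utterance))   # -> likely "sadness" for this toy input
```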

An acoustic study of feeling between standard language and dialect (표준어와 방언간의 감정변화에 대한 음성적 연구)

  • Lee, Yeon-Soo;Park, Young-Beom
    • Annual Conference of KIPS
    • /
    • 2009.04a
    • /
    • pp.63-66
    • /
    • 2009
  • Human emotional change can broadly be described in terms of four states: joy, sadness, excitement, and neutral. Among these, joy and sadness, and excitement and joy, are acoustically similar. Moreover, the neutral state of dialect speech has characteristics similar to the joyful and excited states of the standard language. Because of these similarities between the standard language and dialects, a dialect speaker's neutral state can be misrecognized as excitement when detecting the excited state. This paper aims to identify the acoustic differences underlying this problem. To compare them, the excited state in the standard language and in the dialect was studied using three factors: pitch, formants, and formant RMS error.
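A minimal sketch of the comparison described above, assuming pitch and formant tracks have already been extracted (e.g. with Praat); the arrays and values are illustrative. It compares mean pitch and computes an RMS error between the formant tracks of a standard-language and a dialect utterance.

```python
# Minimal sketch: pitch comparison and formant RMS error between two utterances.
import numpy as np

def formant_rms_error(f_a, f_b):
    """RMS difference between two formant tracks of equal length (Hz)."""
    return float(np.sqrt(np.mean((f_a - f_b) ** 2)))

# hypothetical F1 tracks (Hz) sampled at the same frame rate
standard_excited_f1 = np.array([620., 640., 655., 660., 648.])
dialect_neutral_f1  = np.array([610., 632., 650., 658., 645.])

standard_pitch = np.array([210., 225., 240., 238., 230.])  # Hz
dialect_pitch  = np.array([205., 222., 236., 235., 228.])

print("mean pitch diff (Hz):", abs(standard_pitch.mean() - dialect_pitch.mean()))
print("F1 RMS error (Hz):", formant_rms_error(standard_excited_f1, dialect_neutral_f1))
```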

Development of Driver's Emotion and Attention Recognition System using Multi-modal Sensor Fusion Algorithm (다중 센서 융합 알고리즘을 이용한 운전자의 감정 및 주의력 인식 기술 개발)

  • Han, Cheol-Hun;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.6
    • /
    • pp.754-761
    • /
    • 2008
  • As the automobile industry and its technologies develop, drivers tend to be more concerned with service features than with mechanical matters. For this reason, interest in recognizing human knowledge and emotion to create a safe and convenient driving environment is steadily increasing. Such recognition belongs to emotion engineering, a field studied since the late 1980s to provide people with human-friendly services. Emotion engineering analyzes people's emotions through their faces, voices, and gestures; applied to automobiles, it can supply drivers with services suited to each driver's situation and help them drive safely. Furthermore, by recognizing drivers' gestures, it can help prevent accidents caused by careless driving or dozing off at the wheel. The purpose of this paper is to develop a system that recognizes the driver's emotional state and attention for safe driving. First, we detect signals of the driver's emotion, sleepiness, and attention using bio-motion signals and build several databases. By analyzing these databases, we extract characteristic features of the driver's emotion, sleepiness, and attention, and fuse the results with a multi-modal method to build the system.
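A minimal decision-level fusion sketch (an assumption, not the authors' fusion algorithm): each modality emits a probability distribution over driver states, and the distributions are combined with per-modality reliability weights to pick the final state.

```python
# Minimal sketch: weighted decision-level fusion of multi-modal driver-state scores.
import numpy as np

STATES = ["neutral", "angry", "drowsy"]

def fuse(modality_probs, weights):
    total = np.zeros(len(STATES))
    for name, probs in modality_probs.items():
        total += weights[name] * probs
    return STATES[int(np.argmax(total))]

probs = {
    "face":  np.array([0.5, 0.1, 0.4]),
    "voice": np.array([0.3, 0.1, 0.6]),
    "bio":   np.array([0.2, 0.1, 0.7]),   # e.g. heart rate / motion signals
}
weights = {"face": 0.4, "voice": 0.3, "bio": 0.3}

print(fuse(probs, weights))   # -> "drowsy", which could trigger an attention alert
```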

Unraveling Emotions in Speech: Deep Neural Networks for Emotion Recognition (음성을 통한 감정 해석: 감정 인식을 위한 딥 뉴럴 네트워크 예비 연구)

  • Edward Dwijayanto Cahyadi;Mi-Hwa Song
    • Annual Conference of KIPS
    • /
    • 2023.05a
    • /
    • pp.411-412
    • /
    • 2023
  • Speech emotion recognition (SER) is one of the interesting topics in the machine learning field, and developing it offers numerous benefits. An SER system can be built using a convolutional neural network combined with a Long Short-Term Memory (LSTM) network.
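A minimal CNN + LSTM sketch along the lines the abstract describes (layer sizes and the five-class output are assumptions, not the authors' model): a small convolution stack summarizes each mel-spectrogram frame, an LSTM models the temporal structure, and a linear layer predicts the emotion class.

```python
# Minimal sketch: CNN front-end + LSTM + linear classifier for SER.
import torch
import torch.nn as nn

class CnnLstmSER(nn.Module):
    def __init__(self, n_mels=64, n_emotions=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                        # pool over frequency only
        )
        self.lstm = nn.LSTM(input_size=16 * (n_mels // 2), hidden_size=64,
                            batch_first=True)
        self.fc = nn.Linear(64, n_emotions)

    def forward(self, spec):                  # spec: (batch, 1, n_mels, time)
        x = self.conv(spec)                   # (batch, 16, n_mels/2, time)
        x = x.permute(0, 3, 1, 2).flatten(2)  # (batch, time, 16 * n_mels/2)
        _, (h, _) = self.lstm(x)              # h: (1, batch, 64)
        return self.fc(h[-1])                 # (batch, n_emotions) logits

model = CnnLstmSER()
dummy = torch.randn(2, 1, 64, 100)            # two fake mel-spectrograms
print(model(dummy).shape)                      # -> torch.Size([2, 5])
```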

Multi-Modal Emotion Recognition in Videos Based on Pre-Trained Models (사전학습 모델 기반 발화 동영상 멀티 모달 감정 인식)

  • Eun Hee Kim;Ju Hyun Shin
    • Smart Media Journal
    • /
    • v.13 no.10
    • /
    • pp.19-27
    • /
    • 2024
  • Recently, as the demand for non-face-to-face counseling has rapidly increased, the need for emotion recognition technology that combines modalities such as text, voice, and facial expressions is being emphasized. In this paper, we address issues such as the dominance of non-Korean data and the imbalance of emotion labels in existing datasets like FER-2013, CK+, and AFEW by using Korean video data. We propose methods to enhance multimodal emotion recognition performance in videos by integrating the strengths of the image modality with the text modality. Pre-trained models are used to overcome the limitations caused by small training data: a GPT-4-based LLM is applied to the text, and a pre-trained model based on the VGG-19 architecture is fine-tuned on facial expression images. Representative emotions are then extracted by combining the per-modality results as follows. Emotion information extracted from the text is combined with facial expression changes in the video; if there is a mismatch between the text-based and image-based emotions, a threshold is applied that prioritizes the text-based emotion when it is deemed trustworthy. Additionally, by adjusting the representative emotions using the emotion distribution information for each frame, performance improved by 19 % in F1-score compared to the existing method that used average emotion values per frame.
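A minimal sketch of the late-fusion rule described above (the label set and threshold are assumptions, not the paper's exact values): per-frame facial-emotion distributions are aggregated over the video, and when the image-based label disagrees with the text-based label, the text result wins only if its confidence exceeds a trust threshold.

```python
# Minimal sketch: text/image emotion fusion with a text-trust threshold.
import numpy as np

EMOTIONS = ["happy", "sad", "angry", "neutral"]
TEXT_TRUST_THRESHOLD = 0.7   # assumed value

def representative_emotion(frame_probs, text_probs):
    face_dist = np.mean(frame_probs, axis=0)            # aggregate per-frame distributions
    face_label = EMOTIONS[int(np.argmax(face_dist))]
    text_label = EMOTIONS[int(np.argmax(text_probs))]
    if face_label != text_label and np.max(text_probs) >= TEXT_TRUST_THRESHOLD:
        return text_label                                # trust the text modality
    return face_label

frames = np.array([[0.2, 0.5, 0.1, 0.2],                 # per-frame facial emotion probs
                   [0.1, 0.6, 0.1, 0.2],
                   [0.3, 0.4, 0.1, 0.2]])
text = np.array([0.8, 0.1, 0.05, 0.05])                  # text-based emotion probs
print(representative_emotion(frames, text))              # -> "happy"
```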

Acoustic parameters for induced emotion categorizing and dimensional approach (자연스러운 정서 반응의 범주 및 차원 분류에 적합한 음성 파라미터)

  • Park, Ji-Eun;Park, Jeong-Sik;Sohn, Jin-Hun
    • Science of Emotion and Sensibility
    • /
    • v.16 no.1
    • /
    • pp.117-124
    • /
    • 2013
  • This study examined how precisely MFCC, LPC, energy, and pitch-related parameters of speech data, which have mainly been used for voice recognition systems, can predict vocal emotion categories as well as dimensions of vocal emotion. 110 college students participated in the experiment. To elicit more realistic emotional responses, well-defined emotion-inducing stimuli were used. The study analyzed the relationship between the MFCC, LPC, energy, and pitch parameters of the speech data and four emotional dimensions (valence, arousal, intensity, and potency), since a dimensional approach is more useful for classifying realistic emotions. The best vocal-cue parameters for predicting each dimension were identified by stepwise multiple regression analysis. Emotion categorization accuracy analyzed by LDA was 62.7 %, and all four dimensional regression models were statistically significant (p < .001). Consequently, these results show that the parameters could also be applied to spontaneous vocal emotion recognition.
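A minimal sketch of the analysis pipeline described above, using synthetic audio and labels purely for illustration: MFCC features are extracted per utterance, an LDA model categorizes emotions, and a linear regression predicts a continuous dimension such as arousal.

```python
# Minimal sketch: MFCC features -> LDA categorization and regression on a dimension.
import numpy as np
import librosa
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LinearRegression

def utterance_features(y, sr=16000):
    """Mean MFCC vector as a compact per-utterance feature (a simplification)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

rng = np.random.default_rng(0)
# synthetic "utterances": noisy tones standing in for real speech recordings
X = np.array([utterance_features(np.sin(2 * np.pi * f * np.linspace(0, 1, 16000))
                                 + 0.1 * rng.standard_normal(16000))
              for f in [120, 130, 220, 230, 125, 225]])
labels = ["calm", "calm", "excited", "excited", "calm", "excited"]
arousal = np.array([0.2, 0.25, 0.8, 0.85, 0.22, 0.78])   # continuous dimension

print(LinearDiscriminantAnalysis().fit(X, labels).predict(X[:2]))   # categorization
print(LinearRegression().fit(X, arousal).predict(X[:2]))            # dimension prediction
```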


A Deep Learning System for Emotional Cat Sound Classification and Generation (감정별 고양이 소리 분류 및 생성 딥러닝 시스템)

  • Joo Yong Shim;SungKi Lim;Jong-Kook Kim
    • The Transactions of the Korea Information Processing Society
    • /
    • v.13 no.10
    • /
    • pp.492-496
    • /
    • 2024
  • Cats are known to express their emotions through a variety of vocalizations during interactions. These sounds reflect their emotional states, so understanding and interpreting them is crucial for more effective communication. Recent advancements in artificial intelligence have introduced research on emotion recognition, particularly focusing on the analysis of voice data using deep learning models. Building on this background, this study aims to develop a deep learning system that classifies and generates cat sounds based on their emotional content. The classification model is trained to accurately categorize cat vocalizations by emotion. The sound generation model, which uses deep-learning-based models such as SampleRNN, is designed to produce cat sounds that reflect specific emotional states. The study finally proposes an integrated system that takes recorded cat vocalizations, classifies them by emotion, and generates cat sounds according to user requirements.
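A minimal pipeline skeleton for the integrated classify-then-generate flow described above; the classifier and generator here are trivial placeholders (the paper uses deep models such as SampleRNN for generation), so only the flow is illustrated, not the actual models.

```python
# Minimal sketch: classify a recorded cat sound, then generate a sound for that emotion.
import numpy as np

EMOTIONS = ["content", "hungry", "angry"]

def classify_emotion(waveform):
    """Placeholder classifier: a trained deep model would go here."""
    energy = float(np.mean(waveform ** 2))
    return EMOTIONS[0] if energy < 0.1 else EMOTIONS[2]

def generate_sound(emotion, n_samples=16000):
    """Placeholder generator standing in for a SampleRNN-style model."""
    base_freq = {"content": 300, "hungry": 500, "angry": 700}[emotion]
    t = np.linspace(0, 1, n_samples)
    return 0.3 * np.sin(2 * np.pi * base_freq * t)

recorded = 0.05 * np.random.randn(16000)          # stand-in for a recorded meow
emotion = classify_emotion(recorded)
synthesized = generate_sound(emotion)             # sound matching the classified emotion
print(emotion, synthesized.shape)
```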

Development and Application of AI-based Hearing Assistance Application (인공지능 기반 청각 보조 애플리케이션 개발 및 적용 연구)

  • Jun-Hyuk Kwon;Su-Min Kwon;Chan-Young Ma;In-Gyu Song;Do-Il Choi;Jae-Hun Lee
    • Annual Conference of KIPS
    • /
    • 2024.10a
    • /
    • pp.1074-1075
    • /
    • 2024
  • This paper focuses on the development of a hearing-assistance application for people with hearing impairments, covering a system design that includes deep-learning-based audio analysis and emotion analysis. The study centers on an application that helps hearing-impaired users recognize important sounds and receive warnings, both outdoors and indoors. The hearing-assistance function provides danger alerts using a model trained on specific sounds, and the emotion-analysis and speech-translation function improves communication by providing text and emotion analysis for everyday conversation. To increase convenience, the application uses on-device technology so that real-time analysis is possible without a server. It also enables low-cost hearing assistance, making the service accessible to more users. In this way, the application is expected to protect the safety of vulnerable users and, through its emotion-analysis function, to support smoother communication.
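A minimal sketch of the alerting rule implied above (the class names and threshold are assumptions): an on-device sound classifier emits class probabilities for each audio chunk, and the app raises an alert when a danger-related class exceeds a confidence threshold.

```python
# Minimal sketch: threshold-based danger alert on sound-classifier output.
import numpy as np

CLASSES = ["car_horn", "siren", "doorbell", "speech", "background"]
DANGER_CLASSES = {"car_horn", "siren"}
ALERT_THRESHOLD = 0.6   # assumed value

def check_alert(class_probs):
    idx = int(np.argmax(class_probs))
    label, conf = CLASSES[idx], float(class_probs[idx])
    if label in DANGER_CLASSES and conf >= ALERT_THRESHOLD:
        return f"ALERT: {label} detected ({conf:.2f})"
    return None

chunk_probs = np.array([0.75, 0.05, 0.05, 0.05, 0.10])   # classifier output for one chunk
print(check_alert(chunk_probs))                           # -> alert for car_horn
```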

Voice Recognition Chatbot System for an Aging Society: Technology Development and Customized UI/UX Design (고령화 사회를 위한 음성 인식 챗봇 시스템 : 기술 개발과 맞춤형 UI/UX 설계)

  • Yun-Ji Jeong;Min-Seong Yu;Joo-Young Oh;Hyeon-Seok Hwang;Won-Whoi Hun
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.24 no.4
    • /
    • pp.9-14
    • /
    • 2024
  • This study developed a voice recognition chatbot system to address depression and loneliness among the elderly in an aging society. The system utilizes the Whisper model, GPT 2.5, and XTTS2 to provide high-performance voice recognition, natural language processing, and text-to-speech conversion. Users can express their emotions and states and receive appropriate responses, and the use of familiar voices provides comfort and reassurance. The UX/UI design considers the cognitive responses, visual impairments, and physical limitations of the smart senior generation, using high-contrast colors and readable fonts for enhanced usability. This research is expected to improve the quality of life of the elderly through voice-based interfaces.
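A minimal sketch of the speech-to-speech loop the abstract describes, assuming the openai-whisper and Coqui TTS (XTTS v2) packages; the model names, file paths, and the generate_reply stub are illustrative, not the authors' implementation, and the LLM step is left as a placeholder.

```python
# Minimal sketch: speech -> text (Whisper), text -> reply (LLM stub), reply -> familiar voice (XTTS v2).
import whisper                    # pip install openai-whisper
from TTS.api import TTS           # pip install TTS (Coqui), provides XTTS v2

def generate_reply(user_text):
    """Stub for the LLM step (the paper uses a GPT-based model)."""
    return f"말씀하신 내용 잘 들었어요: {user_text}"

stt = whisper.load_model("base")
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

user_text = stt.transcribe("elder_input.wav", language="ko")["text"]   # speech -> text
reply = generate_reply(user_text)                                      # text -> reply
tts.tts_to_file(text=reply, speaker_wav="familiar_voice.wav",          # reply -> cloned familiar voice
                language="ko", file_path="reply.wav")
print(reply)
```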