• Title/Summary/Keyword: vocal emotion recognition (음성 정서 인식)

Search Result 8

Acoustic parameters for induced emotion categorizing and dimensional approach (자연스러운 정서 반응의 범주 및 차원 분류에 적합한 음성 파라미터)

  • Park, Ji-Eun;Park, Jeong-Sik;Sohn, Jin-Hun
    • Science of Emotion and Sensibility / v.16 no.1 / pp.117-124 / 2013
  • This study examined how precisely MFCC, LPC, energy, and pitch-related parameters of speech data, which have been used mainly in voice recognition systems, could predict vocal emotion categories as well as the dimensions of vocal emotion. A total of 110 college students participated in the experiment. To elicit more realistic emotional responses, we used well-defined emotion-inducing stimuli. The study analyzed the relationship between the MFCC, LPC, energy, and pitch parameters of the speech data and four emotional dimensions (valence, arousal, intensity, and potency), because a dimensional approach is more useful for realistic emotion classification. Stepwise multiple regression analysis identified the best vocal cue parameters for predicting each dimension. Emotion categorization accuracy by LDA was 62.7%, and the four dimensional regression models were statistically significant (p<.001). These results suggest that the parameters could also be applied to spontaneous vocal emotion recognition.

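As a rough illustration of two of the parameter families named in the abstract above (energy and pitch), the following sketch computes the short-time energy and an autocorrelation-based pitch estimate for one frame of a synthetic tone. It is not the authors' feature extraction: the function names, the 8 kHz sample rate, the frame length, and the 75 to 400 Hz search range are all assumptions chosen for the example.

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame (a simple energy parameter)."""
    return sum(x * x for x in frame) / len(frame)

def estimate_pitch(frame, sample_rate, f_min=75.0, f_max=400.0):
    """Estimate F0 in Hz as the lag with the highest raw autocorrelation.

    f_min and f_max bound the search range; both are assumed values.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

SAMPLE_RATE = 8000  # Hz, assumed for this example

# Synthetic 200 Hz tone standing in for one voiced speech frame.
tone = [0.5 * math.sin(2 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(400)]

energy = frame_energy(tone)
pitch = estimate_pitch(tone, SAMPLE_RATE)
print(f"energy={energy:.3f}, pitch={pitch:.1f} Hz")
```

In a study like the one summarized above, per-frame values of this kind would typically be aggregated per utterance (for example as means or ranges) before being passed to a classifier or regression model; that aggregation step is not shown here.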

Multidimensional Affective model-based Multimodal Complex Emotion Recognition System using Image, Voice and Brainwave (다차원 정서모델 기반 영상, 음성, 뇌파를 이용한 멀티모달 복합 감정인식 시스템)

  • Oh, Byung-Hun;Hong, Kwang-Seok
    • Proceedings of the Korea Information Processing Society Conference / 2016.04a / pp.821-823 / 2016
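  • This paper proposes a multimodal complex emotion recognition system that uses image, voice, and brainwave data and is based on a multidimensional affective model. Features extracted separately from the user's facial image, voice, and brainwaves are mapped to explicit response-level data for the multidimensional affective model (Arousal, Valence, Dominance), which is known in psychology and cognitive science as the set of affective response factors that make up human emotion, and scoring is performed on that basis. The resulting scores are then mapped onto a three-dimensional emotion model, so that the system recognizes not only human emotions (single and complex) but also the intensity of the emotion.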
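The scoring-then-mapping idea in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' system: the prototype coordinates, the averaging fusion rule, the `margin` threshold for reporting a complex (two-label) emotion, and the use of distance from the origin as intensity are all hypothetical choices made for the example.

```python
import math

# Hypothetical prototype positions in (arousal, valence, dominance) space,
# each axis scaled to [-1, 1]. These coordinates are illustrative only.
PROTOTYPES = {
    "joy": (0.6, 0.8, 0.4),
    "anger": (0.7, -0.6, 0.5),
    "fear": (0.7, -0.7, -0.6),
    "sadness": (-0.5, -0.7, -0.4),
}

def fuse(scores):
    """Average the per-modality (A, V, D) scores (assumed fusion rule)."""
    n = len(scores)
    return tuple(sum(s[k] for s in scores.values()) / n for k in range(3))

def recognize(point, margin=0.2):
    """Return the nearest emotion label(s) and an intensity value.

    Two labels are returned (a "complex" emotion) when the two nearest
    prototypes are almost equally close; margin is an assumed threshold.
    Intensity is taken as the distance from the neutral origin.
    """
    ranked = sorted(PROTOTYPES, key=lambda name: math.dist(point, PROTOTYPES[name]))
    first, second = ranked[0], ranked[1]
    gap = math.dist(point, PROTOTYPES[second]) - math.dist(point, PROTOTYPES[first])
    labels = [first, second] if gap < margin else [first]
    intensity = math.dist(point, (0.0, 0.0, 0.0))
    return labels, intensity

# Made-up scores from the three modalities named in the abstract.
scores = {
    "face": (0.6, 0.8, 0.3),
    "voice": (0.5, 0.7, 0.5),
    "eeg": (0.7, 0.9, 0.4),
}
labels, intensity = recognize(fuse(scores))
print(labels, round(intensity, 3))
```

With these made-up inputs the fused point lies on the "joy" prototype, so a single label is returned; a point midway between two prototypes, such as `(0.7, -0.65, 0.0)`, yields two labels.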

A policy study for the voice recognition technology based on elderly health care (음성인식기술의 노인간병 적용을 위한 정책연구)

  • Cho, Byung-Chul;Cheon, Sooyoung;Kim, Kab-Nyun;Yuk, Hyun-Seung
    • Journal of Digital Convergence / v.16 no.2 / pp.9-17 / 2018
  • The purpose of this study is to find out how voice recognition technology can be used to address the problems of Korea's rapidly aging population. Public support services and private nursing services for the elderly are expected to expand in Korea. In this context, voice recognition technology can be used in various ways for elderly people who are not familiar with media interfaces. To this end, the researchers visited Japan and examined the results achieved with voice recognition technology in elderly care. In particular, caregivers greatly reduced their working hours by replacing handwritten reports with reports produced using voice recognition technology. This method can be easily implemented in Korea. In addition, the social cost of supporting the elderly can be gradually reduced through the development of robots equipped with voice recognition technology. Consequently, we find that combining voice recognition technology with artificial intelligence programs opens up various emotion recognition functions as well as various policy possibilities.

Development and validation of a Korean Affective Voice Database (한국형 감정 음성 데이터베이스 구축을 위한 타당도 연구)

  • Kim, Yeji;Song, Hyesun;Jeon, Yesol;Oh, Yoorim;Lee, Youngmee
    • Phonetics and Speech Sciences / v.14 no.3 / pp.77-86 / 2022
  • In this study, we report the validation results of the Korean Affective Voice Database (KAV DB), an affective voice database available for scientific and clinical use, comprising a total of 113 validated affective voice stimuli. The KAV DB includes audio recordings of two actors (one male and one female), each uttering 10 semantically neutral sentences with the intention of conveying six different affective states (happiness, anger, fear, sadness, surprise, and neutral). The database was organized into three separate voice stimulus sets in order to validate the KAV DB. Participants rated the stimuli on six rating scales corresponding to the six targeted affective states by using a 100 horizontal visual analog scale. The KAV DB showed high internal consistency for voice stimuli (Cronbach's α=.847). The database had high sensitivity (mean=82.8%) and specificity (mean=83.8%). The KAV DB is expected to be useful for both academic research and clinical purposes in the field of communication disorders. The KAV DB is available for download at https://kav-db.notion.site/KAV-DB-7539a36abe2e414ebf4a50d80436b41a.
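The three validation statistics reported in the abstract above (sensitivity, specificity, and Cronbach's α) can be computed as in the following sketch. The data are toy values, not results from the KAV DB, and the layout of the inputs (a confusion table of intended versus perceived emotions, and a stimuli-by-raters rating matrix) is an assumption; the study's exact scoring procedure may differ.

```python
from statistics import variance

def sensitivity_specificity(confusion, target):
    """confusion[intended][perceived] holds response counts (assumed layout)."""
    tp = confusion[target][target]
    fn = sum(confusion[target].values()) - tp
    fp = sum(row[target] for intended, row in confusion.items() if intended != target)
    total = sum(sum(row.values()) for row in confusion.values())
    tn = total - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)

def cronbach_alpha(ratings):
    """ratings[i][j] is the score for stimulus i on item (rater) j.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of row totals)
    """
    k = len(ratings[0])
    item_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Toy confusion table for three of the six affective states.
confusion = {
    "happiness": {"happiness": 8, "anger": 1, "sadness": 1},
    "anger": {"happiness": 2, "anger": 7, "sadness": 1},
    "sadness": {"happiness": 0, "anger": 1, "sadness": 9},
}
sens, spec = sensitivity_specificity(confusion, "happiness")  # 0.8, 0.9

# Toy rating matrix: 3 stimuli (rows) rated on 3 items (columns).
alpha = cronbach_alpha([[1, 2, 1], [2, 3, 3], [3, 4, 5]])  # 0.9375
print(sens, spec, alpha)
```

Here sensitivity is the share of stimuli intended as the target emotion that were perceived as such, and specificity is the share of other stimuli that were not perceived as the target emotion.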

Audio-Based Human-Robot Interaction Technology (오디오 기반 인간로봇 상호작용 기술)

  • Kwak, K.C.;Kim, H.J.;Bae, K.S.;Yoon, H.S.
    • Electronics and Telecommunications Trends / v.22 no.2 s.104 / pp.31-37 / 2007
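  • Human-robot interaction technology is a core technology of intelligent service robots. It covers the design, implementation, and evaluation of robot systems and interaction environments so that robots can interact cognitively and emotionally through various communication channels such as robot cameras, microphones, and other sensors. This article reviews domestic and international trends in sound localization and speaker recognition among audio-based human-robot interaction technologies, and covers the audio-visual sound localization and text-independent speaker recognition technologies that the ETRI Intelligent Robot Research Division has recently been working to commercialize. It also examines scenarios that combine these technologies with speech recognition, face detection, and face recognition so that they can be used effectively in home environments.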

Human Touching Behavior Recognition based on Neural Network in the Touch Detector using Force Sensors (힘 센서를 이용한 접촉감지부에서 신경망기반 인간의 접촉행동 인식)

  • Ryu, Joung-Woo;Park, Cheon-Shu;Sohn, Joo-Chan
    • Journal of KIISE:Software and Applications / v.34 no.10 / pp.910-917 / 2007
  • Of the possible interactions between humans and robots, touch is an important means of providing human beings with emotional relief. However, most previous studies have focused on interactions based on voice and images. In this paper, a method of recognizing human touching behaviors is proposed for developing a robot that can interact naturally with humans through touch. In this method, the recognition process is divided into a pre-processing phase and a recognition phase. In the pre-processing phase, recognizable characteristics are calculated from the data generated by the touch detector, which was fabricated using force sensors. The force sensors were FSRs (force sensing resistors). The recognition phase classifies human touching behaviors using a multi-layer perceptron, which is a neural network model. Experimental data were generated by six men performing three types of touching behavior: 'hitting', 'stroking', and 'tickling'. When a recognizer was built for each user and evaluated by cross-validation, the average recognition rate was 82.9%, whereas a single recognizer for all users achieved an average recognition rate of 74.5%.

VHI, V-RQOL, and vocal characteristics of teacher and singer (교사 및 성악가의 VHI, V-RQOL, 음향학적 특성 비교)

  • Hong, Ju-Hye;Hwang, Young-Jin
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.7 / pp.3048-3056 / 2012
  • The purpose of this study was to investigate the VHI, V-RQOL, and vocal characteristics of teachers and singers. Fifty-six subjects participated in this study (20 teachers with vocal nodules, 20 singers with vocal nodules, and 16 normal speakers). All subjects completed the VHI and V-RQOL, and their vocal characteristics were measured using the CSL 4500 (Kay Pentax, USA). Twenty-one subjects completed the VHI, V-RQOL, and vocal evaluation using the CSL 4500 twice to assess test-retest reliability. Statistical analysis was performed using the Statistical Package for the Social Sciences 18.0 (SPSS Inc., Chicago, IL). On the VHI and V-RQOL, the teacher and singer groups scored significantly higher than the control group in the functional, physical, and emotional aspects (p<0.05). The acoustic analysis using the CSL 4500 showed that the teacher and singer groups had significantly higher values than the control group in fundamental frequency related variables, fundamental perturbation related variables, amplitude perturbation related variables, noise related variables, and tremor related variables (p<0.05). In conclusion, the teacher and singer groups perceived their voice problems as serious in the physical and functional aspects.

Mobile Phone Uses and Social-Psychological Variables of Teen-Agers (청소년들의 이동전화 이용행태와 사회심리적 변인에 관한 연구: 부산.울산 지역 이용자들을 중심으로)

  • Lee, Joon-Ho;Ann, Soo-Geun;Jeong, Yong-Jo
    • Korean journal of communication and information / v.27 / pp.247-282 / 2004
  • The purpose of this study is to examine the mobile phone service usage patterns of Korean teenagers and their motivations for using these services, and to explore the relationships between the teenagers' social-psychological characteristics and their traits of mobile phone usage. After a review of prior theoretical studies, three hypotheses regarding these research questions were generated. A questionnaire study was conducted with 400 high school students in the Busan and Ulsan metropolitan areas. They answered three categories of questions: the frequencies of voice calls, text messaging services, and other mobile services; their social-psychological characteristics; and their motivations for using mobile phone services. The findings show that teenagers tend to use voice calls when communicating with family members. They tend to avoid services with high usage rates, such as multimedia, game, and broadcasting services via mobile phones. The teenagers use mobile phone services for purposes similar to those of conventional telephone use. Therefore, the classical theories of the purposes or motivations for conventional telephone use, which divide the main purposes of phone use into two (work/instrumental and socializing/entertaining), proved to be reasonable. Cheap text messaging services are popular among the teenagers and are used for socializing/entertaining purposes rather than instrumental/functional purposes. In addition, the teenagers' social-psychological characteristics are significantly associated with the amount of mobile phone use. A characteristic of "individual-centeredness" is positively related to the use of non-voice services, while "group-centeredness" explains heavy use of voice calls and text messaging services. Such characteristics as "immediacy," "directness," "innovativeness," and "other-direction" are positively associated with the frequency and amount of mobile phone use, while "inner-direction" and "tradition-direction" are negatively correlated.
