• Title/Summary/Keyword: Speech Emotion Recognition


Method of Automatically Generating Metadata through Audio Analysis of Video Content (영상 콘텐츠의 오디오 분석을 통한 메타데이터 자동 생성 방법)

  • Sung-Jung Young;Hyo-Gyeong Park;Yeon-Hwi You;Il-Young Moon
    • Journal of Advanced Navigation Technology / v.25 no.6 / pp.557-561 / 2021
  • Metadata has become an essential element for recommending video content to users, but it is typically generated manually by content providers. This paper studies a method for automatically generating metadata to replace the existing manual input process. Extending a previous study that extracted emotion tags, it investigates automatically generating genre and country-of-production metadata from movie audio. The genre was extracted from the audio spectrogram using ResNet34, a transfer-learning model, and the language of the speakers in the movie was detected through speech recognition. The results confirm the feasibility of automatically generating metadata through artificial intelligence.
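The pipeline above feeds an audio spectrogram into a CNN. As a minimal sketch of that preprocessing step (not the authors' code; the frame length and hop size are illustrative assumptions), a log-magnitude spectrogram can be computed with NumPy:

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=256):
    """Short-time FFT log-magnitude spectrogram.

    frame_len and hop are illustrative defaults, not the paper's
    settings. Returns a (freq_bins, n_frames) array, image-like
    input for a CNN such as ResNet34.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len//2+1)
    return np.log1p(mag).T

# Example: a 440 Hz tone at 16 kHz concentrates energy near
# bin 440 / (16000 / 512) ≈ 14.
sr = 16000
t = np.arange(sr) / sr
spec = log_spectrogram(np.sin(2 * np.pi * 440 * t))
```

In a transfer-learning setup, the single-channel spectrogram would be replicated to three channels and resized before being passed to a pretrained ResNet34 backbone with a new classification head.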

Applying Social Strategies for Breakdown Situations of Conversational Agents: A Case Study using Forewarning and Apology (대화형 에이전트의 오류 상황에서 사회적 전략 적용: 사전 양해와 사과를 이용한 사례 연구)

  • Lee, Yoomi;Park, Sunjeong;Suk, Hyeon-Jeong
    • Science of Emotion and Sensibility / v.21 no.1 / pp.59-70 / 2018
  • With breakthroughs in speech recognition technology, conversational agents have become pervasive through smartphones and smart speakers. Recognition accuracy has reached human level, but agents still struggle to understand the underlying meaning or intention of words, or to follow long conversations. As a result, users experience various errors when interacting with conversational agents, which can negatively affect the user experience. In addition, for smart speakers whose main interface is voice, a lack of system feedback and transparency has been reported as a major issue during use. There is therefore a strong need for research on how users can better understand the capabilities of conversational agents, and how their negative emotions in error situations can be mitigated. In this study, we applied two social strategies, "forewarning" and "apology", to a conversational agent and investigated how these strategies affect users' perceptions of the agent in breakdown situations. We created a series of demo videos of a user interacting with a conversational agent; after watching the videos, participants evaluated how much they liked and trusted the agent through an online survey. Responses from 104 participants were analyzed, and the results were contrary to our expectations from the literature: forewarning gave users a negative impression of the agent, especially of its reliability, and an apology in a breakdown situation did not affect users' perceptions. In follow-up in-depth interviews, participants explained that they perceived the smart speaker as a machine rather than a human-like object, and that for this reason the social strategies did not work. These results show that social strategies should be applied according to the perceptions users hold toward agents.

Development and validation of a Korean Affective Voice Database (한국형 감정 음성 데이터베이스 구축을 위한 타당도 연구)

  • Kim, Yeji;Song, Hyesun;Jeon, Yesol;Oh, Yoorim;Lee, Youngmee
    • Phonetics and Speech Sciences / v.14 no.3 / pp.77-86 / 2022
  • In this study, we report the validation results of the Korean Affective Voice Database (KAV DB), an affective voice database available for scientific and clinical use, comprising a total of 113 validated affective voice stimuli. The KAV DB includes audio recordings of two actors (one male and one female), each uttering 10 semantically neutral sentences with the intention of conveying six different affective states (happiness, anger, fear, sadness, surprise, and neutral). The database was organized into three separate voice stimulus sets for validation. Participants rated the stimuli on six rating scales corresponding to the six targeted affective states using a 100-point horizontal visual analog scale. The KAV DB showed high internal consistency across voice stimuli (Cronbach's α = .847), as well as high sensitivity (mean = 82.8%) and specificity (mean = 83.8%). The KAV DB is expected to be useful for both academic research and clinical purposes in the field of communication disorders, and is available for download at https://kav-db.notion.site/KAV-DB-7539a36abe2e414ebf4a50d80436b41a.
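The internal-consistency statistic reported for the KAV DB (Cronbach's α = .847) is computed from a rater-by-stimulus rating matrix. A minimal sketch with synthetic data (not the actual KAV DB ratings):

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for an (n_raters, n_items) rating matrix."""
    r = np.asarray(ratings, dtype=float)
    k = r.shape[1]                          # number of items (stimuli)
    item_vars = r.var(axis=0, ddof=1)       # sample variance of each item
    total_var = r.sum(axis=1).var(ddof=1)   # variance of per-rater totals
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Perfectly consistent synthetic raters: each rater gives every
# stimulus the same score, so alpha should be at its maximum.
consistent = [[10, 10, 10], [50, 50, 50], [90, 90, 90]]
alpha = cronbach_alpha(consistent)  # ≈ 1.0
```

Values near 1 indicate that the stimuli measure the intended construct consistently; the .847 reported above falls in the range conventionally regarded as good internal consistency.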

Increase in the Number of Spoken Syllables Using MIT (Melodic Intonation Therapy): Case Studies of Older Adults with Stroke and Aphasia (MIT(Melodic Intonation Therapy) 중심의 음악활동을 이용한 실어증을 가진 뇌졸중 노인의 음절 수 증가에 대한 사례 연구)

  • Hong, Do Kyoung
    • Journal of Music and Human Behavior / v.2 no.2 / pp.57-67 / 2005
  • Most stroke patients suffer not only physical difficulties but also speech and neurological disorders due to hemiplegia, and such unexpected changes cause psychological maladaptation and inattention. In particular, reduced physical ability can lead to serious emotional problems stemming from failure or frustration in daily life. Treatment of stroke patients generally emphasizes physical rehabilitation, but many patients also have considerable speech disorders such as aphasia or articulation disorders, along with declining cognitive function, mental disorders such as hypochondria, and even visual and auditory impairments. It is therefore effective to integrate verbal remediation with other treatments in the medical care environment. Patients with language disorders often withdraw psychologically, so music therapy, which provides rich emotional support, can be an efficient approach for patients with aphasia. This study investigated the effects of a 10-session treatment on the total number of spoken syllables, and on the participants' confidence gained from succeeding at given tasks and reassurance about their own abilities. All 10 sessions were scored using the MIT manual, and improvement was measured by analyzing accomplishment within each level in order to capture detailed changes in the total number of spoken syllables. The results of the program, organized from 2-syllable to 4-syllable targets, are summarized as follows. Subject A completed Level I in the preliminary stage; for 2 syllables, he advanced to Level III in the fifth session and Level IV in the seventh; for 3 syllables, to Level III in the seventh session and Level IV in the ninth; and for 4 syllables, he showed a low 8% success rate in the first session but improved considerably with repeated practice by the sixth session, advancing to Level III in the eighth session and Level IV in the tenth.
Subject B also completed Level I in the preliminary stage; for 2 syllables, he advanced to Level III in the fourth session and Level IV in the sixth; for 3 syllables, to Level III in the fifth session and Level IV in the seventh; and for 4 syllables, he showed a low 10% success rate in the first session, improved considerably by the fifth session, and advanced to Level III in the seventh session but did not reach Level IV by the tenth. Overall, although the improvements from MIT-based music therapy were not statistically significant, the total number of spoken syllables and the task success rate improved as a whole. Therefore, music intervention using MIT has a positive effect on the verbal ability of patients with Broca's aphasia and on their language rehabilitation.
