• Title/Summary/Keyword: Audio-Visual Information


Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

  • Liu, Min; Tang, Jun
    • Journal of Information Processing Systems / v.17 no.4 / pp.754-771 / 2021
  • In continuous dimensional emotion recognition, the parts that highlight emotional expression differ across modalities, and different modalities influence the emotional state to different degrees. This paper therefore studies the fusion of the two most important modalities in emotion recognition (voice and facial expression) and proposes a dual-modal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio and video signals, audio features are first extracted using prior knowledge. Facial expression features are then extracted by the improved AlexNet network. Finally, a multimodal attention mechanism fuses the facial expression and audio features, and an improved loss function addresses the missing-modality problem, improving the robustness of the model and its emotion recognition performance. Experimental results show that the concordance correlation coefficients (CCC) of the proposed model in the arousal and valence dimensions were 0.729 and 0.718, respectively, which are superior to several comparative algorithms.
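The concordance correlation coefficient used as the evaluation metric above is a standard measure of agreement (Lin's CCC); a minimal sketch of its computation, not taken from the paper's code:

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance correlation coefficient: measures agreement between
    predictions and gold labels, penalizing both weak correlation and
    bias in scale or location."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

# Perfect agreement gives CCC = 1
print(ccc([0.1, 0.5, 0.9], [0.1, 0.5, 0.9]))  # → 1.0
```

Unlike plain Pearson correlation, CCC drops below 1 when predictions are shifted or rescaled relative to the labels, which is why it is the standard metric for arousal/valence regression.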

Real-time 3D Audio Downmixing System based on Sound Rendering for the Immersive Sound of Mobile Virtual Reality Applications

  • Hong, Dukki; Kwon, Hyuck-Joo; Kim, Cheong Ghil; Park, Woo-Chan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.12 / pp.5936-5954 / 2018
  • Eight of the world's ten largest technology companies have been involved in some way with the coming mobile VR revolution since Facebook acquired Oculus. This trend has allowed technologies related to mobile VR to achieve remarkable growth in both academia and industry. Reproducing acoustic cues is therefore increasingly important for a more realistic user experience, because auditory cues can enhance the perception of a complicated surrounding environment even without the visual system in VR. This paper presents a hardware-based audio downmixing system for auralization, a stage of the sound rendering pipeline that can reproduce reality-like sound but requires high computation costs. The proposed system is verified on an FPGA platform, with a special focus on hardware architectural designs for low power and real-time operation. The results show that the proposed system on an FPGA can downmix up to 5 sources at a real-time rate (52 FPS) with a low power consumption of 382 mW. Furthermore, the 3D sound generated by the proposed system was verified via user evaluation, with satisfactory sound-quality results.
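Downmixing several spatialized sources into a stereo output is commonly done with per-source panning gains. A hypothetical constant-power panning sketch, illustrating the idea only (the paper's FPGA pipeline is far more elaborate):

```python
import math

def downmix_stereo(sources):
    """Constant-power pan of mono sources to a stereo mix.
    `sources` is a list of (samples, azimuth) pairs with azimuth in
    [-90, 90] degrees (negative = left). Illustrative sketch only."""
    n = max(len(s) for s, _ in sources)
    left = [0.0] * n
    right = [0.0] * n
    for samples, az in sources:
        # Map azimuth to a pan angle in [0, pi/2]; cos/sin gains keep
        # total power constant as a source moves across the stage.
        theta = (az + 90.0) / 180.0 * (math.pi / 2)
        gl, gr = math.cos(theta), math.sin(theta)
        for i, x in enumerate(samples):
            left[i] += gl * x
            right[i] += gr * x
    return left, right
```

A source at azimuth 0° lands equally in both channels; at -90° it appears only in the left channel. Real auralization adds per-source delays and filtering (HRTFs, room effects) before this summation stage.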

A Study of the spatial perception by audio-visual information (시각과 청각에 의한 공간적 지각에 관한 연구)

  • Lee, Chai-Bong; Kang, Dae-Gee
    • Journal of the Institute of Convergence Signal Processing / v.11 no.2 / pp.132-136 / 2010
  • A psychophysical experiment was performed to investigate how audio-visual spatial disparity affects perceived space in peripheral vision. In the experiment, participants were simultaneously exposed to a visual and an auditory stimulus arriving from different directions. The visual stimulus was implemented with 7 white LEDs placed at an equal distance at 7 angles of -70°, -40°, -20°, 0°, 20°, 40°, and 70° from the front. The auditory stimuli were implemented with loudspeakers placed in 9 directions equally spaced by 5°, ranging from -20° to 20°. Each participant then rated the spatial disparity between the visual and auditory stimuli on a 5-level scale, with higher levels indicating a larger perceived gap. When the visual stimulus was presented from the right, the results show that the response level increased with the angle between the visual and auditory stimuli. A similar tendency was observed for the visual stimulus at 0°. On the other hand, when the visual stimulus was presented from the left, the response level decreased as the angle increased.

Implementation of the Broadcasting System for Digital Media Contents (디지털 미디어 콘텐츠 방송 시스템 구현)

  • Shin, Jae-Heung; Kim, Hong-Ryul; Lee, Sang-Cheal
    • The Transactions of The Korean Institute of Electrical Engineers / v.57 no.10 / pp.1883-1887 / 2008
  • Most digital media contents are composed of video, audio, picture, and animation information. The quality of information recognition for video and audio can vary with the receiver's characteristics and level of understanding, but visual information using text provides the clearest and most accurate means of information recognition for human beings. In this paper, we propose a new broadcasting system (BSDMC) to transmit the meaning of digital media contents clearly and accurately. We implement general-purpose components that display video, pictures, text, and symbols simultaneously; by simply plugging in and calling these components with the proper parameters in an application development tool, a multimedia contents broadcasting system can be developed easily. The components are implemented on an object-oriented framework with a modular structure, which increases reusability and allows other applications to be developed quickly and reliably.

Video Highlight Prediction Using GAN and Multiple Time-Interval Information of Audio and Image (오디오와 이미지의 다중 시구간 정보와 GAN을 이용한 영상의 하이라이트 예측 알고리즘)

  • Lee, Hansol; Lee, Gyemin
    • Journal of Broadcast Engineering / v.25 no.2 / pp.143-150 / 2020
  • Huge amounts of content are uploaded every day to various streaming platforms, and game and sports videos account for a large portion of them. Broadcasting companies sometimes create and provide highlight videos, but these tasks are time-consuming and costly. In this paper, we propose models that automatically predict highlights in games and sports matches. While most previous approaches use visual information exclusively, our models use both audio and visual information and present a way to capture both the short-term and long-term flow of a video. We also describe models that incorporate a GAN to find better highlight features. The proposed models are evaluated on e-sports and baseball videos.

Auditory and Visual Information Effect on the Loudness of Noise (시각 및 청각 정보가 소음의 인지도에 미치는 영향)

  • Shin, Hoon; Park, Sa-Gun; Song, Min-Jeong; Jang, Gil-Soo
    • KIEAE Journal / v.6 no.4 / pp.69-76 / 2006
  • The effects of additional visual and auditory stimuli on the loudness evaluation of road traffic noise were investigated by the method of magnitude estimation. The results show that an additional visual stimulus of a noise barrier can influence the perceived loudness of road traffic noise. Additional auditory stimuli, such as green music or the sound of flowing water, can also influence the perceived loudness, lowering it by approximately 5-10% compared with the absence of such stimuli. However, this effect disappeared at levels above 65 dB(A).
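To put a 5-10% loudness change in perspective, one can use the common rule of thumb (not from the paper) that perceived loudness in sones roughly doubles for every 10 dB increase above about 40 dB. Under that assumption, a loudness ratio maps to a level change as follows:

```python
import math

def db_change_for_loudness_ratio(ratio):
    """Level change (dB) corresponding to a perceived-loudness ratio,
    assuming the rule of thumb that loudness doubles per +10 dB.
    A ratio of 0.9 means a 10% drop in perceived loudness."""
    return 10.0 * math.log2(ratio)

# A 10% drop in perceived loudness (ratio 0.9):
print(round(db_change_for_loudness_ratio(0.90), 2))  # → -1.52
```

So a 5-10% loudness reduction corresponds to an effective level drop on the order of only 1-1.5 dB, which helps explain why the masking effect vanishes once traffic noise exceeds 65 dB(A).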

The Auditory and Visual Information Effects on the Loudness of Noises Perception (친환경적 시각 및 청각정보가 소음의 인지도에 미치는 영향)

  • Shin, Hoon; Song, Min-Jeong; Kook, Chan; Jang, Gil-Soo; Kim, Sun-Woo
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2006.05a / pp.970-973 / 2006
  • The effects of additional visual and auditory stimuli on the loudness evaluation of road traffic noise were investigated by the method of magnitude estimation. The results show that an additional visual stimulus of a noise barrier can influence the perceived loudness of road traffic noise. Additional auditory stimuli, such as green music or the sound of flowing water, can also influence the perceived loudness, lowering it by approximately 5-10% compared with the absence of such stimuli. However, this effect disappeared at levels above 65 dB(A).


DAVIC Security (DAVIC 에서의 보안 기법)

  • 염흥열
    • Review of KIISC / v.7 no.1 / pp.7-40 / 1997
  • This article analyzes the access control techniques being recommended by DAVIC (Digital Audio-Visual Council). To this end, it analyzes and presents DAVIC security techniques, along with the reference model for the security system and the required services and security mechanisms. Since DAVIC standardization for security is still in progress, the discussion focuses on what has been standardized to date.


Multimedia TIAV System

  • Beknazarova, Saida Safibullayevna
    • Journal of Multimedia Information System / v.2 no.4 / pp.295-302 / 2015
  • This article discusses the features and trends in implementing multimedia systems in various fields, substantiates the basic concepts of multimedia systems and information flows, and describes the classification and characterization of information flows and systems. It also describes the TIAV container, which is designed with modern features in mind and aimed at future trends in media playback.

Design and Implementation of T.130 Audio-Visual Control for Real-time Multimedia Conferencing (다양한 망에서의 실시간 멀티미디어 회의를 위한 영상.음성 제어부의 설계 및 구현)

  • Kang, Myung-Ho; Kim, Hong-Rae; Seong, Dong-Su; Huh, Mi-Young; Hahn, Jin-Ho; Seong, Kwang-Su
    • Proceedings of the Korean Information Science Society Conference / 1998.10a / pp.653-655 / 1998
  • The ITU-T defines the T.130 series as the standard for audio-visual control protocols for real-time multimedia conferencing over various networks. T.130 gives an overall introduction to the audio-visual control protocol; T.132 Audio-Visual Control (AVC) defines the protocol for audio-visual control; and T.131 covers mapping to other types of multimedia systems such as H.320, H.323, and H.324. However, the concrete data structures, operating methods, and algorithms for audio-visual control are left as implementation matters. By using the GCC (Generic Conference Control) and MCS (Multipoint Communication Service) of the T.120 series, the T.130 series supports all networks including PSTN, PSDN, CSDN, ISDN, LAN, and ATM. This paper presents the design of the AVC protocol and introduces the implemented AVC.
