• Title/Summary/Keyword: visual-audio


Lip and Voice Synchronization Using Visual Attention (시각적 어텐션을 활용한 입술과 목소리의 동기화 연구)

  • Dongryun Yoon;Hyeonjoong Cho
    • The Transactions of the Korea Information Processing Society / v.13 no.4 / pp.166-173 / 2024
  • This study explores lip-sync detection, which assesses the synchronization between lip movements and voice in videos. Typical lip-sync detection techniques crop the facial region of a given video and feed the lower half of the cropped box to a visual encoder that extracts visual features. To place more emphasis on the articulatory region of the lips and so detect lip sync more accurately, we propose using a pre-trained visual attention-based encoder. As the visual encoder we employ the Visual Transformer Pooling (VTP) module, originally designed for lip reading, i.e., predicting the transcript from visual information alone, without audio. Our experimental results show that, despite having fewer trainable parameters, the proposed method outperforms the latest model, VocaList, on the LRS2 dataset, achieving a lip-sync detection accuracy of 94.5% with five context frames. Moreover, on Acappella, a dataset not used for training, our approach achieves a lip-sync detection accuracy approximately 8% higher than that of VocaList.
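The preprocessing described above (crop the detected face, keep only the lower half as visual-encoder input, and group frames into a short context window) can be sketched as follows. This is a minimal illustration, assuming the face box is already supplied as (x0, y0, x1, y1) by some face detector; the helper names `crop_lower_face` and `context_window` and the list-of-rows frame representation are hypothetical, not taken from the paper, and the VTP encoder itself is not shown.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # hypothetical grayscale frame: a list of pixel rows


def crop_lower_face(frame: Frame, box: Tuple[int, int, int, int]) -> Frame:
    """Crop the face box (x0, y0, x1, y1) and keep only its lower half.

    x indexes columns, y indexes rows; x1 and y1 are exclusive bounds.
    """
    x0, y0, x1, y1 = box
    y_mid = y0 + (y1 - y0) // 2  # first row of the lower half (mouth region)
    return [row[x0:x1] for row in frame[y_mid:y1]]


def context_window(frames: Sequence, center: int, size: int = 5) -> list:
    """Return `size` consecutive frames around `center`, shifted to stay in range."""
    start = max(0, center - size // 2)
    end = min(len(frames), start + size)
    start = max(0, end - size)  # re-anchor the window at the end of the clip
    return list(frames[start:end])
```

The default of five frames mirrors the "five context frames" setting reported in the abstract; everything else is an illustrative choice.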

Implementation of an Intelligent Audio Graphic Equalizer System (지능형 오디오 그래픽 이퀄라이저 시스템 구현)

  • Lee Kang-Kyu;Cho Youn-Ho;Park Kyu-Sik
    • Journal of the Institute of Electronics Engineers of Korea SP / v.43 no.3 s.309 / pp.76-83 / 2006
  • The main purpose of an audio equalizer is to let the user tailor the acoustic frequency response for more comfortable sound; applications range from large-scale audio systems to portable devices such as mobile MP3 players. Until now, audio equalizers have required manual adjustment of the frequency-band gains to obtain suitable sound quality for each music genre. In this paper, we propose an intelligent audio graphic equalizer system that automatically classifies the music genre through music content analysis and then, during playback, boosts the music with the frequency gains assigned to the classified genre. To reproduce comfortable sound, the genre is determined by a two-step hierarchical algorithm consisting of coarse-level and fine-level classification. This prevents annoying sound caused by sudden changes of the equalizer gains at the beginning of playback. In the experiments, each classification stage achieved a success rate of at least 80%, and genre classification and equalizer operation were completed within 2 seconds. A simple software graphical user interface for a 3-band automatic equalizer was implemented in Visual C on a personal computer.
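The mapping from a classified genre to band gains, applied first at the coarse level and then refined at the fine level, might look like the sketch below. The genre names, the dB values in the presets, and the linear gain ramp between presets are all hypothetical assumptions for illustration; the abstract does not give the actual gains, band edges, or filter design, and it attributes the smooth start to the coarse/fine hierarchy rather than to any particular ramp.

```python
from typing import List, Tuple

Gains = Tuple[float, float, float]  # (low, mid, high) band gains in dB

# Hypothetical presets; the values are illustrative, not the paper's gains.
COARSE_PRESETS = {
    "rhythmic": (4.0, 0.0, 2.0),
    "calm": (1.0, 1.0, 0.0),
}
FINE_PRESETS = {
    "rock": (5.0, -1.0, 3.0),
    "dance": (6.0, 0.0, 2.0),
    "classical": (0.0, 1.0, 1.0),
    "jazz": (2.0, 1.0, 0.0),
}


def db_to_linear(db: float) -> float:
    """Convert an amplitude gain in dB to a linear multiplier."""
    return 10 ** (db / 20)


def ramp(start: Gains, end: Gains, steps: int) -> List[Gains]:
    """Linearly interpolate band gains (dB) from `start` to `end` over `steps` updates."""
    return [
        tuple(s + (e - s) * k / steps for s, e in zip(start, end))
        for k in range(1, steps + 1)
    ]


def gain_schedule(coarse: str, fine: str, steps: int = 4) -> List[Gains]:
    """Ramp from flat (0 dB) to the coarse preset, then from it to the fine preset."""
    flat: Gains = (0.0, 0.0, 0.0)
    c = COARSE_PRESETS[coarse]
    return ramp(flat, c, steps) + ramp(c, FINE_PRESETS[fine], steps)
```

Applying the coarse preset as soon as the first decision is available, and only then moving to the fine preset, avoids one large jump in gain when playback starts.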

The Development of Multimedia Player Platform for Terrestrial Digital Multimedia Broadcasting (DMB) (지상파 이동 멀티미디어방송용 멀티미디어 재생기 개발)

  • 기명석;서정일;강경옥
    • Journal of Broadcast Engineering / v.8 no.4 / pp.465-472 / 2003
  • In this paper, we propose the structure of an MPEG-4 multimedia player platform for the Terrestrial Digital Multimedia Broadcasting (DMB) service. Korea will launch the DMB service in 2004 based on the Eureka-147 Digital Audio Broadcasting (DAB) system. This new mobile multimedia broadcasting service will provide not only high-quality digital audio broadcasting but also various multimedia data broadcasting services, including high-quality video. By adopting MPEG-4 Systems technologies, it will also offer interactive services to users in the near future. Its terminals must therefore support various functionalities beyond playing audio-visual content. However, there is no preceding standard for such a mobile interactive multimedia broadcasting system. It is therefore very important to provide a multimedia player platform for the DMB service, both to accelerate the development of commercial terminals and to indicate a direction for the structure of next-generation DMB terminals.

Development of Wearable Sensing and Feedback Product Design for Movement Monitoring (동작 모니터링을 위한 웨어러블 센싱 및 피드백 제품 디자인 개발)

  • Cho, Hyun-Seung;Yang, Jin-Hee;Lee, Kang-Hwi;Lee, Jeong-Hwan;Park, Su-Youn;Choi, Hyeong-Ik;Jeon, Hak-Su;Lee, Joo-Hyeon
    • Science of Emotion and Sensibility / v.21 no.3 / pp.165-176 / 2018
  • The objective of this study was to develop clothing-type wearable motion sensing and feedback systems that enhance children's sports activities by providing visual and audio feedback. Several components were developed: fabric sensors, sportswear integrated with various types of fabric sensors, a fabric-based motion sensing module, and a visual and audio feedback system for better understanding a child's interest in a given type of exercise. A stretchable fabric sensor based on single-walled carbon nanotubes (SWCNT) was developed for motion sensing, and sportswear was designed with this fabric sensor integrated into the limbs of the garment. The sensing module was developed, and its sensing performance was evaluated through a joint motion experiment with children. In addition, using the feedback system developed in the form of an accessory, the light and sound responses were examined according to the movements of a child wearing the sportswear prototypes. This study focused on the development and assessment of prototype designs for children's sportswear and accessory products that can help to ascertain a child's interest in a particular exercise.
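How a stretchable fabric sensor reading could drive the light and sound feedback is sketched below. This assumes a resistive sensor whose resistance rises as the joint bends and uses the relative resistance change (R - R0) / R0 against a hypothetical threshold; the paper's actual signal processing, threshold, and feedback logic are not described in the abstract, and the function names are illustrative only.

```python
from typing import List


def relative_change(resistance: float, baseline: float) -> float:
    """Relative resistance change (R - R0) / R0 of a stretchable strain sensor."""
    if baseline <= 0:
        raise ValueError("baseline resistance must be positive")
    return (resistance - baseline) / baseline


def feedback_events(samples: List[float], baseline: float, threshold: float = 0.2) -> List[int]:
    """Indices where the reading first crosses the (hypothetical) bend threshold.

    Only rising edges are reported, so a held bend triggers one light/sound
    cue instead of one per sample.
    """
    events: List[int] = []
    above = False
    for i, r in enumerate(samples):
        now_above = relative_change(r, baseline) >= threshold
        if now_above and not above:
            events.append(i)
        above = now_above
    return events
```

Triggering on rising edges is a design choice for the sketch; a real module would also need calibration of R0 per garment and some filtering of sensor noise.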

ANS responses in Negative Emotions Induced by Audio-visual Film Clips (시청각 동영상에 의해 유발된 부정적 감성에 따른 자율신경계 반응)

  • Lee, Young-Chang;Jang, Eun-Hye;Chung, Soon-Cheol;Sohn, Jin-Hun
    • Science of Emotion and Sensibility / v.10 no.3 / pp.471-480 / 2007
  • Negative emotions play an important role in human survival. In this research, we used audio-visual film clips to induce negative emotions and examined the autonomic nervous system (ANS) responses associated with each negative emotion. Thirty adults (age 22.6 ± 1.24 years; 15 males and 15 females) took part in the experiment. Through a preliminary experiment, 2-minute film clips were selected as the emotion-inducing stimuli. EDA and ECG were recorded while participants watched and listened to the selected clips, and after each stimulus was presented, participants completed a psychological appraisal of the emotion they had experienced. The analysis of the psychological responses showed that each stimulus appropriately and effectively induced its target emotion. The analysis of the ANS responses showed that each negative emotion produced its own pattern of ANS activation. In addition, compared with the other negative emotional stimuli, the fear-inducing stimulus produced greater activation of the sympathetic nervous system (SNS) in the EDA and ECG indices. The significance of this research lies in differentiating the ANS responses to each negative emotion.
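The abstract does not list which EDA and ECG indices were computed. As one common example of an ECG-derived index, mean heart rate can be obtained from R-peak times and compared between a pre-stimulus baseline and the stimulus period. The sketch below assumes R peaks have already been detected; the function names are hypothetical and this is not presented as the paper's analysis.

```python
from statistics import mean
from typing import Sequence


def mean_heart_rate(r_peak_times_s: Sequence[float]) -> float:
    """Mean heart rate in beats/min from R-peak timestamps in seconds."""
    if len(r_peak_times_s) < 2:
        raise ValueError("need at least two R peaks")
    # R-R intervals: time between consecutive heartbeats
    rr = [b - a for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]
    return 60.0 / mean(rr)


def reactivity(baseline: float, stimulus: float) -> float:
    """Response expressed as the change from the baseline to the stimulus period."""
    return stimulus - baseline
```

A larger positive reactivity in heart rate (or in skin conductance, for EDA) is the kind of signal typically read as stronger sympathetic activation.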


Satisfaction and Perception of Nutrition Education by Elementary School Students (초등학생의 영양교육에 대한 만족과 인식 조사)

  • Yun, Jee-Sun;Lyu, Eun-Soon
    • Journal of the Korean Society of Food Science and Nutrition / v.41 no.9 / pp.1259-1264 / 2012
  • This study investigated elementary school students' perception of and satisfaction with nutrition education. Questionnaires were distributed to a total of 327 students at three elementary schools in the Ulsan area in July 2010. Fifty-two percent of the respondents were satisfied with the teaching tools/audio-visual materials used for nutrition education, and 46.5% responded that nutrition education was more interesting than other lectures. The average score for the perceived necessity of nutrition education was 4.02/5.00, and that for willingness to participate in re-education lectures was 3.80/5.00. Seventy-seven percent of the students answered that they would encourage their friends to participate in nutrition education. The average scores for the necessity of nutrition education and for willingness to participate in re-education lectures were significantly (p<0.01) higher in the group of students who were satisfied with the teaching tools/audio-visual materials and lectures. Students who were satisfied with the tools/audio-visual materials and those interested in nutrition education had significantly (p<0.01) higher average scores for dietary habits improved by the nutrition education content than other students.

A MPEG Audio-Visual Conversational Communication Terminal on the B-ISDN Environment (광대역 ISDN용 MPEG 오디오-비쥬열 대화형 통신단말의 설계 및 구현)

  • Hwang, Dae-Hwan;Cho, Kyu-Seob
    • The Transactions of the Korea Information Processing Society / v.5 no.8 / pp.1960-1971 / 1998
  • Research and development to provide multimedia communication services such as Video on Demand (VoD), real-time video phone, and multipoint video conferencing in broadband ISDN environments has been actively pursued. The Digital Audio-Visual Council (DAVIC) has already completed specifications for VoD services covering the detailed technologies of the total service system, which consists of a VoD server, a delivery network, and a Set-Top Box (STB), and ITU-T SG16 has also recommended the H.300-series terminal standards for conversational multimedia services. However, the multimedia terminal architectures recommended and specified by these organizations lack an efficient structure for providing retrieval, distribution, and conversational services together, because each takes a different view of multimedia terminals and services. In this paper, we analyzed the recommendations and specifications of international public and private organizations such as ITU-T, DAVIC, and the ATM Forum. Based on this analysis, we propose an efficient terminal architecture, and we have designed and implemented a multimedia communication terminal offering VoD and real-time conversational services. We also carried out functional module tests for each individual communication service session and confirmed the validity of the implemented terminal for use in broadband ISDN environments.


An Optimization Technique of Scene Description for Effective Transmission of Interactive T-DMB Contents (대화형 T-DMB 컨텐츠의 효율적인 전송을 위한 장면기술정보 최적화 기법)

  • Li Song-Lu;Cheong Won-Sik;Jae Yoo-Young;Cha Kyung-Ae
    • Journal of Broadcast Engineering / v.11 no.3 s.32 / pp.363-378 / 2006
  • The Digital Multimedia Broadcasting (DMB) system was developed to deliver high-quality audio-visual multimedia content to the mobile environment. The system adopts the MPEG-4 standard for its main video, audio, and other media formats, and it also adopts MPEG-4 scene description for interactive multimedia content. Animated and interactive content can be realized with BIFS (Binary Format for Scenes), the binary scene description format that specifies the spatio-temporal layout and behavior of individual objects. The more interactive the content, the higher the bitrate its scene description requires. However, the bandwidth available for metadata such as scene descriptions is limited in mobile environments. Meanwhile, the DMB terminal demultiplexes the content and decodes each medium with its own decoder, and after decoding, the rendering module presents each media stream according to the scene description. The BIFS stream carrying the scene description must therefore be decoded and parsed before any media data can be presented. For this reason, transmission delay of the BIFS stream delays the presentation of the whole audio-visual scene, even when the audio and video streams are encoded at very low bitrates. This paper presents an effective optimization technique that adapts the BIFS stream to the expected MPEG-2 TS bitrate without wasting bandwidth, while avoiding transmission delay of the initial scene description for interactive DMB content.
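The delay problem described above comes down to simple arithmetic: the initial scene cannot be shown until its BIFS data has arrived at the bitrate reserved for it. The sketch below estimates that delay, assuming the BIFS data is carried in 188-byte MPEG-2 TS packets with a minimal 4-byte header and no adaptation field; PES and SL packet headers are ignored, the function names are hypothetical, and this is only the quantity the paper's technique tries to keep low, not the technique itself.

```python
import math

TS_PACKET_BYTES = 188  # fixed MPEG-2 TS packet size
TS_HEADER_BYTES = 4    # minimal TS header, assuming no adaptation field
TS_PAYLOAD_BYTES = TS_PACKET_BYTES - TS_HEADER_BYTES  # 184 bytes of payload


def ts_packets_needed(payload_bytes: int) -> int:
    """TS packets needed to carry `payload_bytes` (PES/SL headers ignored)."""
    return math.ceil(payload_bytes / TS_PAYLOAD_BYTES)


def initial_scene_delay_s(bifs_bytes: int, bifs_bitrate_bps: float) -> float:
    """Seconds to deliver the initial BIFS scene at the bitrate reserved for it."""
    packets = ts_packets_needed(bifs_bytes)
    # Whole packets are sent, so the last partly filled packet still costs 188 bytes.
    return packets * TS_PACKET_BYTES * 8 / bifs_bitrate_bps
```

For example, a 368-byte initial scene fills exactly two packets and takes 3008 bits / 32 kbit/s ≈ 0.094 s; raising the reserved bitrate shortens the delay but takes bandwidth away from audio and video, which is the trade-off the abstract describes.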

The Effect of Breathing Biofeedback on Breathing Reproducibility and Patient's Dose in Respiration-gated Radiotherapy (호흡연동 방사선 치료에서 호흡생체자기제어 방식이 호흡 재현성 및 선량에 미치는 영향 평가)

  • An, Sohyun;Yeo, Inhwan;Jung, Jaewon;Suh, Hyunsuk;Lee, Kyung Ja;Choi, Jinho;Lee, Kyu Chan;Lee, Rena
    • Progress in Medical Physics / v.24 no.3 / pp.135-139 / 2013
  • We evaluated the effect of two breathing biofeedback techniques, audio instruction and audio-visual biofeedback, on breathing reproducibility and CTV coverage over repeated treatment sessions in respiration-gated radiotherapy. Breathing data from nineteen lung cancer patients, acquired at the Medical College of Virginia (MCV) over five weeks, were used. The dose evaluation algorithm was programmed in MATLAB. Under free breathing, CTV coverage decreased by 30.0% because of breathing irreproducibility. With audio-visual biofeedback, CTV coverage improved by 20.0%, because patients could learn to control their breathing stably. Audio instruction was effective in preserving breathing reproducibility.
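CTV coverage is often reported as the fraction of clinical target volume (CTV) voxels that receive at least a set fraction of the prescribed dose. The abstract does not state the exact criterion used, so the sketch below assumes a 95% threshold purely for illustration; the paper's evaluation was written in MATLAB, and this Python function is a hypothetical stand-in, not its algorithm.

```python
from typing import Sequence


def ctv_coverage(voxel_doses: Sequence[float], prescription: float, fraction: float = 0.95) -> float:
    """Fraction of CTV voxels receiving at least `fraction` of the prescribed dose.

    The 95% default is an assumed criterion, not one taken from the paper.
    """
    if not voxel_doses:
        raise ValueError("empty CTV")
    threshold = fraction * prescription
    covered = sum(1 for d in voxel_doses if d >= threshold)
    return covered / len(voxel_doses)
```

Comparing this value for dose distributions accumulated under free breathing and under each biofeedback mode is one way such coverage changes could be quantified.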

A Study on the Robust Bimodal Speech-recognition System in Noisy Environments (잡음 환경에 강인한 이중모드 음성인식 시스템에 관한 연구)

  • 이철우;고인선;계영철
    • The Journal of the Acoustical Society of Korea / v.22 no.1 / pp.28-34 / 2003
  • Recent research has focused on jointly using lip motion (i.e., visual speech) and acoustic speech for reliable speech recognition in noisy environments. This paper likewise combines the result of a visual speech recognizer with that of a conventional speech recognizer by weighting each result. It proposes a method for determining appropriate weights for each result; in particular, the weights are set automatically according to the amount of noise in the speech and the quality of the images. Simulation results show that combining audio and visual recognition with the proposed method achieves a recognition rate of 84% even in severely noisy environments. The results also show that when the images are blurred, the newly proposed weighting method, which also takes blur into account, outperforms the other methods.
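The weighted combination described above can be sketched as late fusion of per-word scores from the two recognizers, with the audio weight rising as the speech gets cleaner. The logistic mapping from SNR to weight, its parameters, and the function names are hypothetical assumptions; the abstract does not give the paper's weighting rule, and the image-blur term is not modeled here.

```python
import math
from typing import Dict


def audio_weight(snr_db: float, midpoint_db: float = 10.0, slope: float = 0.3) -> float:
    """Map speech SNR to an audio-stream weight in (0, 1): cleaner audio, higher weight.

    The logistic form and its parameters are illustrative assumptions.
    """
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def fuse(audio_scores: Dict[str, float], visual_scores: Dict[str, float], w_audio: float) -> str:
    """Return the word with the best weighted sum of audio and visual log-likelihoods.

    Both dictionaries are assumed to cover the same vocabulary.
    """
    return max(
        audio_scores,
        key=lambda word: w_audio * audio_scores[word] + (1.0 - w_audio) * visual_scores[word],
    )
```

With clean audio the fused decision follows the acoustic recognizer; as the SNR drops, the weight shifts toward the lip-reading scores, which is why such a combination can stay usable in severe noise.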