• Title/Summary/Keyword: Audio-Visual Information


Estimation of speech feature vectors and enhancement of speech recognition performance using lip information (입술정보를 이용한 음성 특징 파라미터 추정 및 음성인식 성능향상)

  • Min So-Hee;Kim Jin-Young;Choi Seung-Ho
    • MALSORI / no.44 / pp.83-92 / 2002
  • Speech recognition performance is severely degraded in noisy environments. One approach to this problem is audio-visual speech recognition. In this paper, we discuss experimental results of bimodal speech recognition based on enhanced speech feature vectors using lip information. We try various kinds of speech features, such as linear prediction coefficients, cepstrum, and log area ratio, for transforming lip information into speech parameters. The experimental results show that the cepstrum parameter is the best feature in terms of recognition rate. We also present desirable weighting values of audio and visual information depending on the signal-to-noise ratio.

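The transformation of lip information into speech parameters described above can be sketched as a simple linear estimator. The toy data, feature dimensions, and least-squares fit below are illustrative assumptions, not the paper's actual model:

```python
import numpy as np

# Hypothetical toy data: rows are video frames, columns are features.
rng = np.random.default_rng(0)
lip = rng.normal(size=(200, 6))          # e.g. lip width/height/area parameters
W_true = rng.normal(size=(6, 12))        # unknown ground-truth mapping
cepstrum = lip @ W_true + 0.01 * rng.normal(size=(200, 12))  # noisy targets

# Fit a linear map from lip parameters to cepstral coefficients
# by ordinary least squares, then estimate cepstra from lip frames.
W, *_ = np.linalg.lstsq(lip, cepstrum, rcond=None)
estimated = lip @ W

# Mean squared estimation error over all frames
mse = float(np.mean((estimated - cepstrum) ** 2))
```

In a bimodal recognizer, such estimated cepstra would then be mixed with the acoustic cepstra using an SNR-dependent weight.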

A Novel Covariance Matrix Estimation Method for MVDR Beamforming In Audio-Visual Communication Systems (오디오-비디오 통신 시스템에서 MVDR 빔 형성 기법을 위한 새로운 공분산 행렬 예측 방법)

  • You, Gyeong-Kuk;Yang, Jae-Mo;Lee, Jinkyu;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea / v.33 no.5 / pp.326-334 / 2014
  • This paper proposes a novel covariance matrix estimation scheme for minimum variance distortionless response (MVDR) beamforming. By accurately tracking direction-of-arrival (DoA) information of the sound source using audio-visual sensors, the covariance matrix is efficiently estimated by adopting a variable forgetting factor. The variable forgetting factor is determined by considering the signal-to-interference ratio (SIR). Experimental results verify that the performance of the proposed method is superior to that of the conventional one in terms of interference/noise reduction and speech distortion.
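A minimal sketch of a recursive covariance update with an SIR-driven variable forgetting factor, followed by the standard MVDR weight formula. The linear SIR-to-factor mapping and the array dimensions are illustrative assumptions, not the paper's exact rule:

```python
import numpy as np

def update_covariance(R, x, sir_db, lam_min=0.90, lam_max=0.999):
    """Recursive update R <- lam*R + (1-lam)*x x^H, where the forgetting
    factor lam is chosen from the current SIR: low SIR (strong
    interference) -> short memory (small lam), high SIR -> long memory.
    The linear ramp over 0..30 dB is a hypothetical mapping."""
    t = np.clip(sir_db / 30.0, 0.0, 1.0)
    lam = lam_min + (lam_max - lam_min) * t
    x = x.reshape(-1, 1)
    return lam * R + (1.0 - lam) * (x @ x.conj().T)

def mvdr_weights(R, steering):
    """MVDR beamformer w = R^{-1} d / (d^H R^{-1} d)."""
    d = steering.reshape(-1, 1)
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj().T @ Rinv_d)
```

By construction the weights satisfy the distortionless constraint w^H d = 1 toward the look direction, whatever forgetting factor was used.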

Visual Image Effects on Sound Localization in Peripheral Region under Dynamic Multimedia Conditions

  • Kono, Yoshinori;Hasegawa, Hiroshi;Ayama, Miyoshi;Kasuga, Masao;Matsumoto, Shuichi;Koike, Atsushi;Takagi, Koichi
    • Proceedings of the IEEK Conference / 2002.07a / pp.702-705 / 2002
  • This paper describes effects of visual information on sound localization in the peripheral visual field under dynamic conditions. Presentation experiments with an audio-visual stimulus were carried out using a movie of a moving patrol car and its siren sound. The following results were obtained: first, the sound image at the beginning of the presentation was more strongly captured by the visual image than that at the end, i.e., a "beginning effect" occurred; second, in the peripheral regions, the "beginning effect" appeared strongly near the fixation point of the eyes.


An Audio-Visual Teaching Aid (AVTA) with Scrolling Display and Speech to Text over the Internet

  • Davood Khalili;Chung, Wan-Young
    • Proceedings of the IEEK Conference / 2003.07c / pp.2649-2652 / 2003
  • In this paper, an Audio-Visual Teaching Aid (AVTA) for use in a classroom and over the Internet is presented. The system, which was designed and tested, consists of a wireless microphone system, speech-to-text conversion software, a noise filtering circuit, and a computer. An IBM-compatible PC with a sound card, a network interface card, a web browser, and a voice and text messenger service was used to provide slightly delayed text and voice over the Internet for remote learning, while providing scrolling text from a real-time lecture in a classroom. The motivation for designing this system was to aid Korean students who may have difficulty with listening comprehension while having fairly good reading ability. The application of this system is twofold: on one hand it helps the students in a class to view and listen to a lecture, and on the other hand it serves as a vehicle for remote access (audio and text) to a classroom lecture. The project provides a simple and low-cost solution to remote learning and also gives a student access to the classroom in emergency situations when the student cannot attend a class. In addition, such a system allows the student to capture a teacher's lecture in audio and text form without needing to be present in class or take many notes. This system will therefore help students in many ways.


A Study on the Space Organization of the CDI in Lycées (학교도서관 공간계획 방향에 대한 연구 - 프랑스 고등학교의 지식정보센터(CDI)를 중심으로 -)

  • Kim, Kyung-Ho;Yeom, Dae-Bong;Kim, Jong-Seok
    • Journal of the Korean Institute of Educational Facilities / v.13 no.2 / pp.42-49 / 2006
  • We have analysed the use and organization of space in the Centres de Documentation et d'Information (CDI) of lycées in France as equivalent information centres. The majority of CDI are located in close proximity to the academic staff and provide easy student access. The CDI provide not only books but also reviews, magazines, and CD-ROMs. This information is very important to pupils both in the pursuit of their higher studies and in their future professional life, as careers information is also available and is discussed regularly with the careers advisor. The function of the CDI is not only to provide information: as part of the curriculum there are classes on how to access the information available in the CDI, as well as seminars and audio-visual courses. This management makes it possible for the CDI to operate not only as a school library but also as a multi-function centre of documentation and information. A variety of spaces are available: a room for private/small-group study, a computer room (as well as a research corner), a reading room, a monthly review room, a photocopy room, a rest room, an exhibition room, a careers information room, an audio-visual room, etc. The results of this study can be used as essential information in the space planning of Korean school libraries in the future.

Improved Bimodal Speech Recognition Study Based on Product Hidden Markov Model

  • Xi, Su Mei;Cho, Young Im
    • International Journal of Fuzzy Logic and Intelligent Systems / v.13 no.3 / pp.164-170 / 2013
  • Recent years have seen higher demands for automatic speech recognition (ASR) systems that are able to operate robustly in acoustically noisy environments. This paper proposes an improved product hidden Markov model (HMM) for bimodal speech recognition. A two-dimensional training model is built based on the separately trained audio-HMM and visual-HMM, reflecting the asynchronous characteristics of the audio and video streams. A weight coefficient is introduced to adjust the weights of the video and audio streams automatically according to differences in the noise environment. Experimental results show that, compared with other bimodal speech recognition approaches, this approach obtains better speech recognition performance.
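The automatic stream weighting described above can be sketched as a noise-dependent combination of the two models' log-likelihoods. The sigmoid mapping from SNR to the weight coefficient and the word scores below are hypothetical choices, not the paper's trained model:

```python
import numpy as np

def combined_log_likelihood(ll_audio, ll_video, snr_db):
    """Combine per-model log-likelihoods with an SNR-driven stream weight:
    gamma -> 1 in clean audio (trust the audio-HMM), gamma -> 0 in heavy
    noise (trust the visual-HMM). The sigmoid is an illustrative mapping."""
    gamma = 1.0 / (1.0 + np.exp(-(snr_db - 10.0) / 5.0))
    return gamma * ll_audio + (1.0 - gamma) * ll_video

def recognize(ll_audio_by_word, ll_video_by_word, snr_db):
    """Pick the word whose weighted bimodal score is highest."""
    return max(ll_audio_by_word,
               key=lambda w: combined_log_likelihood(ll_audio_by_word[w],
                                                     ll_video_by_word[w],
                                                     snr_db))
```

With hypothetical scores where audio favors one word and video another, the decision flips from the audio choice at high SNR to the video choice at low SNR.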

'EVE-Sound™' Toolkit for Interactive Sound in Virtual Environment (가상환경의 인터랙티브 사운드를 위한 'EVE-Sound™' 툴킷)

  • Nam, Yang-Hee;Sung, Suk-Jeong
    • The KIPS Transactions:PartB / v.14B no.4 / pp.273-280 / 2007
  • This paper presents a new 3D sound toolkit called EVE-Sound™ that consists of a pre-processing tool for environment simplification preserving sound effects and a 3D sound API for real-time rendering. It is designed to allow users to interact with complex 3D virtual environments through audio-visual modalities. The EVE-Sound™ toolkit serves two different types of users: high-level programmers who need an easy-to-use sound API for developing realistic, 3D audio-visually rendered applications, and researchers in the 3D sound field who need to experiment with or develop new algorithms without re-writing all the required code from scratch. An interactive virtual environment application was created with the sound engine constructed using the EVE-Sound™ toolkit; it shows the real-time audio-visual rendering performance and the applicability of EVE-Sound™ for building interactive applications with complex 3D environments.

Comparison of Integration Methods of Speech and Lip Information in the Bi-modal Speech Recognition (바이모달 음성인식의 음성정보와 입술정보 결합방법 비교)

  • Park Byung-Gu;Kim Jin-Young;Choi Seung-Ho
    • The Journal of the Acoustical Society of Korea / v.18 no.4 / pp.31-37 / 1999
  • Bimodal speech recognition using visual and audio information has been proposed and researched to improve the performance of ASR (Automatic Speech Recognition) systems in noisy environments. The integration of the two modalities can usually be classified into early integration and late integration. The early integration methods include a method using a fixed weight for the lip parameters and a method using a variable weight according to speech SNR information. The four late integration methods are a method using audio and visual information independently, a method using the speech optimal path, a method using the lip optimal path, and a method using speech SNR information. Among these six methods, the method using a fixed weight for the lip parameters showed the best recognition rate.

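The early/late integration distinction compared above can be sketched as follows, assuming a fixed lip weight; the feature dimensions and the weight value are illustrative, not the paper's tuned settings:

```python
import numpy as np

def early_integration(audio_feat, lip_feat, lip_weight=0.3):
    """Early integration: concatenate the audio features with the lip
    features scaled by a fixed weight, forming one observation vector
    that is fed to a single recognizer."""
    return np.concatenate([audio_feat, lip_weight * lip_feat])

def late_integration(audio_score, lip_score, lip_weight=0.3):
    """Late integration: run separate audio and lip recognizers and
    fuse their scores afterwards with the same fixed weight."""
    return (1.0 - lip_weight) * audio_score + lip_weight * lip_score
```

Early integration lets one model learn cross-modal correlations; late integration keeps the recognizers independent and only combines their outputs, which makes the weight easy to change per noise condition.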

The Influence of SOA between the Visual and Auditory Stimuli with Semantic Properties on Integration of Audio-Visual Senses -Focus on the Redundant Target Effect and Visual Dominance Effect- (의미적 속성을 가진 시.청각자극의 SOA가 시청각 통합 현상에 미치는 영향 -중복 표적 효과와 시각 우세성 효과를 중심으로-)

  • Kim, Bo-Seong;Lee, Young-Chang;Lim, Dong-Hoon;Kim, Hyun-Woo;Min, Yoon-Ki
    • Science of Emotion and Sensibility / v.13 no.3 / pp.475-484 / 2010
  • This study examined the influence of the SOA (stimulus onset asynchrony) between visual and auditory stimuli on the integration of audio-visual senses. Within the stimulus integration phenomenon, the redundant target effect (a faster and more accurate response to the target stimulus when it is presented in more than two modalities) and the visual dominance effect (a faster and more accurate response to a visual stimulus than to an auditory stimulus) were examined: we composed visual and auditory unimodal target conditions and a multimodal target condition, and observed response time and accuracy. Consequently, the redundant target effect remained present despite changes in the SOA between the visual and auditory stimuli. The auditory dominance effect appeared when the SOA between the two stimuli was over 100 ms. These results imply that the redundant target effect is continuously maintained even when the SOA between the two modal stimuli is altered, and also suggest that behavioral evidence of superior information processing can only be deduced when the time difference between the onset of the auditory and visual stimuli is approximately over 100 ms.


Video Summarization Using Eye Tracking and Electroencephalogram (EEG) Data (시선추적-뇌파 기반의 비디오 요약 생성 방안 연구)

  • Kim, Hyun-Hee;Kim, Yong-Ho
    • Journal of the Korean Society for Library and Information Science / v.56 no.1 / pp.95-117 / 2022
  • This study developed and evaluated audio-visual (AV) semantics-based video summarization methods using eye tracking and electroencephalography (EEG) data. Twenty-seven university students participated in the eye tracking and EEG experiments. The evaluation results showed that the average recall rate (0.73) obtained by using both EEG and pupil diameter data to construct a video summary was higher than that of using EEG data alone (0.50) or pupil diameter data alone (0.68). In addition, this study discussed the reasons why the average recall (0.57) of the AV semantics-based personalized video summaries was lower than that (0.69) of the AV semantics-based generic video summaries. The differences and characteristics between the AV semantics-based and text semantics-based video summarization methods were also compared and analyzed.
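The recall rate used above to score a machine-generated summary against a human reference can be sketched as simple set overlap; the shot identifiers in the usage below are hypothetical:

```python
def summary_recall(selected_shots, ground_truth_shots):
    """Recall of a video summary against a human reference summary:
    the fraction of reference shots that the generated summary
    recovered, |selected ∩ truth| / |truth|."""
    selected = set(selected_shots)
    truth = set(ground_truth_shots)
    if not truth:
        return 0.0
    return len(selected & truth) / len(truth)
```

For example, a summary containing shots {1, 2, 3, 4} against a reference of {2, 3, 5, 6} recovers two of four reference shots, a recall of 0.5.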