• Title/Summary/Keyword: Lip-reading


Subword-based Lip Reading Using State-tied HMM (상태공유 HMM을 이용한 서브워드 단위 기반 립리딩)

  • Kim, Jin-Young;Shin, Do-Sung
    • Speech Sciences
    • /
    • v.8 no.3
    • /
    • pp.123-132
    • /
    • 2001
  • In recent years, research on HCI technology has been very active, and speech recognition is being used as a typical method. Its recognition performance, however, deteriorates as surrounding noise increases. To solve this problem, studies on multimodal HCI are being actively conducted. This paper describes automated lipreading for bimodal speech recognition based on image and speech information. It employs an audio-visual DB containing 1,074 words from 70 voices, the tri-viseme as the recognition unit, and state-tied HMMs as the recognition model. Recognition performance is evaluated for vocabularies of 22 to 1,000 words; the 22-word recognizer achieves a word recognition rate of 60.5%. (A minimal software sketch of this HMM-based setup follows below.)

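A minimal sketch, assuming Python with hmmlearn and NumPy, of the kind of HMM-based recognition described in the entry above: one Gaussian HMM is trained per class on lip-feature sequences, and a test sequence is assigned to the class whose model gives the highest log-likelihood. The state-tied tri-viseme modelling of the paper is not reproduced; the class names, feature values, and model sizes are placeholders.

```python
# Hedged sketch: per-class GaussianHMM lip-reading classifier.
# hmmlearn and the synthetic 2-D "lip features" are stand-ins for the
# paper's state-tied tri-viseme HMMs and image-derived features.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

def synthetic_sequences(offset, n_seq=20, length=30, dim=2):
    """Fake lip-feature sequences (e.g. mouth width/height per frame)."""
    return [offset + rng.normal(size=(length, dim)) for _ in range(n_seq)]

# One HMM per recognition unit (whole words here, tri-visemes in the paper).
train_data = {"word_a": synthetic_sequences(0.0),
              "word_b": synthetic_sequences(3.0)}

models = {}
for label, seqs in train_data.items():
    X = np.concatenate(seqs)              # all frames stacked
    lengths = [len(s) for s in seqs]      # per-sequence frame counts
    model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)                 # Baum-Welch training
    models[label] = model

def classify(seq):
    """Assign the class whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(seq))

test_seq = 3.0 + rng.normal(size=(30, 2))
print(classify(test_seq))                 # expected: "word_b"
```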

Recognition of Korean Vowels using Bayesian Classification with Mouth Shape (베이지안 분류 기반의 입 모양을 이용한 한글 모음 인식 시스템)

  • Kim, Seong-Woo;Cha, Kyung-Ae;Park, Se-Hyun
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.8
    • /
    • pp.852-859
    • /
    • 2019
  • With the development of IT technology and smart devices, various applications utilizing image information are being developed. To provide an intuitive interface for pronunciation recognition, there is a growing need for research on pronunciation recognition using mouth-shape feature values. In this paper, we propose a system that distinguishes Korean vowel pronunciations by detecting feature points of the lip region in images and applying a Bayesian learning model. Because the proposed system is based on Bayes' theorem, recognition accuracy can be improved by accumulating input data, whether speaker-independent or speaker-dependent, even with a small amount of training data. Experimental results show that Korean vowels can be distinguished effectively by applying probability-based Bayesian classification using only visual information such as mouth-shape features. (A short Bayesian-classification sketch follows below.)
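A minimal sketch, assuming Python with NumPy, of the Bayes-rule decision the entry above relies on: class-conditional Gaussians over a few mouth-shape features are combined with class priors, and the vowel with the highest posterior is chosen. The two features (mouth aspect ratio and lip-area ratio), the vowel set, and all numbers are illustrative assumptions, not the paper's exact feature definitions.

```python
# Hedged sketch: Gaussian Bayes classification of Korean vowels from
# mouth-shape features. The two features (mouth aspect ratio, lip-area
# ratio), the vowel set, and all numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
vowels = ["a", "i", "u"]
centers = [np.array([0.9, 0.6]), np.array([0.3, 0.2]), np.array([0.5, 0.3])]

# Synthetic training data: rows = [aspect_ratio, area_ratio] per sample.
train = {v: c + 0.05 * rng.normal(size=(50, 2)) for v, c in zip(vowels, centers)}

# Class-conditional Gaussian parameters and class priors.
stats = {v: (x.mean(axis=0), x.var(axis=0) + 1e-6) for v, x in train.items()}
total = sum(len(x) for x in train.values())
prior = {v: len(x) / total for v, x in train.items()}

def log_gaussian(x, mean, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def classify(x):
    """Bayes' theorem: P(vowel | x) is proportional to P(x | vowel) P(vowel)."""
    scores = {v: log_gaussian(x, *stats[v]) + np.log(prior[v]) for v in vowels}
    return max(scores, key=scores.get)

print(classify(np.array([0.88, 0.58])))   # expected: "a"
```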

Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems

  • Sanghun Jeon;Jieun Lee;Dohyeon Yeo;Yong-Ju Lee;SeungJun Kim
    • ETRI Journal
    • /
    • v.46 no.1
    • /
    • pp.22-34
    • /
    • 2024
  • Exposure to varied noisy environments impairs the recognition performance of artificial intelligence-based speech recognition technologies. Services with degraded performance can be utilized as limited systems that assure good performance in certain environments, but they impair the general quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model robust to various noise settings, mimicking human dialogue recognition elements. The model converts word embeddings and log-Mel spectrograms into feature vectors for audio recognition. A dense spatial-temporal convolutional neural network model extracts features from log-Mel spectrograms, transformed for visual-based recognition. This approach exhibits improved aural and visual recognition capabilities. We assess the signal-to-noise ratio in nine synthesized noise environments, and the proposed model exhibits lower average error rates. The error rate of the AVSR model using the three-feature multi-fusion method is 1.711%, compared with the general rate of 3.939%. This model is applicable in noise-affected environments owing to its enhanced stability and recognition rate. (A brief sketch of the audio front end and feature fusion follows below.)
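A minimal sketch, assuming Python with librosa and NumPy, of two ingredients the entry above mentions: a log-Mel spectrogram as the audio front end and a simple concatenation fusion of audio and visual feature vectors. The paper's three-feature multi-fusion architecture and its dense spatio-temporal CNN are not reproduced; the signal, shapes, and fusion rule are illustrative.

```python
# Hedged sketch: log-Mel audio features and a simple concatenation fusion
# of audio and visual feature vectors. The paper's three-feature
# multi-fusion architecture and dense spatio-temporal CNN are not
# reproduced; the signal, shapes, and fusion rule are illustrative.
import numpy as np
import librosa

sr = 16000
y = np.random.default_rng(2).normal(size=sr).astype(np.float32)  # 1 s stand-in for speech

# Log-Mel spectrogram, a common audio front end for AVSR models.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400, hop_length=160, n_mels=80)
log_mel = librosa.power_to_db(mel, ref=np.max)        # shape: (80, frames)

# Pretend per-utterance embeddings from the audio and visual branches.
audio_vec = log_mel.mean(axis=1)                      # (80,) pooled audio features
visual_vec = np.zeros(64, dtype=np.float32)           # placeholder lip-stream embedding

# Minimal "fusion": concatenate branch embeddings before a classifier.
fused = np.concatenate([audio_vec, visual_vec])
print(log_mel.shape, fused.shape)                     # e.g. (80, 101) (144,)
```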

A Study on the Effect of Traditional Percussion Improvisation to Hearing-Impaired College Students Who are Under Stress (전통타악기를 활용한 즉흥연주가 청각장애 대학생의 스트레스에 미치는 효과)

  • Lee, Eun Kyung
    • Journal of Music and Human Behavior
    • /
    • v.5 no.2
    • /
    • pp.41-66
    • /
    • 2008
  • This study investigated the effects of traditional percussion improvisation on hearing-impaired college students under stress. Four hearing-impaired college students aged 21 to 22 who could lip-read were selected. For the quantitative analysis, a revised version of the college student stress scale developed by Gyoung-gu Jun and Gyo-hyeon Kim (1991) was applied, and graphs were used for analysis. For the qualitative analysis, the researcher and two music therapists observed and analyzed the sessions to ensure reliability. The study ran from December 26, 2007 to February 21, 2008, with twenty sessions in total, two per week: a 40-minute individual session and a 50-minute group session. Even though auditory function is critical in playing or listening to music, this study showed positive results for the therapeutic use of music in stress management for college students with hearing impairment. Future studies should continue to investigate the effectiveness of music therapy for hearing-impaired clients under stress across various age ranges.


Design of an Efficient VLSI Architecture and Verification using FPGA-implementation for HMM(Hidden Markov Model)-based Robust and Real-time Lip Reading (HMM(Hidden Markov Model) 기반의 견고한 실시간 립리딩을 위한 효율적인 VLSI 구조 설계 및 FPGA 구현을 이용한 검증)

  • Lee Chi-Geun;Kim Myung-Hun;Lee Sang-Seol;Jung Sung-Tae
    • Journal of the Korea Society of Computer and Information
    • /
    • v.11 no.2 s.40
    • /
    • pp.159-167
    • /
    • 2006
  • Lipreading has been suggested as one of the methods to improve the performance of speech recognition in noisy environments. However, existing methods have been developed and implemented only in software. This paper proposes a hardware design for real-time lipreading. For real-time processing and feasible implementation, we decompose the lipreading system into three parts: an image acquisition module, a feature vector extraction module, and a recognition module. The image acquisition module captures the input image using a CMOS image sensor. The feature vector extraction module extracts a feature vector from the input image using a parallel block matching algorithm. The parallel block matching algorithm is coded and simulated as an FPGA circuit. The recognition module uses an HMM-based recognition algorithm, which is coded and simulated on a DSP chip. The simulation results show that a real-time lipreading system can be implemented in hardware. (A software reference sketch of block matching follows below.)

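A minimal sketch, assuming Python with NumPy, of the block-matching step the entry above moves into hardware: for each block of the previous frame, an exhaustive sum-of-absolute-differences (SAD) search over a small window of the current frame yields a motion vector. The FPGA design parallelizes this search; the block size, search range, and frames here are placeholders.

```python
# Hedged sketch: exhaustive block matching with a sum-of-absolute-
# differences (SAD) criterion, as a plain-software reference for what the
# paper's parallel FPGA block-matching module computes in hardware.
# Block size, search range, and frames are illustrative.
import numpy as np

def block_match(prev_frame, cur_frame, top, left, block=8, search=4):
    """Return the (dy, dx) motion vector minimizing SAD for one block."""
    ref = prev_frame[top:top + block, left:left + block].astype(np.int32)
    best_sad, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > cur_frame.shape[0] or x + block > cur_frame.shape[1]:
                continue
            cand = cur_frame[y:y + block, x:x + block].astype(np.int32)
            sad = np.abs(ref - cand).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_vec = sad, (dy, dx)
    return best_vec

rng = np.random.default_rng(3)
prev = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
cur = np.roll(prev, shift=(2, 1), axis=(0, 1))   # current frame shifted by (2, 1)
print(block_match(prev, cur, top=16, left=16))   # expected: (2, 1)
```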

Development and Validation Study for Korean Version of Deaf Acculturation Scale (한국판 농인 문화적응 척도 개발 및 타당화 연구)

  • Eum, Youngji;Park, Jieun;Sohn, Sunju;Eom, Jinsup;Sohn, Jinhun
    • Korean Journal of Social Welfare
    • /
    • v.66 no.3
    • /
    • pp.55-73
    • /
    • 2014
  • The purpose of this study was to develop and validate a Korean version of the Deaf Acculturation Scale (DAS). Pilot items were a faithful translation of the acculturation scale of Maxwell-McCaw and Zea (2011), modified for Korean Deaf people. The scale involves two dimensions to measure the acculturation of Deaf people: Deaf acculturation and hearing acculturation. Using factor analysis, we developed a Korean version of the DAS consisting of twenty-five items for the Deaf acculturation dimension and twenty-five items for the hearing acculturation dimension. These analyses supported four factors for the Deaf acculturation dimension and five factors for the hearing acculturation dimension. Reliability, assessed by Cronbach's α, was .93 for Deaf acculturation and .93 for hearing acculturation, confirming the Korean version of the DAS. Construct validity was demonstrated through correlations with Deaf acculturation-related variables: age, age of deafness, degree of hearing loss, American Sign Language ability, and lip-reading ability. Criterion validity was supported by correlation with the Collective Self-Esteem Scale. Limitations and implications of this study and directions for future research are discussed. (A short sketch of the Cronbach's α computation follows below.)

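A minimal sketch, assuming Python with NumPy, of the Cronbach's α reliability coefficient reported in the entry above: α = k/(k−1) · (1 − Σσ²_item / σ²_total). The rating matrix below is made up purely for illustration and has no relation to the study's data.

```python
# Hedged sketch: Cronbach's alpha, the reliability coefficient the entry
# above reports (.93 per dimension). The 6-respondent, 5-item rating
# matrix is fabricated purely to show the computation.
import numpy as np

def cronbach_alpha(scores):
    """scores: respondents x items matrix of Likert-type ratings."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

ratings = np.array([[4, 5, 4, 4, 5],
                    [2, 2, 3, 2, 2],
                    [5, 5, 5, 4, 5],
                    [3, 3, 2, 3, 3],
                    [4, 4, 4, 5, 4],
                    [1, 2, 1, 2, 1]])
print(round(cronbach_alpha(ratings), 3))         # close to 1 for these consistent items
```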