Search | Korea Science

Improved Automatic Lipreading by Stochastic Optimization of Hidden Markov Models (은닉 마르코프 모델의 확률적 최적화를 통한 자동 독순의 성능 향상)

Lee, Jong-Seok;Park, Cheol-Hoon
- The KIPS Transactions:PartB / v.14B no.7 / pp.523-530 / 2007
This paper proposes a new stochastic optimization algorithm for hidden Markov models (HMMs) used as recognizers in automatic lipreading. The proposed method combines a global stochastic optimization method, simulated annealing, with a local optimization method, which yields fast convergence and good solution quality. We show mathematically that the proposed algorithm converges to the global optimum. Experimental results show that training HMMs with this method yields better lipreading performance than conventional training methods based on local optimization.
https://doi.org/10.3745/KIPSTB.2007.14-B.7.523
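
The idea of pairing simulated annealing with a local optimizer, as described in the abstract above, can be illustrated on a toy problem. This is only a minimal sketch under stated assumptions: the objective is a made-up one-dimensional function rather than an HMM likelihood, the local step is plain gradient descent, and every name and parameter (`objective`, `local_descent`, `hybrid_anneal`, the cooling schedule) is hypothetical. It is not the authors' algorithm and carries no convergence guarantee.

```python
import math
import random


def objective(x):
    # Toy multimodal function (assumption); stands in for a training cost.
    return x * x + 10.0 * math.sin(3.0 * x)


def local_descent(f, x, step=0.01, iters=200, h=1e-5):
    # Gradient descent with a central-difference derivative.
    # Returns the best point visited, so the result is never worse than x.
    best = x
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2.0 * h)
        x = x - step * grad
        if f(x) < f(best):
            best = x
    return best


def hybrid_anneal(f, x0, t0=5.0, cooling=0.95, steps=200, seed=0):
    rng = random.Random(seed)
    current = local_descent(f, x0)
    best = current
    temp = t0
    for _ in range(steps):
        # Global move: random perturbation followed by local refinement.
        candidate = local_descent(f, current + rng.gauss(0.0, 1.0))
        delta = f(candidate) - f(current)
        # Metropolis rule: always accept improvements, and accept a worse
        # point with a probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
        if f(current) < f(best):
            best = current
        temp *= cooling
    return best
```

Calling `hybrid_anneal(objective, 4.0)` returns a point that is at least as good as running `local_descent` alone from the same start, because the search begins from that local solution and only ever replaces its best point with a strictly better one.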

Automatic Lipreading Using Color Lip Images and Principal Component Analysis (컬러 입술영상과 주성분분석을 이용한 자동 독순)

Lee, Jong-Seok;Park, Cheol-Hoon
- The KIPS Transactions:PartB / v.15B no.3 / pp.229-236 / 2008
This paper examines the effectiveness of using color images instead of grayscale ones for automatic lipreading. First, we show the effect of color information on human lipreading performance. Then, we compare the performance of automatic lipreading using features obtained by applying principal component analysis to grayscale and color images. Experiments with various color representations show that color information is useful for improving the performance of automatic lipreading; the best performance is obtained with the RGB color components, where the average relative error reductions for clean and noisy conditions are 4.7% and 13.0%, respectively.
https://doi.org/10.3745/KIPSTB.2008.15-B.3.229
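
For readers unfamiliar with the "relative error reduction" figure quoted in the abstract above, the usual definition is the fraction of the baseline error that the new system removes. The helper below is a small sketch of that arithmetic; the function name and the example error rates are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates (in percent), chosen only to illustrate the formula.
reduction = relative_error_reduction(40.0, 34.8)
```

With these made-up numbers, an error rate falling from 40.0% to 34.8% is a 5.2-point absolute drop, which corresponds to a relative reduction of about 13%.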

Experiments on Various Spatial-Temporal Features for Korean Lipreading (한국어 입술 독해에 적합한 시공간적 특징 추출)

오현화;김인철;김동수;진성일
- Proceedings of the IEEK Conference / 2001.06d / pp.29-32 / 2001
Visual speech information improves the performance of speech recognition, especially in noisy environments. We tested various spatial-temporal features for Korean lipreading and evaluated their performance using a hidden Markov model based classifier. The results show that the direction as well as the magnitude of the movement of the lip contour over time are useful features for lipreading.

Automatic Lipreading Based on Image Transform and HMM (이미지 변환과 HMM에 기반한 자동 립리딩)

김진범;김진영
- Proceedings of the IEEK Conference / 1999.11a / pp.585-588 / 1999
This paper presents experimental results on visual-only recognition tasks using an image transform approach and an HMM-based recognition system. There are two approaches to extracting features for lipreading: a lip contour based approach and an image transform based one. The latter obtains a compressed representation of the pixel values of the image containing the speaker's mouth, which results in superior lipreading performance. In addition, PCA (principal component analysis) is used for a fast algorithm. Finally, the HMM recognition tasks are compared with one another.

A Study on Spatio-temporal Features for Korean Vowel Lipreading (한국어 모음 입술독해를 위한 시공간적 특징에 관한 연구)

오현화;김인철;김동수;진성일
- The Journal of the Acoustical Society of Korea / v.21 no.1 / pp.19-26 / 2002
This paper defines the basic visual speech units, visemes, and investigates various visual features of the lips for effective Korean lipreading. First, we analyzed the visual characteristics of the Korean vowels from a database of lip image sequences obtained from multiple speakers, and thereby defined seven Korean vowel visemes. Various spatio-temporal features of the lips are extracted from feature points located on both the inner and outer lip contours of the image sequences, and their classification performance is evaluated using a hidden Markov model based classifier. The experimental results for recognizing the Korean visemes demonstrate that a feature vector containing information from both the inner and outer lip contours can be effectively applied to lipreading, and that the direction and magnitude of the movement of a lip feature point over time are quite useful for Korean lipreading.

A New Temporal Filtering Method for Improved Automatic Lipreading (향상된 자동 독순을 위한 새로운 시간영역 필터링 기법)

Lee, Jong-Seok;Park, Cheol-Hoon
- The KIPS Transactions:PartB / v.15B no.2 / pp.123-130 / 2008
Automatic lipreading recognizes speech by observing the movement of a speaker's lips. It has recently received attention as a way of compensating for the performance degradation of acoustic speech recognition in acoustically noisy environments. One of the important issues in automatic lipreading is defining and extracting salient features from the recorded images. In this paper, we propose a feature extraction method that uses a new filtering technique to obtain improved recognition performance. The proposed method eliminates frequency components that are too slow or too fast relative to the relevant speech information by applying a band-pass filter to the temporal trajectory of each pixel in the images containing the lip region; features are then extracted by principal component analysis. Speaker-independent recognition experiments show that the proposed method improves performance in both clean and visually noisy conditions.
https://doi.org/10.3745/KIPSTB.2008.15-B.2.123
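
The temporal filtering step described in the abstract above, a band-pass filter applied to each pixel's intensity over time, can be sketched as follows. The abstract does not give the filter design or cutoff frequencies, so this sketch assumes an ideal (brick-wall) filter built from a plain discrete Fourier transform; the function name and all parameters are hypothetical. In a full pipeline the filtered trajectories of all pixels would then be passed to principal component analysis, which is not shown here.

```python
import cmath
import math


def bandpass_trajectory(values, frame_rate, low_hz, high_hz):
    """Zero out DFT bins outside [low_hz, high_hz] for one pixel's time series."""
    n = len(values)
    # Forward DFT, O(n^2); acceptable for short clips in a sketch.
    spectrum = [
        sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Frequency of bin k; bins above n/2 mirror the negative frequencies.
        freq = min(k, n - k) * frame_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0.0
    # Inverse DFT. The input is real and the mask is symmetric, so the
    # imaginary part is only rounding noise and is discarded.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

For example, with 30 frames at an assumed 30 frames per second and a pass band of 1 to 10 Hz, a constant brightness offset (0 Hz) and a 14 Hz flicker are removed while a 4 Hz component passes through unchanged.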

Design of an Efficient VLSI Architecture and Verification using FPGA-implementation for HMM(Hidden Markov Model)-based Robust and Real-time Lip Reading (HMM(Hidden Markov Model) 기반의 견고한 실시간 립리딩을 위한 효율적인 VLSI 구조 설계 및 FPGA 구현을 이용한 검증)

Lee Chi-Geun;Kim Myung-Hun;Lee Sang-Seol;Jung Sung-Tae
- Journal of the Korea Society of Computer and Information / v.11 no.2 s.40 / pp.159-167 / 2006
Lipreading has been suggested as one of the methods for improving the performance of speech recognition in noisy environments. However, existing methods have been developed and implemented only in software. This paper proposes a hardware design for real-time lipreading. For real-time processing and feasible implementation, we decompose the lipreading system into three parts: an image acquisition module, a feature vector extraction module, and a recognition module. The image acquisition module captures the input image using a CMOS image sensor. The feature vector extraction module extracts a feature vector from the input image using a parallel block matching algorithm, which is coded and simulated for an FPGA circuit. The recognition module uses an HMM-based recognition algorithm, which is coded and simulated on a DSP chip. The simulation results show that a real-time lipreading system can be implemented in hardware.

Region of Interest Extraction and Bilinear Interpolation Application for Preprocessing of Lipreading Systems (입 모양 인식 시스템 전처리를 위한 관심 영역 추출과 이중 선형 보간법 적용)

Jae Hyeok Han;Yong Ki Kim;Mi Hye Kim
- The Transactions of the Korea Information Processing Society / v.13 no.4 / pp.189-198 / 2024
Lipreading is an important part of speech recognition, and several studies have sought to improve the performance of lipreading systems for speech recognition. Recent studies have modified the model architecture of the lipreading system to improve recognition performance. Unlike that previous research, we aim to improve recognition performance without any change to the model architecture. To this end, we refer to the cues used in human lipreading and set other regions, such as the chin and cheeks, as regions of interest along with the lip region, which is the existing region of interest of lipreading systems, and compare the recognition rate of each region of interest to propose the highest-performing one. In addition, assuming that differences in normalization results caused by the interpolation method used when normalizing the size of the region of interest affect recognition performance, we interpolate the same region of interest using nearest neighbor, bilinear, and bicubic interpolation and compare the recognition rate of each method to propose the best-performing one. Each region of interest was detected by training an object detection neural network, and dynamic time warping templates were generated by normalizing each region of interest, extracting and combining features, and mapping the combined features into a low-dimensional space through dimensionality reduction. The recognition rate was evaluated by comparing the distance between the generated dynamic time warping templates and the data mapped to the low-dimensional space.
In the comparison of regions of interest, the region of interest containing only the lip region showed an average recognition rate of 97.36%, which is 3.44% higher than the average recognition rate of 93.92% in the previous study. In the comparison of interpolation methods, bilinear interpolation achieved 97.36%, which is 14.65% higher than nearest neighbor interpolation and 5.55% higher than bicubic interpolation. The code used in this study can be found at https://github.com/haraisi2/Lipreading-Systems.
https://doi.org/10.3745/TKIPS.2024.13.4.189
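
Bilinear interpolation, the best-performing resizing method reported in the abstract above, can be sketched in a few lines for a grayscale image stored as a list of rows. This is an illustrative implementation only and is not taken from the linked repository; the function name is hypothetical, and the corner-aligned coordinate mapping used here is an assumption (image libraries may use a different sampling convention, so their outputs can differ slightly).

```python
def resize_bilinear(image, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) with bilinear interpolation."""
    in_h, in_w = len(image), len(image[0])
    out = []
    for i in range(out_h):
        # Map the output row to input coordinates so the corners line up.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Blend horizontally on the two neighboring rows, then vertically.
            top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
            bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out
```

Nearest neighbor interpolation would instead copy the single closest input pixel, which is why it tends to produce blockier normalized regions than the weighted blend above.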

Robustness of Lipreading against the Variations of Rotation, Translation and Scaling

Min, Duk-Soo;Kim, Jin-Young;Park, Seung-Ho;Kim, Ki-Jung
- Proceedings of the IEEK Conference / 2000.07a / pp.15-18 / 2000
In this study, we improve the performance of a speech recognition system that uses visual information from lip movements. This paper focuses on the robustness of the word recognition system to rotation, translation, and scaling of the lip images. Different lipreading methods have been used to estimate the stability of recognition performance. In particular, we work out a special system based on log-polar mapping, called the Mellin transform, which is quasi-invariant to rotation, translation, and scaling (RTS), along with related approaches from machine vision. The results of word recognition are reported for an HMM (Hidden Markov Model) recognition system.

Design and Implementation of a Real-Time Lipreading System Using PCA & HMM (PCA와 HMM을 이용한 실시간 립리딩 시스템의 설계 및 구현)

Lee chi-geun;Lee eun-suk;Jung sung-tae;Lee sang-seol
- Journal of Korea Multimedia Society / v.7 no.11 / pp.1597-1609 / 2004
Many lipreading systems have been proposed to compensate for the drop in speech recognition rate in noisy environments. Previous lipreading systems work only under specific conditions, such as artificial lighting and a predefined background color. In this paper, we propose a real-time lipreading system that allows the speaker to move and relaxes the restrictions on color and lighting conditions. The proposed system extracts the face and lip regions, together with the essential visual information, in real time from an input video sequence captured with a common PC camera, and it recognizes uttered words from that visual information in real time. It uses a hue histogram model to extract the face and lip regions, the mean shift algorithm to track the face of a moving speaker, PCA (Principal Component Analysis) to extract the visual information for learning and testing, and an HMM (Hidden Markov Model) as the recognition algorithm. The experimental results show that our system achieves a recognition rate of 90% for speaker-dependent lipreading and increases the speech recognition rate up to 40-85%, depending on the noise level, when combined with audio speech recognition.
