Search | Korea Science

A Study on Lip Detection based on Eye Localization for Visual Speech Recognition in Mobile Environment (모바일 환경에서의 시각 음성인식을 위한 눈 정위 기반 입술 탐지에 대한 연구)

Gyu, Song-Min;Pham, Thanh Trung;Kim, Jin-Young;Taek, Hwang-Sung
- Journal of the Korean Institute of Intelligent Systems / v.19 no.4 / pp.478-484 / 2009
Automatic speech recognition (ASR) is an attractive technology given the current trend toward more convenient living. Although many approaches to ASR have been proposed, performance is still poor in noisy environments. State-of-the-art speech recognition therefore uses visual information in addition to audio information. In this paper, we present a novel lip detection method for visual speech recognition in mobile environments. To apply visual information to speech recognition, the lip region must be extracted accurately. Because eye detection is easier than lip detection, we first detect the positions of the left and right eyes and then roughly locate the lip region. We then apply K-means clustering to divide that region into groups, and detect the two lip corners and the lip center by choosing the largest of the clustered groups. Finally, we show the effectiveness of the proposed method through experiments on the Samsung AVSR database.
https://doi.org/10.5391/JKIIS.2009.19.4.478
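The pipeline in this abstract (eyes first, then a rough lip region, then clustering) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the geometric ratios in `lip_roi_from_eyes`, the use of scalar pixel values as the clustering feature, and all function names are assumptions made for the example.

```python
import random

def lip_roi_from_eyes(left_eye, right_eye, drop=1.1, half_width=0.6, half_height=0.35):
    """Rough lip box (x0, y0, x1, y1) below the midpoint of the eyes.

    The three ratios are illustrative assumptions, scaled by the
    inter-eye distance; they are not taken from the paper.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    d = ((rx - lx) ** 2 + (ry - ly) ** 2) ** 0.5
    cx, cy = (lx + rx) / 2, (ly + ry) / 2 + drop * d
    return (round(cx - half_width * d), round(cy - half_height * d),
            round(cx + half_width * d), round(cy + half_height * d))

def kmeans_1d(values, k, iters=20, seed=0):
    """Plain Lloyd's k-means on scalar values; returns one label per value.

    Requires at least k distinct values.
    """
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(values)), k)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center for every value.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def lip_points(roi_pixels, k=2):
    """roi_pixels maps (x, y) to a pixel value inside the rough lip box.

    Returns (left_corner, right_corner, center) of the largest cluster.
    """
    coords = list(roi_pixels)
    labels = kmeans_1d([roi_pixels[p] for p in coords], k)
    biggest = max(range(k), key=labels.count)
    group = [p for p, lab in zip(coords, labels) if lab == biggest]
    left, right = min(group), max(group)  # smallest and largest x come first
    center = (sum(x for x, _ in group) / len(group),
              sum(y for _, y in group) / len(group))
    return left, right, center
```

In practice the clustering feature would more likely be a color component that separates lips from skin; a single scalar per pixel keeps the sketch short.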

Lip Recognition Using Active Shape Model and Shape-Based Weighted Vector (능동적 형태 모델과 가중치 벡터를 이용한 입술 인식)

장경식
- Journal of Intelligence and Information Systems / v.8 no.1 / pp.75-85 / 2002
In this paper, we propose an efficient method for recognizing lips. The lips are localized using the lip shape and the pixel values around the lip contour. The lip shape is represented by a statistical active shape model that learns typical lip shapes from a training set. Because this model is sensitive to its initial position, we use the boundary between the upper and lower lips as the initial position for the lip search. The boundary is localized using a weighted vector based on the lip shape. Experiments on many images show very encouraging results.

Lip Recognition Using Active Shape Model and Gaussian Mixture Model (Active Shape 모델과 Gaussian Mixture 모델을 이용한 입술 인식)

장경식;이임건
- Journal of KIISE:Software and Applications / v.30 no.5_6 / pp.454-460 / 2003
In this paper, we propose an efficient method for recognizing human lips. Based on the Point Distribution Model, a lip shape is represented as a set of points. We compute the lip model using Principal Component Analysis and model the distribution of the shape parameters with a Gaussian mixture. The Expectation Maximization algorithm is used to determine the maximum likelihood parameters of the Gaussian mixture. The lip contour model is derived from the gray-value changes at each point and in the regions around it, and is used to search for the lip shape in an image. Experiments on many images show very encouraging results.
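For orientation, the two models named in this abstract are usually written as below. The symbols are the conventional ones and are assumptions here, not notation taken from the paper: $\mathbf{x}$ is the stacked vector of landmark coordinates, $\bar{\mathbf{x}}$ the mean shape, $\mathbf{P}$ the matrix of leading PCA eigenvectors, $\mathbf{b}$ the shape parameters, and $\pi_k$, $\boldsymbol{\mu}_k$, $\boldsymbol{\Sigma}_k$ the mixture weights, means, and covariances estimated with EM.

$$
\mathbf{x} \approx \bar{\mathbf{x}} + \mathbf{P}\,\mathbf{b},
\qquad
p(\mathbf{b}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(\mathbf{b};\, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right),
\qquad
\sum_{k=1}^{K} \pi_k = 1
$$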

Design & Implementation of Lipreading System using Robust Lip Area Extraction (견고한 입술 영역 추출을 이용한 립리딩 시스템 설계 및 구현)

이은숙;이호근;이지근;김봉완;이상설;이용주;정성태
- Proceedings of the Korea Multimedia Society Conference / 2003.05b / pp.524-527 / 2003
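Lipreading has recently attracted considerable interest as an application of multimodal interface technology. The main problem to be solved in a lipreading system that uses video is extracting the face and lip regions independently of changing conditions. In this paper, to extract the speaker's face and lip regions from moving images independently of changes in color, lighting, and similar factors, we use the HSI model and block matching, and for feature point extraction we use PCA, an image-based method. HMM-based pattern recognition is applied separately to the extracted lip parameters and to the speech data to recognize words, and the two recognition results are merged with weights. The experimental results show that when noise lowers the speech recognition rate, using speech recognition together with lipreading improves the overall recognition result.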
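The weighted merging of the two recognizers' results can be illustrated with a short sketch. It assumes each recognizer returns a log-likelihood per candidate word and that the weights sum to one; the function name and the default weight are hypothetical, since the abstract does not give the exact combination rule.

```python
def fuse_word_scores(audio_scores, visual_scores, audio_weight=0.7):
    """Merge per-word log-likelihoods from an audio and a visual recognizer.

    audio_weight is a hypothetical tuning value in [0, 1]; the abstract
    only states that the two results are merged with weights.
    Returns (best_word, fused_scores).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    fused = {}
    # Only words scored by both recognizers can be compared.
    for word in audio_scores.keys() & visual_scores.keys():
        fused[word] = (audio_weight * audio_scores[word]
                       + (1.0 - audio_weight) * visual_scores[word])
    return max(fused, key=fused.get), fused
```

Lowering `audio_weight` as the audio gets noisier lets the visual stream decide more of the result, which is the effect the abstract reports.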

An Efficient Lipreading Method Based on Lip's Symmetry (입술의 대칭성에 기반한 효율적인 립리딩 방법)

Kim, Jin-Bum;Kim, Jin-Young
- Journal of the Institute of Electronics Engineers of Korea SP / v.37 no.5 / pp.105-114 / 2000
In this paper, we concentrate on an efficient method for reducing the large amount of pixel data that image-transform-based automatic lipreading must process. It has been reported that the image-transform-based approach, which obtains a compressed representation of the speaker's mouth, yields better lipreading performance than the lip-contour-based approach. However, this approach produces so many lip feature parameters that it involves a large amount of data and requires much computation time for recognition. To reduce the data to be computed, we propose a simple method that folds the lip image at its vertical center line, based on the symmetry of the lips. In addition, principal component analysis (PCA) is used for a fast algorithm, and HMM word recognition results are reported. Compared with the normal method using a $16{\times}16$ lip image, the proposed method using the folded lip image reduces the number of feature parameters by $22{\sim}47\%$ and improves hidden Markov model (HMM) word recognition rates by $2{\sim}3\%$.
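The folding step can be sketched in a few lines. Averaging each pixel with its mirror pixel is an assumption made here; the abstract only says the image is folded at its vertical center line, and the function name is hypothetical.

```python
def fold_lip_image(image):
    """Fold a grayscale image (list of equal-length rows) at its vertical
    center line by averaging each pixel with its mirror pixel.

    Averaging is an assumption; the abstract does not say how the two
    halves are combined. For an odd width the middle column is unchanged.
    """
    width = len(image[0])
    half = (width + 1) // 2
    return [[(row[x] + row[width - 1 - x]) / 2 for x in range(half)]
            for row in image]
```

A $16{\times}16$ image folds to 16 rows of 8 values, which halves the pixel count; the percentages in the abstract refer to feature parameters rather than raw pixels.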

Real Time Lip Reading System Implementation in Embedded Environment (임베디드 환경에서의 실시간 립리딩 시스템 구현)

Kim, Young-Un;Kang, Sun-Kyung;Jung, Sung-Tae
- The KIPS Transactions:PartB / v.17B no.3 / pp.227-232 / 2010
This paper proposes a real-time lip reading method for the embedded environment. The embedded environment has limited resources compared with a PC environment, so it is hard to run a lip reading system designed for a PC in real time in the embedded environment. To solve this problem, this paper proposes lip region detection, lip feature extraction, and spoken word recognition methods suited to the embedded environment. First, to find the lip region accurately, it detects the face region using face color information, then detects the lip region by finding the positions of both eyes within the detected face region and using geometric relations. To extract features that are robust to lighting variation caused by changing surroundings, histogram matching, lip folding, and a RASTA filter are applied, and the features extracted using principal component analysis (PCA) are used for recognition. In tests in an embedded environment with an 806 MHz CPU and 128 MB of RAM, the processing time ranged from 1.15 to 2.35 sec depending on the utterance, and the recognition rate was 77%, with 139 of 180 words recognized.
https://doi.org/10.3745/KIPSTB.2010.17B.3.227

Speech Activity Detection using Lip Movement Image Signals (입술 움직임 영상 선호를 이용한 음성 구간 검출)

Kim, Eung-Kyeu
- Journal of the Institute of Convergence Signal Processing / v.11 no.4 / pp.289-297 / 2010
In this paper, a method is presented for preventing external acoustic noise from being misrecognized as speech to be recognized during the speech activity detection process for speech recognition. In addition to acoustic energy, the method checks lip movement image signals. First, successive images are obtained through a PC camera and the presence or absence of lip movement is determined. Next, the lip movement image signal data is stored in shared memory and shared with the speech recognition process. Meanwhile, in the speech activity detection process, which is the preprocessing phase of speech recognition, whether the acoustic energy comes from the speaker's utterance is verified by checking the data stored in the shared memory. Finally, in an experiment linking the speech recognition processor and the image processor, it was confirmed that the speech recognition result is output normally when the speaker faces the camera while speaking, and that no speech recognition result is output when the speaker speaks without facing the camera. Also, the initial feature values set off-line are replaced with values obtained on-line. Similarly, the initial template image captured off-line is replaced with a template image captured on-line, which improves the discrimination of lip movement tracking. An image processing test bed was implemented to confirm the lip movement tracking process visually and to analyze the related parameters in real time. With the speech and image processing systems linked, the interworking rate was 99.3% under various illumination environments.
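The gating idea, accepting acoustic energy as speech only when the lips are also moving, can be sketched as below. Frame differencing stands in for the template-based tracking the abstract mentions, and both thresholds and function names are hypothetical.

```python
def lip_is_moving(prev_frame, frame, threshold=8.0):
    """Return True if the mean absolute difference between two lip-region
    frames (lists of rows of gray values) exceeds a hypothetical threshold.

    Frame differencing is a stand-in for the paper's template tracking.
    """
    total = 0
    count = 0
    for prev_row, row in zip(prev_frame, frame):
        for a, b in zip(prev_row, row):
            total += abs(a - b)
            count += 1
    return count > 0 and total / count > threshold

def is_speech(acoustic_energy, lip_moving, energy_threshold=0.01):
    """Accept a frame as speech only if it is loud enough AND lips move."""
    return acoustic_energy > energy_threshold and lip_moving
```

With this gate, loud background noise while the speaker's lips are still is rejected before it reaches the recognizer.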

A Study on Enhancing the Performance of Detecting Lip Feature Points for Facial Expression Recognition Based on AAM (AAM 기반 얼굴 표정 인식을 위한 입술 특징점 검출 성능 향상 연구)

Han, Eun-Jung;Kang, Byung-Jun;Park, Kang-Ryoung
- The KIPS Transactions:PartB / v.16B no.4 / pp.299-308 / 2009
AAM (Active Appearance Model) is an algorithm that extracts facial feature points using statistical models of shape and texture information based on PCA (Principal Component Analysis). It is widely used for face recognition, face modeling, and expression recognition. However, the detection performance of AAM is sensitive to the initial value, and the detection error increases when an input image differs substantially from the training data. In particular, the algorithm is highly accurate for closed lips, but the detection error increases for lips that are opened or deformed by the user's facial expression. To solve these problems, we propose an improved AAM algorithm that uses lip feature points extracted by a new lip detection algorithm. We select a search region based on the facial feature points detected by AAM. Lip corner points are then extracted in the selected search region using Canny edge detection and histogram projection. Next, the lip region is accurately detected by combining lip color and edge information in the search region, which is adjusted based on the positions of the detected lip corners. As a result, the accuracy and processing speed of lip detection are improved. Experimental results showed that the RMS (Root Mean Square) error of the proposed method was reduced by as much as 4.21 pixels compared with using the AAM algorithm alone.
https://doi.org/10.3745/KIPSTB.2009.16-B.4.299
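The histogram projection step can be illustrated on a binary edge map. The sketch assumes the edge map has already been produced by an edge detector such as Canny and only shows the projection; the function name and the `min_count` threshold are hypothetical.

```python
def lip_corner_columns(edge_map, min_count=1):
    """Return (left_x, right_x): the first and last columns whose edge-pixel
    count reaches min_count, or None if no column qualifies.

    edge_map is a list of equal-length rows of 0/1 values, for example the
    output of an edge detector; min_count is a hypothetical threshold.
    """
    if not edge_map:
        return None
    # Vertical projection: number of edge pixels in each column.
    projection = [sum(column) for column in zip(*edge_map)]
    hits = [x for x, count in enumerate(projection) if count >= min_count]
    return (hits[0], hits[-1]) if hits else None
```

Raising `min_count` trims isolated edge pixels at the sides, at the cost of pulling the estimated corners inward.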

A Study on the ID Visual System (개인식별을 위한 영상시스템 연구)

심정범;이진행;송현교;강민구
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 1998.11a / pp.208-213 / 1998
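As social structures become more complex, ensuring security is emerging as an increasingly important social issue. The most important factor in security is whether automated authentication technology can be developed that determines accurately and quickly whether each individual is who they claim to be. Research on personal identification for this purpose has been dominated by approaches that use invariant physical features of parts of the body, such as fingerprint recognition, skull-based identification (두개골함성), palm print recognition, footprint recognition, lip recognition, iris recognition, and skeletal recognition. This study compiles image processing material for a comprehensive imaging system for personal identification.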

Comparison of Integration Methods of Speech and Lip Information in the Bi-modal Speech Recognition (바이모달 음성인식의 음성정보와 입술정보 결합방법 비교)

박병구;김진영;최승호
- The Journal of the Acoustical Society of Korea / v.18 no.4 / pp.31-37 / 1999
Bimodal speech recognition using visual and audio information has been proposed and studied to improve the performance of ASR (Automatic Speech Recognition) systems in noisy environments. Methods for integrating the two modalities are usually classified into early integration and late integration. Early integration includes a method using a fixed weight for the lip parameters and a method using a variable weight based on speech SNR information. The four late integration methods are a method using audio and visual information independently, a method using the speech optimal path, a method using the lip optimal path, and a method using speech SNR information. Among these six methods, the one using a fixed weight for the lip parameters showed a better recognition rate than the others.
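The two early integration variants differ only in where the lip weight comes from. The sketch below assumes early integration means concatenating audio features with weighted lip parameters, and uses a linear SNR-to-weight mapping; the mapping, its end points, and the function names are hypothetical and not taken from the paper.

```python
def snr_to_lip_weight(snr_db, low_db=0.0, high_db=30.0):
    """Map speech SNR (dB) to a lip weight in [0, 1].

    Linear and clamped: 1.0 at or below low_db, 0.0 at or above high_db.
    The mapping and its end points are illustrative assumptions.
    """
    weight = (high_db - snr_db) / (high_db - low_db)
    return max(0.0, min(1.0, weight))

def fuse_features(audio_features, lip_features, lip_weight):
    """Early integration: one joint vector of audio features followed by
    lip parameters scaled by lip_weight."""
    return list(audio_features) + [lip_weight * v for v in lip_features]
```

The fixed-weight variant calls `fuse_features` with a constant `lip_weight`; the variable-weight variant passes `snr_to_lip_weight(snr_db)` computed for each utterance.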

