• Title/Abstract/Keyword: lip information

Search results: 195 items (processing time: 0.03 seconds)

A Study of the Pattern Kernels for a Lip Print Recognition

  • Paik, Kyoung-Seok;Chung, Chin-Hyun
    • Institute of Control, Robotics and Systems (ICROS): Conference Proceedings
    • /
    • Proceedings of the 13th ICROS Annual Conference, 1998
    • /
    • pp.64-69
    • /
    • 1998
  • This paper presents lip print recognition using pattern kernels for personal identification. Lip print recognition is less developed than recognition of other physical attributes such as fingerprints, voice patterns, retinal blood vessel patterns, or faces. A new method is proposed to recognize a lip print by the pattern kernels. The pattern kernels are a function consisting of several local lip print pattern masks; this function converts the information of a lip print into digital data. Recognition in the multi-resolution system is more reliable than recognition in a single-resolution system. The results show that the proposed algorithm with the multi-resolution architecture can be realized efficiently.
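The abstract does not spell out the exact form of the pattern kernels, so the following is only a minimal Python sketch of the general idea, assuming the kernels amount to correlating a few local pattern masks over the lip print image and pooling their responses into a digital feature vector. The masks, the 4x4 pooling grid, and the nearest-neighbour matching are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

# Hypothetical local pattern masks responding to vertical, horizontal,
# and diagonal groove directions in a lip print image.
MASKS = [
    np.array([[1, -1], [1, -1]], dtype=float),    # vertical groove
    np.array([[1, 1], [-1, -1]], dtype=float),    # horizontal groove
    np.array([[1, -1], [-1, 1]], dtype=float),    # diagonal pattern
]

def kernel_response(img, mask):
    """Correlate one local mask over the image (valid region only)."""
    h, w = mask.shape
    out = np.zeros((img.shape[0] - h + 1, img.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + h, j:j + w] * mask)
    return out

def pattern_kernel_features(img):
    """Convert a lip print image into a digital feature vector by pooling
    the absolute mask responses over a coarse 4x4 grid."""
    feats = []
    for mask in MASKS:
        resp = np.abs(kernel_response(img, mask))
        gh, gw = resp.shape[0] // 4, resp.shape[1] // 4
        for i in range(4):
            for j in range(4):
                feats.append(resp[i * gh:(i + 1) * gh,
                                  j * gw:(j + 1) * gw].mean())
    return np.array(feats)

def identify(query, gallery):
    """Nearest-neighbour identification over enrolled lip print images."""
    q = pattern_kernel_features(query)
    dists = [np.linalg.norm(q - pattern_kernel_features(g)) for g in gallery]
    return int(np.argmin(dists))

gallery = [np.random.rand(64, 64) for _ in range(3)]  # stand-in lip prints
print(identify(gallery[1], gallery))                  # -> 1
```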


Support Vector Machine Based Phoneme Segmentation for Lip Synch Application

  • Lee, Kun-Young;Ko, Han-Seok
    • Speech Sciences
    • /
    • Vol. 11, No. 2
    • /
    • pp.193-210
    • /
    • 2004
  • In this paper, we develop a real-time lip-synch system that animates a 2-D avatar's lip motion in synch with an incoming speech utterance. To achieve real-time operation, we bound the processing time by invoking merge and split procedures that perform coarse-to-fine phoneme classification. At each stage of phoneme classification, we apply a support vector machine (SVM) to reduce the computational load while retaining the desired accuracy. The coarse-to-fine phoneme classification is accomplished via two stages of feature extraction: first, each speech frame is acoustically analyzed into 3 classes of lip opening using Mel Frequency Cepstral Coefficients (MFCC) as features; second, each frame is further refined into detailed lip-shape classes using formant information. We implemented the system with 2-D lip animation, which demonstrates the effectiveness of the proposed two-stage procedure for real-time lip-synch. The method using phoneme merging and SVMs achieved about twice the recognition speed of a method employing a Hidden Markov Model (HMM): the typical per-frame latency of our method was on the order of 18.22 milliseconds, while the HMM method applied under identical conditions required about 30.67 milliseconds.
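As a rough illustration of the coarse-to-fine classification described above, the sketch below trains one SVM on MFCC features for the 3 lip-opening classes and one refining SVM per coarse class on formant features. It uses scikit-learn with random stand-in data; the feature dimensions, the number of fine classes, and the RBF kernels are assumptions rather than the paper's settings.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in per-frame features: 13 MFCCs for the coarse stage and two
# formant values (F1, F2) for the fine stage; real features would come
# from an acoustic front end.
n_frames = 300
mfcc = rng.normal(size=(n_frames, 13))
formants = rng.normal(size=(n_frames, 2))
coarse_labels = rng.integers(0, 3, n_frames)  # 3 lip-opening classes
fine_labels = rng.integers(0, 4, n_frames)    # assumed detailed lip shapes

# Stage 1: coarse classification of lip opening from MFCCs.
coarse_svm = SVC(kernel="rbf").fit(mfcc, coarse_labels)

# Stage 2: one refining SVM per coarse class, trained on formants.
fine_svms = {c: SVC(kernel="rbf").fit(formants[coarse_labels == c],
                                      fine_labels[coarse_labels == c])
             for c in range(3)}

def classify_frame(mfcc_frame, formant_frame):
    """Coarse-to-fine decision: MFCC -> lip opening, formants -> lip shape."""
    c = int(coarse_svm.predict(mfcc_frame[None, :])[0])
    s = int(fine_svms[c].predict(formant_frame[None, :])[0])
    return c, s

print(classify_frame(mfcc[0], formants[0]))
```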


Lip Reading Method Using CNN for Utterance Period Detection

  • 김용기;임종관;김미혜
    • Journal of Digital Convergence
    • /
    • Vol. 14, No. 8
    • /
    • pp.233-243
    • /
    • 2016
  • Because of the problems of speech recognition in noisy environments, AVSR (Audio Visual Speech Recognition) systems combining acoustic and visual information have been proposed since the mid-1990s, with lip reading used as the visual feature in AVSR systems. The goal of this study is to maximize the recognition rate of uttered words using lip shape alone, in order to build an efficient AVSR system. For lip-shape recognition, input video of the uttered test words is preprocessed and the lip region is detected. A CNN (Convolutional Neural Network), a kind of DNN (Deep Neural Network), is then used to detect the utterance period, and the same network is used to extract lip-shape feature vectors, which are recognized with an HMM (Hidden Markov Model). The utterance-period detection achieved a 91% recognition rate, outperforming a threshold-based method. In the lip-shape recognition experiments, the speaker-dependent result was 88.5% and the speaker-independent result was 80.2%, both higher than previous studies.
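A minimal PyTorch sketch of the kind of network described above: a small CNN over grayscale mouth-region crops whose logits give the utterance-period decision, and whose flattened convolutional activations could serve as the lip-shape feature vectors fed to an HMM. The layer sizes, the 32x32 input, and the two-class output are assumptions; the paper does not specify its architecture here.

```python
import torch
import torch.nn as nn

class MouthCNN(nn.Module):
    """Small CNN for mouth-region crops (assumed 1x32x32 inputs)."""
    def __init__(self, n_classes=2):  # speaking / silent
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, n_classes)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.classifier(h)  # logits for utterance-period detection

model = MouthCNN()
crops = torch.randn(4, 1, 32, 32)             # 4 stand-in mouth crops
logits = model(crops)                         # utterance / silence decision
lip_feats = model.features(crops).flatten(1)  # feature vectors for an HMM
```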

Comparison of Integration Methods of Speech and Lip Information in the Bi-modal Speech Recognition

  • 박병구;김진영;최승호
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 18, No. 4
    • /
    • pp.31-37
    • /
    • 1999
  • Bimodal speech recognition, which uses visual information together with acoustic information, has been proposed to improve the performance of speech recognition systems in noisy environments. Integration methods for the visual and acoustic information fall broadly into pre-recognition (early) integration and post-recognition (late) integration. For early integration, we compared a method using a fixed weight for the lip parameters with a method using a variable lip-parameter weight determined by the signal-to-noise ratio (SNR) of the speech. For late integration, we compared four methods: integrating the visual and acoustic information independently; using the minimum-distance path information of the speech for visual recognition; using the minimum-distance path information of the visual signal for speech recognition; and integrating based on the SNR of the speech. Among the six integration methods, the early-integration method using the variable lip-parameter weight gave the best recognition results.
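The best-performing method above varies the lip-parameter weight with the SNR before recognition. A minimal sketch of such SNR-dependent early integration, assuming a simple linear mapping from estimated SNR to lip weight (the mapping and its endpoints are illustrative, not the paper's):

```python
import numpy as np

def lip_weight(snr_db, lo=-5.0, hi=20.0):
    """Map estimated SNR (dB) to a lip weight in [0, 1]: the noisier
    the audio, the more the lip features count. Endpoints are assumed."""
    return float(np.clip((hi - snr_db) / (hi - lo), 0.0, 1.0))

def early_integration(audio_feats, lip_feats, snr_db):
    """Pre-recognition integration: weight and concatenate both streams
    into a single observation vector for the recognizer."""
    w = lip_weight(snr_db)
    return np.concatenate([(1.0 - w) * audio_feats, w * lip_feats])

# At 0 dB SNR the lip stream dominates the combined observation.
obs = early_integration(np.ones(12), np.ones(6), snr_db=0.0)
print(obs)
```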


An Experimental Multimodal Command Control Interface for Car Navigation Systems

  • Kim, Kyungnam;Ko, Jong-Gook;Choi, Seung-Ho;Kim, Jin-Young;Kim, Ki-Jung
    • The Institute of Electronics Engineers of Korea: Conference Proceedings
    • /
    • The Institute of Electronics Engineers of Korea, 2000 ITC-CSCC -1
    • /
    • pp.249-252
    • /
    • 2000
  • An experimental multimodal system combining natural input modes such as speech, lip movement, and gaze is proposed in this paper. It benefits from novel human-computer interaction (HCI) modalities and from multimodal integration for tackling the HCI bottleneck problem. The system allows the user to select menu items on the screen by employing speech recognition, lip reading, and gaze tracking components in parallel; face tracking serves as a supplementary component to gaze tracking and lip movement analysis. These key components are reviewed, and preliminary results with multimodal integration and user testing on the prototype system are shown. Notably, the system equipped with gaze tracking and lip reading is very effective in noisy environments, where the speech recognition rate is low and unstable. Our long-term interest is to build a user interface embedded in a commercial car navigation system (CNS).
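A toy sketch of the parallel menu-selection idea: each modality scores every menu item and a weighted sum picks the winner. The menu items, scores, and weights below are invented placeholders, not the prototype's actual components.

```python
# Late fusion of per-item scores from three parallel modalities.
def fuse_menu_scores(speech, lip, gaze, w=(0.5, 0.25, 0.25)):
    return max(speech, key=lambda m: w[0] * speech[m]
                                     + w[1] * lip[m] + w[2] * gaze[m])

speech = {"zoom in": 0.6, "zoom out": 0.3, "route": 0.1}    # ASR posteriors
lip    = {"zoom in": 0.5, "zoom out": 0.4, "route": 0.1}    # lip-reading scores
gaze   = {"zoom in": 0.9, "zoom out": 0.05, "route": 0.05}  # gaze dwell ratios
print(fuse_menu_scores(speech, lip, gaze))                  # -> "zoom in"
```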


Text-driven Speech Animation with Emotion Control

  • Chae, Wonseok;Kim, Yejin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 14, No. 8
    • /
    • pp.3473-3487
    • /
    • 2020
  • In this paper, we present a new approach to creating speech animation with emotional expressions using a small set of example models. To generate realistic facial animation, two sets of example models, called key visemes and key expressions, are used for lip-synchronization and facial expressions, respectively. The key visemes represent the lip shapes of phonemes such as vowels and consonants, while the key expressions represent the basic emotions of a face. Our approach utilizes a text-to-speech (TTS) system to create a phonetic transcript for the speech animation. Based on the phonetic transcript, a speech animation sequence is synthesized by interpolating the corresponding sequence of key visemes. Using an input parameter vector, the key expressions are blended by a method of scattered data interpolation. During synthesis, an importance-based scheme is introduced to combine lip-synchronization and facial expressions into one animation sequence in real time (over 120 Hz). The proposed approach can be applied to diverse types of digital content and applications that use facial animation with high accuracy (over 90%) in speech recognition.
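A minimal sketch of the key-viseme interpolation step, assuming each key viseme is a vector of facial control parameters and frames between consecutive visemes are linear blends; the paper's parameterization and interpolation scheme may differ, and the viseme vectors and timings here are made up.

```python
import numpy as np

# Assumed key visemes: vectors of facial control parameters.
VISEMES = {
    "M":  np.array([0.0, 0.0, 1.0]),  # closed lips
    "AA": np.array([0.9, 0.1, 0.0]),  # open jaw, little rounding
    "UW": np.array([0.3, 0.9, 0.0]),  # rounded lips
}

def synthesize(track, fps=120):
    """track: list of (viseme, start_time_sec) from a phonetic transcript;
    returns one control vector per output frame."""
    frames = []
    for (v0, t0), (v1, t1) in zip(track, track[1:]):
        n = max(1, int(round((t1 - t0) * fps)))
        for k in range(n):
            a = k / n
            frames.append((1 - a) * VISEMES[v0] + a * VISEMES[v1])
    return np.array(frames)

seq = synthesize([("M", 0.0), ("AA", 0.12), ("UW", 0.30)])
print(seq.shape)  # about 0.30 s * 120 fps frames of lip parameters
```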

Robust Lip Extraction and Tracking of the Mouth Region

  • Min, Duk-Soo;Kim, Jin-Young;Park, Seung-Ho;Kim, Ki-Jung
    • The Institute of Electronics Engineers of Korea: Conference Proceedings
    • /
    • The Institute of Electronics Engineers of Korea, 2000 ITC-CSCC -2
    • /
    • pp.927-930
    • /
    • 2000
  • Visual features of the lip area play an important role in visual speech information, so we are concerned with correctly locating the lip area as the region of interest (ROI). In this paper, we propose a robust and fast method for locating the mouth corners, and we define a region of interest around the mouth during speech. The method uses only horizontal and vertical image operators over the mouth area, and the search is performed by fitting an ROI template to the image with illumination control. Most lip extraction algorithms depend on the luminosity of the image; we use only a binary image obtained with a variable threshold that adapts to the illumination conditions. To control these variations, the gray-tone image is converted to a binary image using a threshold obtained through Multiple Linear Regression Analysis (MLRA) over divided 2-D spatial regions. The region of interest around the mouth is thus extracted automatically and robustly with respect to illumination.
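A rough Python sketch of the illumination-adaptive binarization idea: fit a linear model that predicts a threshold from simple illumination statistics of each sub-region, then binarize every region with its predicted threshold. Using the mean and standard deviation as the regressors and a 2x2 region grid are assumptions, not the paper's actual MLRA setup.

```python
import numpy as np

def fit_threshold_model(stats, thresholds):
    """Least-squares fit of threshold ~ b0 + b1*mean + b2*std from
    training patches with hand-picked thresholds."""
    X = np.column_stack([np.ones(len(stats)), stats])
    beta, *_ = np.linalg.lstsq(X, thresholds, rcond=None)
    return beta

def binarize(gray, beta, grid=2):
    """Split the image into grid x grid regions and threshold each with
    the value predicted for its illumination statistics."""
    out = np.zeros(gray.shape, dtype=np.uint8)
    gh, gw = gray.shape[0] // grid, gray.shape[1] // grid
    for i in range(grid):
        for j in range(grid):
            patch = gray[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
            t = beta @ np.array([1.0, patch.mean(), patch.std()])
            out[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw] = patch > t
    return out

# Stand-in training data: per-patch (mean, std) -> chosen threshold.
stats = np.random.rand(20, 2)
thresholds = 0.5 + 0.3 * stats[:, 0]
beta = fit_threshold_model(stats, thresholds)
mask = binarize(np.random.rand(64, 64), beta)
```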


Adaptive Background Modeling Considering Stationary Object and Object Detection Technique based on Multiple Gaussian Distribution

  • Jeong, Jongmyeon;Choi, Jiyun
    • Journal of the Korea Society of Computer and Information
    • /
    • Vol. 23, No. 11
    • /
    • pp.51-57
    • /
    • 2018
  • In this paper, we study the extraction of parameters and the implementation of a speechreading system to recognize the eight Korean vowels. Facial features are detected by amplifying and reducing the image values and comparing values represented in various color spaces. The positions of the eyes and nose, the inner boundary of the lips, the outer boundary of the upper lip, and the outer line of the teeth are located as features; from their analysis, the inner-lip area, the height and width of the inner lips, the ratio of the outer tooth-line length to the inner mouth area, and the distance between the nose and the outer boundary of the upper lip are used as parameters. 2,400 data samples were gathered and analyzed, a neural network was constructed on the basis of this analysis, and recognition experiments were performed. In the experiments, five normal subjects were sampled, and the observational error between samples was corrected using a normalization method. The experiments show very encouraging results regarding the usefulness of the parameters.
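A minimal sketch of the normalization step mentioned above, assuming it amounts to per-speaker z-scoring of the geometric lip parameters so measurements from different speakers become comparable before training the network; the paper's exact correction method is not specified.

```python
import numpy as np

def normalize_per_speaker(params, speaker_ids):
    """params: (n_samples, n_params) rows such as [inner-lip area,
    inner-lip height, inner-lip width, ...]; z-score within each speaker."""
    out = np.empty_like(params, dtype=float)
    for s in np.unique(speaker_ids):
        idx = speaker_ids == s
        mu = params[idx].mean(axis=0)
        sd = params[idx].std(axis=0) + 1e-8  # guard against zero variance
        out[idx] = (params[idx] - mu) / sd
    return out

X = normalize_per_speaker(np.random.rand(10, 4),
                          np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]))
```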

A Study on Lip Print Recognition by using Pattern Kernels in Multi-Resolution Architecture

  • Paik, Kyoung-Seok;Chung, Chin-Hyun
    • The KIPS Transactions: Part B
    • /
    • Vol. 8B, No. 2
    • /
    • pp.189-194
    • /
    • 2001
  • This paper presents a multi-resolution architecture for personal identification and implements lip print recognition with it. Compared with physical attributes such as fingerprints, voice patterns, iris patterns, and faces, lip prints have received relatively little attention for recognition. When a CCD camera is used, lip prints have the advantage that a recognition system can be built in combination with other features such as the iris or the face. A new method using pattern kernels is proposed for lip print recognition. The pattern kernels are a function consisting of several local lip print masks, and they convert the information of a lip print into digital data. A recognition system with multiple resolutions is more reliable and achieves a higher recognition rate than a single-resolution system.
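A small sketch of the multi-resolution side of the approach: build an image pyramid by repeated 2x2 averaging and concatenate features extracted at every level, so a match must agree at both coarse and fine scales. The pyramid depth and the per-level feature extractor (which could be the pattern-kernel sketch shown earlier in this listing) are assumptions.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def pyramid_features(img, extract, levels=3):
    """Concatenate per-level features from a multi-resolution pyramid."""
    feats = []
    for _ in range(levels):
        feats.append(extract(img))
        img = downsample(img)
    return np.concatenate(feats)

# Placeholder extractor; any per-level descriptor could be plugged in.
f = pyramid_features(np.random.rand(64, 64),
                     extract=lambda im: np.array([im.mean(), im.std()]))
```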
