• Title/Summary/Keyword: Lip Segmentation

Segmentation of the Lip Region by Color Gamut Compression and Feature Projection (색역 압축과 특징치 투영을 이용한 입술영역 분할)

  • Kim, Jeong Yeop
    • Journal of Korea Multimedia Society / v.21 no.11 / pp.1279-1287 / 2018
  • In this paper, a new color-coordinate conversion, a modified CIEXYZ transform from RGB, is proposed to compress the color gamut. The proposed segmentation uses principal component analysis for the optimal projection of the feature vector onto a one-dimensional feature, and the final step applies Otsu's threshold to the resulting two-class problem. The performance of the proposed method was better than that of conventional methods, especially for the chromatic feature.
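
The projection-and-threshold pipeline described above can be sketched as follows; this is a minimal illustration with synthetic two-cluster data standing in for real lip/skin chromatic features, not the paper's implementation:

```python
import numpy as np

def project_pca_1d(features):
    """Project N x d feature vectors onto the first principal component."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    principal = eigvecs[:, np.argmax(eigvals)]      # largest-variance direction
    return centered @ principal

def otsu_threshold(values, bins=256):
    """Otsu's threshold for a 1-D feature in a two-class (lip vs. skin) problem."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist.astype(float) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_var = centers[0], -1.0
    for k in range(1, bins):
        w0, w1 = p[:k].sum(), p[k:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (p[:k] * centers[:k]).sum() / w0
        mu1 = (p[k:] * centers[k:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2        # between-class variance
        if between > best_var:
            best_var, best_t = between, centers[k]
    return best_t

rng = np.random.default_rng(0)
skin = rng.normal([0.3, 0.2], 0.05, size=(200, 2))  # synthetic skin features
lip = rng.normal([0.6, 0.5], 0.05, size=(200, 2))   # synthetic lip features
proj = project_pca_1d(np.vstack([skin, lip]))
t = otsu_threshold(proj)
labels = proj > t                                   # two-class segmentation
```

Maximizing the between-class variance over all histogram splits is exactly the two-class criterion the abstract refers to.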

Analysis of CIELuv Color feature for the Segmentation of the Lip Region (입술영역 분할을 위한 CIELuv 칼라 특징 분석)

  • Kim, Jeong Yeop
    • Journal of Korea Multimedia Society / v.22 no.1 / pp.27-34 / 2019
  • In this paper, a new lip feature is proposed as a distance metric in the CIELUV color space. The performance of the proposed feature was tested on the Helen face-image dataset from the University of Illinois. The test process consists of three steps: feature extraction, principal component analysis for the optimal projection of the feature vector, and Otsu's threshold for the two-class problem. The proposed feature performed better than conventional features. The performance metrics used for evaluation are OverLap and Segmentation Error; the best performance of the proposed feature was an OverLap of 65% and a segmentation error of 59%. Conventional methods usually report 80~95% OverLap and 5~15% segmentation error, but on face databases that are well calibrated and adjusted, with the same background and illumination in every scene. The Helen dataset used in this paper is not calibrated or adjusted at all; its images were gathered from the internet.
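
The OverLap and Segmentation Error metrics named above are commonly defined as follows in the lip-segmentation literature (the tiny masks are illustrative, and whether the paper uses exactly these formulas is an assumption):

```python
import numpy as np

def overlap(pred, truth):
    """OverLap: 2|A intersect B| / (|A| + |B|), in percent."""
    inter = np.logical_and(pred, truth).sum()
    return 200.0 * inter / (pred.sum() + truth.sum())

def segmentation_error(pred, truth):
    """Segmentation Error: (outer + inner lip errors) / (2 * lip pixels), in percent."""
    ole = np.logical_and(pred, ~truth).sum()   # background classified as lip
    ile = np.logical_and(~pred, truth).sum()   # lip classified as background
    return 100.0 * (ole + ile) / (2.0 * truth.sum())

truth = np.zeros((10, 10), dtype=bool)
truth[3:7, 2:8] = True                 # 24 ground-truth lip pixels
pred = np.zeros_like(truth)
pred[3:7, 3:8] = True                  # 20 predicted pixels, all inside the truth
ol = overlap(pred, truth)              # 2*20/(20+24)*100, about 90.9
se = segmentation_error(pred, truth)   # (0+4)/(2*24)*100, about 8.3
```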

Lip Feature Extraction using Contrast of YCbCr (YCbCr 농도 대비를 이용한 입술특징 추출)

  • Kim, Woo-Sung;Min, Kyung-Won;Ko, Han-Seok
    • Proceedings of the IEEK Conference / 2006.06a / pp.259-260 / 2006
  • Since audio speech recognition is affected by noise in real environments, visual speech recognition is used to support it. For visual speech recognition, this paper proposes lip-feature extraction using two types of image segmentation and a reduced ASM. Input images are transformed into YCbCr, and the lips are segmented using the contrast of the Y/Cb/Cr components between the lip and the face. Subsequently, a lip-shape model trained by PCA is placed on the segmented lip region, and lip features are extracted using the ASM.
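
A minimal sketch of the YCbCr transform this entry relies on, using the standard full-range (JFIF) coefficients; the sample colors and the midpoint threshold are illustrative assumptions, not the paper's values:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range (JFIF) RGB -> YCbCr for 8-bit images."""
    rgb = rgb.astype(float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

# Lips are redder than the surrounding skin, so Cr is higher on the lip;
# a contrast threshold between the two regions can then separate them.
skin = np.array([[[200, 140, 120]]], dtype=np.uint8)   # illustrative skin color
lip = np.array([[[180, 80, 90]]], dtype=np.uint8)      # illustrative lip color
cr_skin = rgb_to_ycbcr(skin)[0, 0, 2]
cr_lip = rgb_to_ycbcr(lip)[0, 0, 2]
threshold = (cr_skin + cr_lip) / 2                     # simple midpoint threshold
```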

Lip-Synch System Optimization Using Class Dependent SCHMM (클래스 종속 반연속 HMM을 이용한 립싱크 시스템 최적화)

  • Lee, Sung-Hee;Park, Jun-Ho;Ko, Han-Seok
    • The Journal of the Acoustical Society of Korea / v.25 no.7 / pp.312-318 / 2006
  • The conventional lip-synch system has a two-step process: speech segmentation and recognition. However, the difficulty of the speech-segmentation procedure, and the inaccuracy of the training data caused by that segmentation, lead to significant performance degradation in the system. To cope with this, a connected-vowel recognition method using a Head-Body-Tail (HBT) model is proposed. The HBT model, which is appropriate for handling relatively small vocabulary tasks, reflects the co-articulation effect efficiently. The 7 vowels are merged into 3 classes of similar lip shape, and the system is optimized by employing a class-dependent SCHMM structure. Additionally, at both ends of each word, where variation is large, an 8-component Gaussian mixture model is used directly to improve the representation ability. Although the proposed method shows performance similar to a CHMM based on the HBT structure, the number of parameters is reduced by 33.92%. This reduction makes the method computationally efficient enough for real-time operation.

A Face Segmentation Algorithm Using Window (윈도우를 사용한 얼굴영역의 추출 기법)

  • 임성현;이철희
    • Proceedings of the IEEK Conference / 2000.11d / pp.45-48 / 2000
  • In this paper, we propose a region-based segmentation algorithm that extracts the human face area using a window function and neural networks. Erosion and dilation are then applied to remove small error areas. Applying the window function reduces errors; in particular, false segmentation of the eyes and lips is considerably reduced. Experiments show promising results, and the proposed method is expected to be applicable to video conferencing and still-image compression.
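
The erosion-and-dilation cleanup step can be sketched with plain NumPy; a 3x3 structuring element is assumed here, since the abstract does not specify one:

```python
import numpy as np

def binary_erode(mask):
    """3x3 binary erosion: a pixel survives only if its whole neighborhood is set."""
    h, w = mask.shape
    padded = np.pad(mask, 1, constant_values=False)
    out = np.ones_like(mask)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out &= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return out

def binary_dilate(mask):
    """3x3 binary dilation: a pixel is set if any neighbor is set."""
    h, w = mask.shape
    padded = np.pad(mask, 1, constant_values=False)
    out = np.zeros_like(mask)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return out

mask = np.zeros((9, 9), dtype=bool)
mask[2:7, 2:7] = True   # a face-sized blob
mask[0, 0] = True       # an isolated false detection
opened = binary_dilate(binary_erode(mask))  # opening removes the speck
```

Erosion followed by dilation (morphological opening) deletes regions smaller than the structuring element while approximately preserving larger ones, which is exactly the "remove small error areas" step.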

Support Vector Machine Based Phoneme Segmentation for Lip Synch Application

  • Lee, Kun-Young;Ko, Han-Seok
    • Speech Sciences / v.11 no.2 / pp.193-210 / 2004
  • In this paper, we develop a real-time lip-synch system that animates a 2-D avatar's lips in synch with an incoming speech utterance. To realize real-time operation, we bound the processing time by invoking merge and split procedures that perform coarse-to-fine phoneme classification. At each stage of phoneme classification, we apply a support vector machine (SVM) to reduce the computational load while retaining the desired accuracy. The coarse-to-fine classification is accomplished in two stages of feature extraction: first, each speech frame is acoustically analyzed into 3 classes of lip opening using Mel-Frequency Cepstral Coefficients (MFCC) as the feature; second, each frame's classification is refined into a detailed lip shape using formant information. We implemented the system with 2-D lip animation, which shows the effectiveness of the proposed two-stage procedure for a real-time lip-synch task. The method using phoneme merging and SVM recognized about twice as fast as a method employing a Hidden Markov Model (HMM): typical latency per frame was about 18.22 milliseconds for our method, while an HMM applied under identical conditions took about 30.67 milliseconds.
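
The coarse-to-fine flow can be illustrated schematically; the linear decision functions below are hypothetical stand-ins for SVMs trained on MFCC and formant features, and every class name, weight, and the formant-index refinement rule is invented for illustration:

```python
import numpy as np

# Hypothetical linear decision functions f(x) = w.x + b standing in for
# trained SVMs; weights and class names are invented, not the paper's.
COARSE = {              # stage 1: 3 coarse lip-opening classes (MFCC-like input)
    "closed": (np.array([-1.0, 0.0]), 0.5),
    "half":   (np.array([0.0, 1.0]), 0.0),
    "open":   (np.array([1.0, 0.0]), 0.5),
}
FINE = {                # stage 2: detailed lip shapes within each coarse class
    "closed": ["m", "p"],
    "half":   ["e", "o"],
    "open":   ["a"],
}

def decision(x, w, b):
    return float(np.dot(w, x) + b)

def classify_frame(mfcc_like, formant_index):
    """One-vs-rest coarse classification, then refinement within the winner."""
    coarse = max(COARSE, key=lambda c: decision(mfcc_like, *COARSE[c]))
    shapes = FINE[coarse]
    return coarse, shapes[min(formant_index, len(shapes) - 1)]
```

The computational saving comes from the fan-out: only the few fine classes under the winning coarse class are ever considered for a frame.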

Lip Reading Method Using CNN for Utterance Period Detection (발화구간 검출을 위해 학습된 CNN 기반 입 모양 인식 방법)

  • Kim, Yong-Ki;Lim, Jong Gwan;Kim, Mi-Hye
    • Journal of Digital Convergence / v.14 no.8 / pp.233-243 / 2016
  • Because of speech-recognition problems in noisy environments, Audio-Visual Speech Recognition (AVSR) systems, which combine speech information with visual information, have been studied since the mid-1990s, and lip reading has played a significant role in them. This study aims to enhance the recognition rate of uttered words using only lip-shape detection for an efficient AVSR system. After preprocessing for lip-region detection, Convolutional Neural Network (CNN) techniques are applied for utterance-period detection and lip-shape feature-vector extraction, and Hidden Markov Models (HMMs) are then used for recognition. The utterance-period detection achieves a 91% success rate, outperforming general threshold methods. In lip-reading recognition, the user-dependent experiment records an 88.5% recognition rate and the user-independent experiment 80.2%, improvements over previous studies.
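
The convolutional building block of the CNN stage can be shown in isolation; this is just the 2-D convolution (cross-correlation) primitive applied to a toy step-edge image, not the paper's trained network:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid-mode 2-D cross-correlation, the core CNN feature-extraction step."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

# A vertical-edge kernel responds strongly at the boundary in a toy image,
# the kind of low-level feature early CNN layers learn around the lip contour.
image = np.zeros((5, 6))
image[:, 3:] = 1.0                      # step edge at column 3
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
response = conv2d_valid(image, sobel_x)
```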

The segmentation of Korean word for the lip-synch application (Lip-synch application을 위한 한국어 단어의 음소분할)

  • 강용성;고한석
    • Proceedings of the IEEK Conference / 2001.09a / pp.509-512 / 2001
  • This paper addresses phoneme-level segmentation of Korean words in Korean speech. The target words are the 452-word phonetically balanced database of Wonkwang University, and the segmentation units are a set of 44 phonemes constructed by speech experts. To segment phonemes, the speech is divided into frames, the similarity of the spectral components between frames is measured, and phoneme boundaries are located based on that similarity. To determine the similarity between two frames, the Lukasiewicz implication, one method of determining the similarity between two vectors, is used. In this experiment, an existing method for voiced/unvoiced segmentation of a single word based on inter-frame spectral similarity was modified to suit the phoneme segmentation of Korean words. For performance evaluation, the results were compared with data segmented by hand by speech experts. Compared with the hand-segmented data, at best 84.76% of the boundaries fell within 32 ms.
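
The Lukasiewicz-implication similarity between frames can be sketched as follows; the symmetrization and the mean aggregation over spectral components are assumptions, since the abstract does not spell them out:

```python
import numpy as np

def lukasiewicz_implication(a, b):
    """Lukasiewicz implication I(a, b) = min(1, 1 - a + b), elementwise on [0, 1]."""
    return np.minimum(1.0, 1.0 - a + b)

def frame_similarity(f1, f2):
    """Symmetrized similarity between two normalized spectral frames.

    min(I(a, b), I(b, a)) equals 1 - |a - b| per component; the frame score
    aggregates components with a mean (the aggregation is an assumption).
    """
    sym = np.minimum(lukasiewicz_implication(f1, f2),
                     lukasiewicz_implication(f2, f1))
    return float(sym.mean())

# Identical frames score 1; dissimilar spectra score lower, and a dip in
# similarity across consecutive frames suggests a phoneme boundary.
f_a = np.array([0.1, 0.8, 0.3])
f_b = np.array([0.1, 0.8, 0.3])
f_c = np.array([0.9, 0.2, 0.6])
```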

CNN-based Human Parsing Technique Using Pyramid Pooling (Pyramid pooling을 이용한 CNN 기반의 Human Parsing 기법)

  • Choi, Inkyu;Ko, min-soo;Song, hyok
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2018.11a / pp.97-98 / 2018
  • With recent advances in deep learning, CNN-based segmentation techniques have been developed alongside image classification and object detection, enabling separation that follows object boundaries rather than rectangular detection regions that include other elements. Furthermore, human parsing, which divides the human region into detailed parts such as body parts or clothing, is also being studied. Human parsing can be applied to fields such as clothing-style analysis and retrieval, and human action recognition and tracking. This paper proposes a human-parsing technique that uses a spatial pyramid pooling layer to take into account the spatial distribution and characteristic information of the whole image. Using the Look Into Person (LIP) dataset, the proposed technique was compared with existing segmentation and human-parsing techniques, and its human-parsing results were confirmed to give more precise separation.
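
The spatial pyramid pooling idea can be sketched for a single feature channel; the 1x1/2x2/4x4 pyramid levels and max pooling are typical choices, not necessarily the paper's exact configuration:

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map over 1x1, 2x2, and 4x4 grids and concatenate.

    The output length (1 + 4 + 16 = 21 per channel here) is fixed regardless
    of the input's spatial size, which is the point of the layer.
    """
    h, w = feature_map.shape
    pooled = []
    for n in levels:
        # Split the map into an n x n grid of (possibly uneven) cells
        rows = np.array_split(np.arange(h), n)
        cols = np.array_split(np.arange(w), n)
        for r in rows:
            for c in cols:
                pooled.append(feature_map[np.ix_(r, c)].max())
    return np.array(pooled)

rng = np.random.default_rng(1)
v1 = spatial_pyramid_pool(rng.random((32, 48)))
v2 = spatial_pyramid_pool(rng.random((17, 23)))
# Both inputs yield a descriptor of the same length despite different sizes
```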

Lip Contour Detection by Multi-Threshold (다중 문턱치를 이용한 입술 윤곽 검출 방법)

  • Kim, Jeong Yeop
    • KIPS Transactions on Software and Data Engineering / v.9 no.12 / pp.431-438 / 2020
  • In this paper, a method to extract the lip contour using multiple thresholds is proposed. Spyridonos et al. proposed a method to extract the lip contour: the first step obtains the Q image by transforming RGB into YIQ; the second step finds the lip corner points by change-point detection and splits the Q image into upper and lower parts at the corner points. Candidate lip contours are obtained by applying a threshold to the Q image. For each candidate contour, a feature variance is calculated, and the contour with the maximum variance is adopted as the final contour. The feature variance 'D' is based on the absolute differences near the contour points. The conventional method has three problems. The first concerns the lip corner points: the variance calculation depends on many skin pixels, which decreases accuracy and affects the splitting of the Q image. Second, no color systems other than YIQ were analyzed; YIQ works well, but other color systems such as HSV, CIELUV, and YCbCr should be considered. The final problem concerns the selection of the optimal contour: the original method selects by the maximum of the average feature variance of the pixels near the contour points, which shrinks the extracted contour relative to the ground-truth contour. To solve the first problem, the proposed method excludes some of the skin pixels, giving a 30% performance increase. For the second problem, the HSV, CIELUV, and YCbCr coordinate systems were tested, and no dependency of the conventional method on the color system was found. For the final problem, the maximum of the total sum of the feature variance is adopted rather than the maximum of the average, giving a 46% performance increase. Combining all the solutions, the proposed method is twice as accurate and stable as the conventional method.
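
The Q-image and multi-threshold candidate step can be sketched as follows, using standard NTSC YIQ coefficients; the sample colors and the choice of threshold grid are illustrative assumptions, not the paper's values:

```python
import numpy as np

def q_channel(rgb):
    """Q component of the NTSC YIQ transform (inputs normalized to [0, 1])."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.211 * r - 0.523 * g + 0.312 * b

def candidate_masks(q, thresholds):
    """One binary candidate lip mask per threshold; the paper then scores each
    candidate's contour by a feature variance and keeps the best one."""
    return [q > t for t in thresholds]

# Reddish lip pixels have a higher Q value than skin pixels
rgb = np.array([[[0.8, 0.55, 0.45],    # illustrative skin color
                 [0.7, 0.3, 0.35]]])   # illustrative lip color
q = q_channel(rgb)
# Sweep several thresholds between the extremes of the Q image
masks = candidate_masks(q, thresholds=np.linspace(q.min(), q.max(), 5)[1:-1])
```

Sweeping multiple thresholds and scoring each resulting contour is what distinguishes the multi-threshold scheme from the single fixed threshold of the conventional method.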