• Title/Summary/Keyword: lip tracking

An Experimental Multimodal Command Control Interface for Car Navigation Systems

  • Kim, Kyungnam;Ko, Jong-Gook;SeungHo choi;Kim, Jin-Young;Kim, Ki-Jung
    • Proceedings of the IEEK Conference
    • /
    • 2000.07a
    • /
    • pp.249-252
    • /
    • 2000
  • This paper proposes an experimental multimodal system that combines natural input modes such as speech, lip movement, and gaze. It benefits from novel human-computer interaction (HCI) modalities and from multimodal integration to tackle the HCI bottleneck. The system lets the user select menu items on the screen by running speech recognition, lip reading, and gaze tracking components in parallel. Face tracking is a supplementary component to gaze tracking and lip movement analysis. These key components are reviewed, and preliminary results from multimodal integration and user testing on the prototype system are shown. Notably, the system equipped with gaze tracking and lip reading is very effective in noisy environments, where the speech recognition rate is low and unstable. Our long-term interest is to build a user interface embedded in a commercial car navigation system (CNS).

Lip Detection using Color Distribution and Support Vector Machine for Visual Feature Extraction of Bimodal Speech Recognition System (바이모달 음성인식기의 시각 특징 추출을 위한 색상 분석과 SVM을 이용한 입술 위치 검출)

  • 정지년;양현승
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.4
    • /
    • pp.403-410
    • /
    • 2004
  • Bimodal speech recognition systems have been proposed to improve the recognition rate of automatic speech recognition (ASR) in noisy environments. Visual feature extraction is essential to these systems, and extracting visual features requires detecting the exact lip position. This paper proposes a method that detects the lip position using a color similarity model and a support vector machine (SVM). The face and lip color distributions are learned and used to find an initial lip position. The exact lip position is then detected by scanning the neighboring area with the SVM. Experiments show that the method detects the lip position accurately and quickly.
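
The two-stage idea in this abstract (a coarse color-based guess followed by a local scan with a classifier) might be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the redness measure, the threshold, the search radius, the toy image, and `toy_score` (a stand-in for a trained SVM decision function) are all hypothetical.

```python
# Sketch: coarse lip localization by color, then a local scan with a scorer.
# Assumptions (not from the paper): the redness measure r/(r+g+b), the
# threshold, the search radius, and `toy_score`, which stands in for the
# decision function of a trained SVM.

def redness(pixel):
    r, g, b = pixel
    total = r + g + b
    return r / total if total else 0.0

def coarse_lip_position(image, threshold=0.5):
    """Return the (row, col) centroid of pixels whose redness exceeds threshold."""
    hits = [(y, x) for y, row in enumerate(image)
            for x, p in enumerate(row) if redness(p) > threshold]
    if not hits:
        return None
    cy = sum(y for y, _ in hits) / len(hits)
    cx = sum(x for _, x in hits) / len(hits)
    return round(cy), round(cx)

def refine_position(image, start, score_window, radius=2):
    """Scan positions near `start` and keep the one the scorer rates highest."""
    h, w = len(image), len(image[0])
    y0, x0 = start
    candidates = [(y, x)
                  for y in range(max(0, y0 - radius), min(h, y0 + radius + 1))
                  for x in range(max(0, x0 - radius), min(w, x0 + radius + 1))]
    return max(candidates, key=lambda pos: score_window(image, pos))

def toy_score(image, pos):
    """Hypothetical scorer: summed redness of a 3-pixel horizontal window."""
    y, x = pos
    row = image[y]
    return sum(redness(row[c]) for c in range(max(0, x - 1), min(len(row), x + 2)))

# Toy 8x8 image: skin-colored background with three lip-colored pixels.
SKIN, LIP = (200, 150, 130), (180, 60, 70)
image = [[SKIN] * 8 for _ in range(8)]
for col in (2, 3, 4):
    image[5][col] = LIP

initial = coarse_lip_position(image)
refined = refine_position(image, initial, toy_score)
```

In a real system the scorer would be a classifier trained on lip and non-lip patches; the point of the sketch is only that the expensive scan is restricted to a small neighborhood of the color-based estimate.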

Word-boundary and rate effects on upper and lower lip movements in the articulation of the bilabial stop /p/ in Korean

  • Son, Minjung
    • Phonetics and Speech Sciences
    • /
    • v.10 no.1
    • /
    • pp.23-31
    • /
    • 2018
  • In this study, we examined how the upper and lower lips articulate to produce labial /p/. Using electromagnetic midsagittal articulography, we collected flesh-point tracking movement data from eight native speakers of Seoul Korean (five females and three males). Individual articulatory movements in /p/ were examined in terms of minimum vertical upper lip position, maximum vertical lower lip position, and the vertical upper lip position at the time of maximum vertical lower lip position. Using linear mixed-effects models, we tested two factors (word boundary [across-word vs. within-word] and speech rate [comfortable vs. fast]) and their interaction, treating subjects as random effects. The results are summarized as follows. First, maximum lower lip position varied with word boundary and speech rate, but no interaction was detected. In particular, maximum lower lip position was lower (i.e., less constricted or more reduced) in the fast rate condition and in the across-word boundary condition. Second, minimum upper lip position, as well as upper lip position measured at the time of maximum lower lip position, varied only with word boundary, and both were consistently lower in the across-word condition. We provide further empirical evidence that lower lip movement is sensitive to both word boundary (a linguistic factor) and speech rate (a paralinguistic factor); this supports the traditional idea that the lower lip is an actively moving articulator. Upper lip movement is also sensitive to word boundary; this counters the traditional idea that the upper lip is the target area, which presupposes immobility. Taken together, the lip aperture gesture is a good indicator because it takes into account both upper and lower lip vertical movements, in contrast to the traditional approach that distinguishes a movable articulator from a target place. With respect to speech rate, the results of the present study pattern with cross-linguistic lenition-related allophonic variation, which is known to be more sensitive to fast rate.

Robust Lip Extraction and Tracking of the Mouth Region

  • Min, Duk-Soo;Kim, Jin-Young;Park, Seung-Ho;Kim, Ki-Jung
    • Proceedings of the IEEK Conference
    • /
    • 2000.07b
    • /
    • pp.927-930
    • /
    • 2000
  • Visual features of the lip area play an important role in visual speech information, so the lip area must be located correctly as the region of interest (ROI). In this paper, we propose a robust and fast method for locating the mouth corners and define an ROI at the mouth during speech. The method uses only horizontal and vertical image operators in the mouth area. The search is performed by fitting an ROI template to the image under illumination control. Most lip extraction algorithms depend on the luminosity of the image. We use only a binary image obtained with a variable threshold that changes with the illumination condition. To handle these variations, the gray-tone image is converted to a binary image using a threshold obtained through Multiple Linear Regression Analysis (MLRA) over divided 2D spatial regions. The ROI at the mouth area is thus extracted automatically and robustly with respect to illumination.

Speech Activity Detection using Lip Movement Image Signals (입술 움직임 영상 신호를 이용한 음성 구간 검출)

  • Kim, Eung-Kyeu
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.11 no.4
    • /
    • pp.289-297
    • /
    • 2010
  • This paper presents a method that prevents external acoustic noise from being misrecognized as a speech recognition target during the speech activity detection process. In addition to acoustic energy, the method checks lip movement image signals. First, successive images are captured with a PC camera, and the presence or absence of lip movement is determined. Next, the lip movement data is stored in shared memory and shared with the speech recognition process. Meanwhile, the speech activity detection process, which is the preprocessing phase of speech recognition, checks the data stored in shared memory to verify whether the acoustic energy comes from a speaker's utterance. Finally, in an experiment linking the speech recognition processor and the image processor, the speech recognition result was output normally when the speaker faced the camera while speaking, and no result was output when the speaker did not face the camera while speaking. Also, the initial feature values obtained off-line are replaced with values obtained on-line. Similarly, the initial template image captured off-line is replaced with a template image captured on-line, which improves the discrimination of lip movement image tracking. An image processing test bed was implemented to confirm the lip movement image tracking process visually and to analyze the related parameters in real time. With the speech and image processing systems linked, the interworking rate was 99.3% under various illumination environments.
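
The gating logic described in this abstract (an audio frame counts as speech only when acoustic energy is present and the lips are moving) can be sketched as follows. This is an illustrative sketch, not the author's system: the function names, the thresholds, and the frame-difference motion test are assumptions, and a plain list of flags stands in for the shared memory that links the two processes.

```python
# Sketch: accept an audio frame as speech only when its energy is high AND
# lip movement was observed for that frame. Thresholds, function names, and
# the per-frame pairing of audio and video are illustrative assumptions.

def frame_energy(samples):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0

def lip_moving(prev_region, curr_region, motion_threshold=10.0):
    """True if the mean absolute gray-level change between two lip crops is large."""
    diffs = [abs(a - b) for a, b in zip(prev_region, curr_region)]
    return bool(diffs) and sum(diffs) / len(diffs) > motion_threshold

def detect_speech(audio_frames, lip_flags, energy_threshold=0.01):
    """Return one boolean per frame: energy above threshold and lips moving."""
    return [frame_energy(frame) > energy_threshold and moving
            for frame, moving in zip(audio_frames, lip_flags)]

# Usage: a loud frame with moving lips, a loud frame without (external
# noise), and a silent frame with moving lips.
loud = [0.5, -0.5, 0.5, -0.5]
quiet = [0.0, 0.0, 0.0, 0.0]
decisions = detect_speech([loud, loud, quiet], [True, False, True])
```

Only the first frame is accepted: the second has energy but no lip movement, which is the external-noise case the method is meant to reject.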

Functions and Driving Mechanisms for Face Robot Buddy (얼굴로봇 Buddy의 기능 및 구동 메커니즘)

  • Oh, Kyung-Geune;Jang, Myong-Soo;Kim, Seung-Jong;Park, Shin-Suk
    • The Journal of Korea Robotics Society
    • /
    • v.3 no.4
    • /
    • pp.270-277
    • /
    • 2008
  • The development of a face robot basically targets very natural human-robot interaction (HRI), especially emotional interaction. So does the face robot introduced in this paper, named Buddy. Since Buddy was developed for a mobile service robot, it does not have a lifelike face such as a human's or an animal's, but a typically robot-like face with hard skin, which may be suitable for mass production. In addition, its structure and mechanism should be simple, and its production cost should be low. This paper introduces the mechanisms and functions of the mobile face robot Buddy, which can take on natural and precise facial expressions and make dynamic gestures, driven by a single laptop PC. Buddy can also perform lip-sync, eye contact, and face tracking for lifelike interaction. By adopting a customized emotional reaction decision model, Buddy can create its own personality, emotion, and motive from various sensor data inputs. Based on this model, Buddy can interact properly with users and perform real-time learning using personality factors. The interaction performance of Buddy is successfully demonstrated by experiments and simulations.

Understanding the Importance of Presenting Facial Expressions of an Avatar in Virtual Reality

  • Kim, Kyulee;Joh, Hwayeon;Kim, Yeojin;Park, Sohyeon;Oh, Uran
    • International journal of advanced smart convergence
    • /
    • v.11 no.4
    • /
    • pp.120-128
    • /
    • 2022
  • While online social interactions have become more prevalent with the increased popularity of Metaverse platforms, little is known about the effects of facial expressions in virtual reality (VR), although facial expressions are known to play a key role in social contexts. To understand the importance of presenting facial expressions of a virtual avatar under different contexts, we conducted a user study with 24 participants, who were asked to have a conversation and play a charades game with an avatar with and without facial expressions. The results show that participants tend to gaze at the face region for the majority of the time when having a conversation or trying to guess emotion-related keywords in charades, regardless of the presence of facial expressions. Yet, we confirmed that participants prefer to see facial expressions in virtual reality as well as in real-world scenarios, as it helps them better understand the context and have a more immersive and focused experience.

A Tracking Method of Robust Lip Movement Image Regions for Blocking the External Acoustic Noise (외부음향잡음 차단을 위한 강인한 입술움직임 영상영역 추적방법)

  • Kim, Eung-Kyeu
    • Proceedings of the KIEE Conference
    • /
    • 2009.07a
    • /
    • pp.1913-1914
    • /
    • 2009
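  • This paper proposes a method for robustly tracking lip movement image regions under varying illumination in order to block external acoustic noise through a linked speech/image system. To track lip movement image regions robustly under varying illumination, standard lip movement images were collected on-line, and lip movement image features that adapt to various illumination environments were extracted. At the same time, on-line template images were acquired and used for template matching. With the speech and image processing systems linked, the interworking rate was raised to 99.3% under various illumination environments, and speech recognition triggered by acoustic noise was blocked at the source.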
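
Template matching with a template refreshed on-line, as mentioned in this abstract, might look like the sketch below. The similarity measure is an assumption: sum of absolute differences (SAD) is used here because it is simple, and the abstract does not say which measure the author used. The function names and the toy frame are hypothetical.

```python
# Sketch: locate a template in a grayscale image by minimizing the sum of
# absolute differences (SAD), then cut a fresh template from the current
# frame. SAD and all names here are assumptions, not the author's method.

def sad(image, template, top, left):
    """Sum of absolute differences for the template placed at (top, left)."""
    return sum(abs(image[top + r][left + c] - template[r][c])
               for r in range(len(template))
               for c in range(len(template[0])))

def match_template(image, template):
    """Return (top, left) of the best-matching placement of the template."""
    th, tw = len(template), len(template[0])
    positions = [(y, x)
                 for y in range(len(image) - th + 1)
                 for x in range(len(image[0]) - tw + 1)]
    return min(positions, key=lambda p: sad(image, template, *p))

def crop(image, top, left, height, width):
    """Cut a new template out of the current frame (on-line template update)."""
    return [row[left:left + width] for row in image[top:top + height]]

# Usage: an off-line template captured under slightly dimmer lighting still
# matches, and the on-line crop replaces it for later frames.
frame = [[0, 0, 0, 0, 0],
         [0, 0, 9, 8, 0],
         [0, 0, 7, 6, 0],
         [0, 0, 0, 0, 0]]
offline_template = [[8, 7], [6, 5]]
top, left = match_template(frame, offline_template)
online_template = crop(frame, top, left, 2, 2)
```

Replacing the off-line template with one cropped from the live frame is what lets later matches follow the current illumination rather than the conditions at capture time.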

Facial Feature Tracking from a General USB PC Camera (범용 USB PC 카메라를 이용한 얼굴 특징점의 추적)

  • 양정석;이칠우
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.10b
    • /
    • pp.412-414
    • /
    • 2001
  • In this paper, we describe a real-time facial feature tracker. We used only a general USB PC camera without a frame grabber. The system achieves a rate of 8+ frames/second without any low-level library support. It tracks the pupils, the nostrils, and the corners of the lips. The signal from the USB camera is in YUV 4:2:0 vertical format. We convert the signal to the RGB color model to display the image, and we interpolate the V channel of the signal to extract the facial region. We then analyze 2D blob features in the Y channel (the luminance of the image) under geometric constraints to locate each facial feature within the detected facial region. Our method is simple and intuitive enough that the system works in real time.
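
The color conversion and chroma interpolation steps mentioned in this abstract can be illustrated with a short sketch. The abstract does not state which YUV variant or interpolation the authors used, so the full-range BT.601 (JFIF) coefficients and the nearest-neighbor upsampling below are assumptions, and the function names are hypothetical.

```python
# Sketch: convert one full-range YUV (YCbCr, BT.601 / JFIF) pixel to RGB and
# upsample a 4:2:0 chroma plane to full resolution. The choice of JFIF
# coefficients and nearest-neighbor interpolation is an assumption.

def clamp(value):
    """Round to the nearest integer and limit to the 8-bit range."""
    return max(0, min(255, int(round(value))))

def yuv_to_rgb(y, u, v):
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clamp(r), clamp(g), clamp(b)

def upsample_chroma(plane):
    """Nearest-neighbor 2x upsampling of a 4:2:0 chroma plane in both axes."""
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

# Usage: neutral chroma (128) maps luminance straight to a gray level.
gray = yuv_to_rgb(128, 128, 128)
full_v = upsample_chroma([[1, 2]])
```

In 4:2:0 sampling each chroma value covers a 2x2 block of luminance samples, which is why the V plane has to be brought up to full resolution before it can be used per pixel for face-region extraction.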
