• Title/Summary/Keyword: Illumination robustness


A Study on Analysis of Variant Factors of Recognition Performance for Lip-reading at Dynamic Environment (동적 환경에서의 립리딩 인식성능저하 요인분석에 대한 연구)

  • 신도성;김진영;이주헌
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.5
    • /
    • pp.471-477
    • /
    • 2002
  • Recently, lip-reading has been actively studied as an auxiliary method for automatic speech recognition (ASR) in noisy environments. However, most published results were obtained on databases recorded under indoor conditions, so it is not known how robust existing lip-reading algorithms are to dynamic variation of the image. We have developed a lip-reading system based on an image-transform algorithm; it recognizes 22 words with a word-recognition rate of up to 53.54%. In this paper we examine how stable this system is under environmental variation and which factors are chiefly responsible for drops in word-recognition performance. For the robustness study we consider spatial variance (translation, rotation, scaling) and illumination variance, using two kinds of test data: a simulated lip-image database and a real dynamic database captured in a car environment. Our experiments show that spatial variance does degrade lip-reading performance, but it is not the dominant factor: illumination variance reduces recognition rates by as much as 70%. In conclusion, lip-reading algorithms robust to illumination variance must be developed before lip reading can serve as a complementary method for ASR.
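
The perturbations studied above (translation, rotation, scaling, illumination) are straightforward to reproduce when building a simulated test set of this kind. The following is a minimal, hypothetical Python/OpenCV sketch of such an image perturbation; the function name, parameter values, and the linear illumination model are illustrative assumptions, not details taken from the paper.

```python
import cv2
import numpy as np

def perturb(img, dx=0, dy=0, angle=0.0, scale=1.0, gain=1.0, bias=0.0):
    """Apply a spatial + illumination perturbation to a lip-ROI image.

    dx, dy : translation in pixels      angle : rotation in degrees
    scale  : isotropic scaling factor   gain, bias : linear illumination change
    """
    h, w = img.shape[:2]
    # Rotation and scaling about the image center, then translation.
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    M[:, 2] += (dx, dy)
    warped = cv2.warpAffine(img, M, (w, h), borderMode=cv2.BORDER_REPLICATE)
    # Assumed illumination model: I' = gain * I + bias, clipped to [0, 255].
    return np.clip(gain * warped.astype(np.float32) + bias, 0, 255).astype(np.uint8)

# Example: a dimmed, slightly rotated and shifted variant of one frame
# (the file name is a placeholder).
frame = cv2.imread("lip_roi.png", cv2.IMREAD_GRAYSCALE)
test = perturb(frame, dx=3, dy=-2, angle=5.0, scale=1.1, gain=0.5, bias=-20)
```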

Real-Time Multiple Face Detection Using Active illumination (능동적 조명을 이용한 실시간 복합 얼굴 검출)

  • 한준희;심재창;설증보;나상동;배철수
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2003.05a
    • /
    • pp.155-160
    • /
    • 2003
  • This paper presents a multiple-face detector based on a robust pupil-detection technique. The pupil detector uses active illumination that exploits the retro-reflectivity property of eyes to facilitate detection. The detection range of this method is appropriate for interactive desktop and kiosk applications. Once the locations of the pupil candidates are computed, the candidates are filtered and grouped into pairs corresponding to faces using heuristic rules. To demonstrate the robustness of the face-detection technique, a dual-mode face tracker was developed, initialized with the most salient detected face. Recursive estimators are used to guarantee the stability of the process and to combine the measurements from the multi-face detector and a feature-correlation tracker. The estimated position of the face is used to control a pan-tilt servo mechanism in real time, moving the camera so that the tracked face remains centered in the image.
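
The active-illumination scheme above is conventionally realized by differencing a frame captured with on-axis IR lighting (retro-reflective, bright pupils) against one captured with off-axis lighting (dark pupils). Below is a hypothetical Python/OpenCV sketch of that differencing step and of heuristic pupil pairing; the thresholds, blob-size limits, and pairing rules are illustrative placeholders, not values from the paper.

```python
import cv2

def pupil_candidates(bright, dark, thresh=40, min_area=4, max_area=200):
    """Find pupil candidates by differencing on-axis (bright-pupil) and
    off-axis (dark-pupil) IR frames: retro-reflective pupils appear as
    small bright blobs in the difference image."""
    diff = cv2.subtract(bright, dark)          # pupils glow only in `bright`
    _, binary = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    # Keep blobs whose area is plausible for a pupil at desktop range.
    return [tuple(centroids[i]) for i in range(1, n)
            if min_area <= stats[i, cv2.CC_STAT_AREA] <= max_area]

def pair_into_faces(pupils, min_sep=20, max_sep=120, max_tilt=0.3):
    """Heuristically group pupil candidates into eye pairs: plausible
    interocular distance and a roughly horizontal baseline."""
    faces = []
    for i, (x1, y1) in enumerate(pupils):
        for x2, y2 in pupils[i + 1:]:
            sep = abs(x2 - x1)
            if min_sep <= sep <= max_sep and abs(y2 - y1) <= max_tilt * sep:
                faces.append(((x1, y1), (x2, y2)))
    return faces
```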


Three Dimensional Object Recognition using PCA and KNN (PCA와 KNN을 이용한 3차원 물체인식)

  • Lee, Kee-Jun
    • The Journal of the Korea Contents Association
    • /
    • v.9 no.8
    • /
    • pp.57-63
    • /
    • 2009
  • Object-recognition technologies using PCA (principal component analysis) recognize objects by determining representative features of the objects in the model images, extracting feature vectors from the objects in an input image, and measuring the distance between the two representations. Given the recognition problems frequently associated with point-to-point distances, this study adopts a k-nearest-neighbor (class-to-class) technique in which a group of object models of the same class serves as the recognition unit for images arriving on a continual input stream. However, the robustness of PCA-based recognition strategies depends on several factors, including illumination: when scene constancy is not secured because of varying illumination conditions, the learning performance of the feature detector can be compromised, undermining recognition quality. This paper proposes a new PCA-based recognition scheme in which objects in the database can be detected even when illumination differs between the input images and the model images.
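
As a rough illustration of the class-to-class idea (my own sketch, not the paper's code): project all model images into a shared PCA subspace, then score a test image against each class by the mean distance to its k nearest projections within that class, rather than by a single point-to-point distance. The function names and the choice of scikit-learn are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

def train(model_images, labels, n_components=20):
    """model_images: (N, D) flattened images; labels: length-N class ids."""
    pca = PCA(n_components=n_components).fit(model_images)
    proj = pca.transform(model_images)
    by_class = {c: proj[np.asarray(labels) == c] for c in set(labels)}
    return pca, by_class

def classify(pca, by_class, image, k=3):
    """Class-to-class KNN: score each class by the mean distance from the
    test projection to its k nearest projections belonging to that class."""
    z = pca.transform(image.reshape(1, -1))[0]
    scores = {}
    for c, points in by_class.items():
        d = np.linalg.norm(points - z, axis=1)
        scores[c] = np.sort(d)[:k].mean()
    return min(scores, key=scores.get)  # class with the smallest mean distance
```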

Object Detection Using Combined Random Fern for RGB-D Image Format (RGB-D 영상 포맷을 위한 결합형 무작위 Fern을 이용한 객체 검출)

  • Lim, Seung-Ouk;Kim, Yu-Seon;Lee, Si-Woong
    • The Journal of the Korea Contents Association
    • /
    • v.16 no.9
    • /
    • pp.451-459
    • /
    • 2016
  • While object detection plays a key role in many computer-vision applications, it requires extensive computation to remain robust under varying lighting and geometric distortions. Recently, some approaches have formulated the problem in a classification framework and shown improved object-recognition performance. Among them, the random fern algorithm has drawn much attention because of its simple structure and high recognition rates. However, its performance degrades under illumination changes and added noise, since it computes patch features from pixel intensities alone. In this paper, we propose a combined random fern that incorporates depth information into the conventional random fern, reflecting the 3D structure of the patch. We also introduce an object tracker that exploits the combined random fern. Experiments show that the proposed method detects objects under illumination change and noisy conditions better than conventional methods.
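
A random fern is an ordered list of binary pixel tests whose outcomes are packed into an integer index into class-conditional histograms. The combined variant described above adds depth comparisons alongside intensity comparisons; the following hypothetical Python sketch shows one plausible way to compute such a combined fern code for an RGB-D patch (the test counts and random layout are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_fern(patch_size, n_intensity, n_depth):
    """Draw random pixel-pair test locations: one set compared on
    intensity, one set compared on depth."""
    def pairs(n):
        # shape (n_tests, 2 points, 2 coords), coords in [0, patch_size)
        return rng.integers(0, patch_size, size=(n, 2, 2))
    return pairs(n_intensity), pairs(n_depth)

def fern_code(gray_patch, depth_patch, fern):
    """Combined fern code: intensity-comparison bits concatenated with
    depth-comparison bits into one integer index."""
    i_tests, d_tests = fern
    code = 0
    for (y1, x1), (y2, x2) in i_tests:
        code = (code << 1) | int(gray_patch[y1, x1] < gray_patch[y2, x2])
    for (y1, x1), (y2, x2) in d_tests:
        code = (code << 1) | int(depth_patch[y1, x1] < depth_patch[y2, x2])
    return code  # in [0, 2 ** (n_intensity + n_depth))

# Example: an 11-bit combined fern (8 intensity + 3 depth tests) on a
# random 32x32 RGB-D patch (placeholder data).
fern = make_fern(32, n_intensity=8, n_depth=3)
gray = rng.integers(0, 256, (32, 32))
depth = rng.integers(0, 4096, (32, 32))
idx = fern_code(gray, depth, fern)
```

Classification would then accumulate the class-conditional log-probabilities of these codes over many ferns, as in the original semi-naive Bayes fern formulation.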

Visual Voice Activity Detection and Adaptive Threshold Estimation for Speech Recognition (음성인식기 성능 향상을 위한 영상기반 음성구간 검출 및 적응적 문턱값 추정)

  • Song, Taeyup;Lee, Kyungsun;Kim, Sung Soo;Lee, Jae-Won;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.34 no.4
    • /
    • pp.321-327
    • /
    • 2015
  • In this paper, we propose an algorithm for robust Visual Voice Activity Detection (VVAD) to enhance speech recognition. Conventional VVAD algorithms detect visual speech frames from the motion of the lip region using optical flow or Chaos-inspired measures. Optical-flow-based VVAD is difficult to adopt in driving scenarios because of its computational complexity, while the Chaos-theory-based method, although invariant to illumination changes, is sensitive to translations caused by the driver's head movements. The proposed Local Variance Histogram (LVH) is robust to pixel-intensity changes arising from both illumination change and translation. For improved performance under environmental changes, we additionally adopt a novel threshold estimation based on total variance change. Experimental results show that the proposed VVAD algorithm remains robust in various driving situations.
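
To make the LVH idea concrete, here is a minimal sketch based on my reading of the abstract: compute the per-pixel local variance of the lip ROI, histogram it, and flag a visual-speech frame when the histogram moves enough between consecutive frames. The window size, bin count, variance range, and L1 histogram distance are illustrative assumptions; the paper's adaptive threshold from total variance change would replace the fixed `threshold` below.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance_histogram(gray, window=5, bins=32, vmax=2000.0):
    """Histogram of per-pixel local variance: var = E[x^2] - (E[x])^2 over
    a (window x window) neighborhood. Local variance reflects local
    contrast, so it is less sensitive to global illumination shifts and
    small translations than raw intensities."""
    f = gray.astype(np.float64)
    mean = uniform_filter(f, window)
    var = np.maximum(uniform_filter(f * f, window) - mean * mean, 0.0)
    hist, _ = np.histogram(var, bins=bins, range=(0.0, vmax))
    return hist / hist.sum()

def is_speech_frame(prev_roi, cur_roi, threshold):
    """Flag a visual-speech frame when the LVH moves by more than a
    threshold between consecutive lip-ROI frames."""
    h1 = local_variance_histogram(prev_roi)
    h2 = local_variance_histogram(cur_roi)
    return np.abs(h1 - h2).sum() > threshold
```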

Development of Tracking Equipment for Real-Time Multiple Face Detection (실시간 복합 얼굴 검출을 위한 추적 장치 개발)

  • 나상동;송선희;나하선;김천석;배철수
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.7 no.8
    • /
    • pp.1823-1830
    • /
    • 2003
  • This paper presents a multiple-face detector based on a robust pupil-detection technique. The pupil detector uses active illumination that exploits the retro-reflectivity property of eyes to facilitate detection. The detection range of this method is appropriate for interactive desktop and kiosk applications. Once the locations of the pupil candidates are computed, the candidates are filtered and grouped into pairs corresponding to faces using heuristic rules. To demonstrate the robustness of the face-detection technique, a dual-mode face tracker was developed, initialized with the most salient detected face. Recursive estimators are used to guarantee the stability of the process and to combine the measurements from the multi-face detector and a feature-correlation tracker. The estimated position of the face is used to control a pan-tilt servo mechanism in real time, moving the camera so that the tracked face remains centered in the image.

Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning

  • Maity, Sayan;Abdel-Mottaleb, Mohamed;Asfour, Shihab S.
    • Journal of Information Processing Systems
    • /
    • v.16 no.1
    • /
    • pp.6-29
    • /
    • 2020
  • Biometric identification using multiple modalities has attracted the attention of many researchers, as it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Using the different modalities present in the clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising auto-encoders to automatically extract robust and non-redundant features, which are then used to train modality-specific sparse classifiers for multimodal recognition. Moreover, the proposed technique proves robust when some of these modalities are missing during testing. The system has three main components: detection, which consists of modality-specific detectors that automatically locate images of the different modalities in the facial video clips; feature selection, which uses a supervised denoising sparse auto-encoder network to capture discriminative representations robust to illumination and pose variations; and classification, which consists of a set of modality-specific sparse-representation classifiers for unimodal recognition, followed by score-level fusion of the recognition results of the available modalities. Experiments on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) yielded Rank-1 recognition rates of 99.17% and 97.14%, respectively. This multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips, even when modalities are missing.
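
Much of the robustness to missing modalities comes from the score-level fusion stage: each available modality's classifier votes, and absent modalities simply drop out of the sum. A hypothetical NumPy sketch of such a fusion rule follows; the modality names and min-max score normalization are illustrative choices, not details confirmed by the paper.

```python
import numpy as np

MODALITIES = ["left_ear", "left_profile", "frontal", "right_profile", "right_ear"]

def fuse_scores(modality_scores):
    """Score-level fusion over available modalities.

    modality_scores: dict mapping modality name -> score vector over the
    gallery identities (higher = better match). Missing modalities are
    simply absent from the dict and drop out of the fusion. Assumes at
    least one modality is available."""
    total = None
    for name in MODALITIES:
        if name not in modality_scores:
            continue  # modality missing in this video clip
        s = np.asarray(modality_scores[name], dtype=np.float64)
        # Min-max normalize so no single modality dominates the sum.
        s = (s - s.min()) / (s.max() - s.min() + 1e-12)
        total = s if total is None else total + s
    return int(np.argmax(total))  # Rank-1 identity
```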

Adaptive V1-MT model for motion perception

  • Li, Shuai;Fan, Xiaoguang;Xu, Yuelei;Huang, Jinke
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.1
    • /
    • pp.371-384
    • /
    • 2019
  • Motion perception has advanced tremendously in neuroscience and computer vision. The baseline motion-perception model is mediated by the dorsal visual pathway, involving the primary visual cortex (V1) and the middle temporal visual area (MT, or V5). However, little work has been done on extending neural models to improve the efficacy and robustness of motion perception on real sequences. To overcome shortcomings in situations such as varying illumination and large displacement, an adaptive V1-MT motion-perception (Ad-V1MTMP) algorithm enriched to deal with real sequences is proposed and analyzed. First, a total-variation semi-norm model based on Gabor functions (TV-Gabor) performs structure-texture decomposition to manage illumination and color changes. We then study the impact of local image context, which is processed in extra-striate visual area V2, on the spatial motion integration performed by MT neurons, and propose a V1-V2 method for extracting image-contrast information at a given location; feedback inputs from V2 are also taken into account during the pooling stage. Finally, to apply the algorithm to natural scenes, a multi-scale approach handles the frequency range, and adaptive pyramidal decomposition with decomposed spatio-temporal filters reduces the computational cost. Theoretical analysis and experimental results suggest that the new Ad-V1MTMP algorithm, which mimics the human primary motion pathway, delivers versatile, effective, and robust performance.
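
The V1 stage of models in this family is conventionally a bank of orientation-tuned Gabor filters. The sketch below computes V1-like responses with OpenCV's Gabor kernels; all parameters are illustrative, and the paper's actual TV-Gabor decomposition, MT pooling, and V2 feedback are considerably more involved.

```python
import cv2
import numpy as np

def v1_responses(gray, n_orientations=8, ksize=21, sigma=4.0,
                 lambd=10.0, gamma=0.5):
    """Bank of orientation-tuned Gabor filters as a stand-in for V1
    simple cells: one response map per preferred orientation."""
    responses = []
    for i in range(n_orientations):
        theta = i * np.pi / n_orientations  # evenly spaced orientations
        kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta,
                                    lambd, gamma, psi=0.0)
        responses.append(cv2.filter2D(gray.astype(np.float32), -1, kernel))
    return np.stack(responses)  # shape: (n_orientations, H, W)
```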

A Robust Hand Recognition Method to Variations in Lighting (조명 변화에 안정적인 손 형태 인지 기술)

  • Choi, Yoo-Joo;Lee, Je-Sung;You, Hyo-Sun;Lee, Jung-Won;Cho, We-Duke
    • The KIPS Transactions:PartB
    • /
    • v.15B no.1
    • /
    • pp.25-36
    • /
    • 2008
  • In this paper, we present a hand-recognition approach that is robust to sudden illumination changes. The proposed approach constructs a background model with respect to hue and hue gradient in HSI color space and extracts the foreground hand region from an input image using background subtraction. Eighteen features are defined for a hand pose, and a multi-class SVM (Support Vector Machine) is applied to learn and classify hand poses from these features. By incorporating the hue gradient into the background subtraction, the approach robustly extracts the hand contour under varying illumination. A hand pose is described by two eigenvalues normalized by the size of the OBB (object-oriented bounding box) and by sixteen feature values that count the hand-contour points falling in each subrange of the OBB. We compared RGB-based background subtraction, hue-based background subtraction, and the proposed approach under sudden illumination changes and confirmed the robustness of the proposed approach. In the experiments, we built a hand-pose training model from 2,700 sample hand images of six subjects representing the nine numerals from one to nine. Using this model, our implementation achieves a recognition rate of 92.6% on 1,620 hand images captured under various lighting conditions.
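
A rough reconstruction of the hue + hue-gradient background subtraction follows (using OpenCV's HSV as a stand-in for HSI; the thresholds and the rule that both cues must deviate are my own illustrative choices). One detail worth noting is that hue is circular, so hue differences must wrap around.

```python
import cv2
import numpy as np

def hue_and_gradient(bgr):
    """Hue channel (OpenCV range 0..179) and its gradient magnitude."""
    hue = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)[:, :, 0].astype(np.float32)
    gx = cv2.Sobel(hue, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(hue, cv2.CV_32F, 0, 1)
    return hue, cv2.magnitude(gx, gy)

def hue_diff(h1, h2):
    """Circular hue difference: hues 0 and 179 are neighbors."""
    d = np.abs(h1 - h2)
    return np.minimum(d, 180.0 - d)

def foreground_mask(frame, bg_hue, bg_grad, t_hue=15.0, t_grad=30.0):
    """Foreground where both hue and hue gradient deviate from the
    background model: hue alone is illumination-insensitive but noisy,
    and the gradient term suppresses false hits."""
    hue, grad = hue_and_gradient(frame)
    mask = (hue_diff(hue, bg_hue) > t_hue) & (np.abs(grad - bg_grad) > t_grad)
    return (mask * 255).astype(np.uint8)
```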

Multi-Scale, Multi-Object and Real-Time Face Detection and Head Pose Estimation Using Deep Neural Networks (다중크기와 다중객체의 실시간 얼굴 검출과 머리 자세 추정을 위한 심층 신경망)

  • Ahn, Byungtae;Choi, Dong-Geol;Kweon, In So
    • The Journal of Korea Robotics Society
    • /
    • v.12 no.3
    • /
    • pp.313-321
    • /
    • 2017
  • Among the most frequently performed tasks in human-robot interaction (HRI), intelligent vehicles, and security systems are face-related applications such as face recognition, facial-expression recognition, driver-state monitoring, and gaze estimation. In these applications, accurate head-pose estimation is an important issue; conventional methods, however, lack the accuracy, robustness, or processing speed needed in practical use. In this paper, we propose a novel method for estimating head pose with a monocular camera. The proposed algorithm is based on a deep neural network for multi-task learning using small grayscale images. This network jointly detects multi-view faces and estimates head pose under hard environmental conditions such as illumination change and large pose change. The proposed framework quantitatively and qualitatively outperforms the state-of-the-art method, with an average head-pose mean error of less than $4.5^{\circ}$ in real time.
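
A minimal PyTorch sketch of the multi-task idea, with an illustrative architecture rather than the paper's network: a small shared convolutional trunk on a grayscale crop, one head classifying face/non-face, and another regressing three head-pose angles, trained with a weighted joint loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskFacePose(nn.Module):
    """Shared conv trunk + two task heads: face detection (binary
    classification) and head-pose regression (yaw, pitch, roll)."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 128), nn.ReLU(),  # assumes 32x32 input crops
        )
        self.face_head = nn.Linear(128, 2)   # face / non-face logits
        self.pose_head = nn.Linear(128, 3)   # yaw, pitch, roll

    def forward(self, x):
        z = self.trunk(x)
        return self.face_head(z), self.pose_head(z)

def joint_loss(face_logits, pose_pred, face_labels, pose_targets, w=1.0):
    """Multi-task loss: cross-entropy for detection plus, for face
    samples only, an L2 pose-regression term."""
    det = F.cross_entropy(face_logits, face_labels)
    is_face = face_labels == 1
    pose = (F.mse_loss(pose_pred[is_face], pose_targets[is_face])
            if is_face.any() else pose_pred.sum() * 0.0)
    return det + w * pose
```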