• Title/Summary/Keyword: Training Face Image

Search Results: 125

Hardware Design of Super Resolution on Human Faces for Improving Face Recognition Performance of Intelligent Video Surveillance Systems (지능형 영상 보안 시스템의 얼굴 인식 성능 향상을 위한 얼굴 영역 초해상도 하드웨어 설계)

  • Kim, Cho-Rong; Jeong, Yong-Jin
    • Journal of the Institute of Electronics Engineers of Korea SD / v.48 no.9 / pp.22-30 / 2011
  • Recently, the rising demand for intelligent video surveillance systems has led to high-performance face recognition systems. A solution for the low-resolution images acquired by long-distance cameras is required to overcome the distance limits of existing face recognition systems. For that reason, this paper proposes a hardware design of an image resolution enhancement algorithm for real-time intelligent video surveillance systems. The algorithm synthesizes a high-resolution face image from an input low-resolution image with the help of a large collection of other high-resolution face images, called a training set. When we measured the performance of the algorithm on a 32-bit RISC microprocessor, the entire operation took about 25 seconds, which is unsuitable for real-time target applications. Based on this result, we implemented the hardware module and verified it using a Xilinx Virtex-4 FPGA and an ARM9-based embedded processor (S3C2440A). The designed hardware completes the whole operation within 33 ms, so it can handle 30 frames per second. We expect the proposed hardware to be a solution not only for real-time processing in embedded environments but also for easy integration with existing face recognition systems.
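
A minimal sketch of the training-set-based face hallucination idea this paper accelerates in hardware: reconstruct a high-resolution face as a weighted combination of the high-resolution counterparts of the input's nearest low-resolution neighbors. The shapes and the inverse-distance weighting rule are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def hallucinate(lr_input, lr_train, hr_train, k=5):
    """Synthesize a high-resolution face from a low-resolution input.

    lr_input : (d,)   flattened low-resolution probe image
    lr_train : (N, d) flattened low-resolution training images
    hr_train : (N, D) corresponding high-resolution training images
    """
    # Distances from the probe to every low-resolution training face.
    dists = np.linalg.norm(lr_train - lr_input, axis=1)
    nearest = np.argsort(dists)[:k]

    # Inverse-distance weights over the k nearest neighbors.
    w = 1.0 / (dists[nearest] + 1e-8)
    w /= w.sum()

    # Apply the same weighted combination to the high-resolution counterparts.
    return w @ hr_train[nearest]

# Toy usage with random data standing in for a real face training set.
rng = np.random.default_rng(0)
lr_train = rng.random((100, 16 * 16))
hr_train = rng.random((100, 64 * 64))
print(hallucinate(lr_train[0], lr_train, hr_train).shape)  # (4096,)
```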

Enhanced ACGAN based on Progressive Step Training and Weight Transfer

  • Jinmo Byeon; Inshil Doh; Dana Yang
    • Journal of the Korea Society of Computer and Information / v.29 no.3 / pp.11-20 / 2024
  • Among the generative models in Artificial Intelligence (AI), the Generative Adversarial Network (GAN) in particular has been successful in applications such as image processing, density estimation, and style transfer. Although GAN models including Conditional GAN (CGAN), CycleGAN, and BigGAN have been extended and improved, researchers still face challenges in real-world applications in domains such as disaster simulation, healthcare, and urban planning because of data scarcity and unstable learning that causes image distortion. This paper proposes a new progressive learning methodology called Progressive Step Training (PST), based on the Auxiliary Classifier GAN (ACGAN), which discriminates class labels, leveraging the progressive learning approach of the Progressive Growing of GAN (PGGAN). Compared to conventional methods, the PST model achieves 70.82% faster stabilization, a 51.3% lower standard deviation, stable convergence of loss values in the later high-resolution stages, and a 94.6% faster loss reduction.
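
A minimal PyTorch sketch of the ACGAN building block the paper extends: a discriminator with an auxiliary classifier head that predicts the class label alongside the real/fake score. Layer sizes are illustrative assumptions, and the progressive-step wiring of PST is omitted.

```python
import torch
import torch.nn as nn

class ACDiscriminator(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.adv = nn.Linear(128, 1)          # real/fake score
        self.aux = nn.Linear(128, n_classes)  # auxiliary class logits

    def forward(self, x):
        h = self.features(x)
        return self.adv(h), self.aux(h)

d = ACDiscriminator()
score, logits = d(torch.randn(4, 3, 32, 32))
print(score.shape, logits.shape)  # torch.Size([4, 1]) torch.Size([4, 10])
```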

Face Recognition Using Local Statistics of Gradients and Correlations (그래디언트와 상관관계의 국부통계를 이용한 얼굴 인식)

  • Ju, Yingai; So, Hyun-Joo; Kim, Nam-Chul
    • Journal of the Institute of Electronics Engineers of Korea SP / v.48 no.3 / pp.19-29 / 2011
  • Many face recognition methods have been proposed to date; most of them use a one-dimensional feature vector obtained by vectorizing the input image without a feature extraction process, or use the input image itself as a feature matrix. Face recognition methods that use raw images are known to perform poorly on databases with severe illumination changes. In this paper, we propose a face recognition method using local statistics of gradients and correlations, which are robust to illumination changes. BDIP (block difference of inverse probabilities) is chosen as a local statistic of gradients, and two types of BVLC (block variation of local correlation coefficients) are chosen as local statistics of correlations. When an input image enters the system, the BDIP, BVLC1, and BVLC2 feature images are extracted and fused, a feature matrix is obtained by the $(2D)^2$ PCA transformation, and the result is classified against the training feature matrices with a nearest classifier. Experimental results on four face databases (FERET, Weizmann, Yale B, and Yale) show that the proposed method is more reliable than six other methods under changes in lighting and facial expression.
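
A minimal NumPy sketch of the BDIP local statistic named in the abstract. The block size and normalization follow the commonly cited BDIP definition (block size minus the block's intensity sum normalized by the block maximum); treat the details as an assumption rather than the authors' exact implementation.

```python
import numpy as np

def bdip(image, block=2, eps=1e-8):
    """Compute one BDIP value per non-overlapping block x block region."""
    h, w = image.shape
    h, w = h - h % block, w - w % block
    blocks = image[:h, :w].reshape(h // block, block, w // block, block)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(-1, block * block)
    # Large values indicate strong local intensity variation (edges).
    vals = block * block - blocks.sum(axis=1) / (blocks.max(axis=1) + eps)
    return vals.reshape(h // block, w // block)

img = np.random.rand(8, 8)
print(bdip(img).shape)  # (4, 4)
```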

The Education Methodology for the Production of Stereoscopic 3D Image Contents -Focusing on University Education (3D 입체영상 콘텐츠 제작 교육 방법론 -대학교육을 중심으로)

  • Park, SungDae; Lee, Junsang
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.11 / pp.2045-2053 / 2016
  • Many research institutes have studied 3D stereoscopic images since the release of the 3D stereoscopic film 'Avatar' in 2009. Universities have researched 3D stereoscopic imaging in various ways, and university curricula have adopted 3D stereoscopic image production courses. However, universities face many difficulties in purchasing the expensive equipment, including cameras and rigs, needed for training in 3D stereoscopic image content production. This paper addresses a university curriculum for producing 3D stereoscopic image content using software. A practical training course was carried out based on the theoretical content that must be covered in a 3D stereoscopic image content production curriculum. As a result, students could understand the principles of 3D stereoscopic image production and produce various 3D stereoscopic images using various software applications. In this paper, proper instructional methods for 3D stereoscopic image content production at universities are discussed through this production course.

Synthesis of Expressive Talking Heads from Speech with Recurrent Neural Network (RNN을 이용한 Expressive Talking Head from Speech의 합성)

  • Sakurai, Ryuhei; Shimba, Taiki; Yamazoe, Hirotake; Lee, Joo-Ho
    • The Journal of Korea Robotics Society / v.13 no.1 / pp.16-25 / 2018
  • A talking head (TH) is a facial utterance animation generated from text and voice input. In this paper, we propose a method for generating a TH with facial expression and intonation from speech input only. The problem of generating a TH from speech can be regarded as a regression problem from the acoustic feature sequence to the facial code sequence, a low-dimensional vector representation that can efficiently encode and decode a face image. This regression was modeled with a bidirectional RNN and trained using the SAVEE database of frontal utterance face animations as training data. The proposed method is able to generate a TH with facial expression and intonation by using acoustic features such as MFCCs, the dynamic elements of MFCCs, energy, and F0. According to the experiments, a configuration with BLSTM layers as the first and second layers of the bidirectional RNN predicted the face code best. For the evaluation, a questionnaire survey was conducted with 62 persons who watched TH animations generated by the proposed method and a previous method. As a result, 77% of the respondents answered that the TH generated by the proposed method matched the speech well.
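
A minimal PyTorch sketch of the regression setup the abstract describes: a two-layer bidirectional LSTM mapping an acoustic feature sequence (MFCCs, their dynamic elements, energy, F0) to a facial-code sequence. The feature and code dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Speech2FaceCode(nn.Module):
    def __init__(self, acoustic_dim=28, face_code_dim=30, hidden=128):
        super().__init__()
        self.blstm = nn.LSTM(acoustic_dim, hidden, num_layers=2,
                             bidirectional=True, batch_first=True)
        # Bidirectional output concatenates forward and backward states.
        self.proj = nn.Linear(2 * hidden, face_code_dim)

    def forward(self, acoustic_seq):  # (batch, frames, acoustic_dim)
        h, _ = self.blstm(acoustic_seq)
        return self.proj(h)           # (batch, frames, face_code_dim)

model = Speech2FaceCode()
codes = model(torch.randn(2, 100, 28))
print(codes.shape)  # torch.Size([2, 100, 30])
```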

Anomaly-based Alzheimer's disease detection using entropy-based probability Positron Emission Tomography images

  • Husnu Baris Baydargil; Jangsik Park; Ibrahim Furkan Ince
    • ETRI Journal / v.46 no.3 / pp.513-525 / 2024
  • Deep neural networks trained on labeled medical data face major challenges owing to the economic costs of data acquisition through expensive medical imaging devices, the expert labor required for data annotation, and the large datasets needed to achieve optimal model performance. The heterogeneity of diseases, such as Alzheimer's disease, further complicates deep learning because the test cases may differ substantially from the training data, possibly increasing the rate of false positives. We propose a reconstruction-based self-supervised anomaly detection model to overcome these challenges. It has a dual-subnetwork encoder that enhances feature encoding, augmented by skip connections to the decoder that improve gradient flow. The novel encoder captures local and global features to improve image reconstruction. In addition, we introduce an entropy-based image conversion method. Extensive evaluations show that the proposed model outperforms benchmark models in anomaly detection and in classification using an encoder. Both supervised and unsupervised models show improved performance when trained with data preprocessed using the proposed image conversion method.
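
A minimal sketch of the reconstruction-based anomaly scoring idea behind the paper: an autoencoder trained on normal scans should reconstruct them well, so a large reconstruction error flags an anomalous case. The tiny autoencoder below is an illustrative stand-in, not the paper's dual-subnetwork encoder with skip connections.

```python
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2, padding=1),
                                 nn.ReLU(),
                                 nn.Conv2d(8, 16, 3, stride=2, padding=1),
                                 nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(16, 8, 2, stride=2),
                                 nn.ReLU(),
                                 nn.ConvTranspose2d(8, 1, 2, stride=2))

    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_score(model, scan):
    # Per-image mean squared reconstruction error.
    with torch.no_grad():
        recon = model(scan)
    return ((scan - recon) ** 2).mean(dim=(1, 2, 3))

model = TinyAE()
scores = anomaly_score(model, torch.randn(4, 1, 64, 64))
print(scores.shape)  # torch.Size([4])
```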

Face Identification Using a Near-Infrared Camera in a Nonrestrictive In-Vehicle Environment (적외선 카메라를 이용한 비제약적 환경에서의 얼굴 인증)

  • Ki, Min Song; Choi, Yeong Woo
    • KIPS Transactions on Software and Data Engineering / v.10 no.3 / pp.99-108 / 2021
  • The driver's face inside a vehicle is subject to unconstrained conditions such as changes in lighting, partial occlusion, and various changes in the driver's state. In this paper, we propose a face identification system for this unconstrained in-vehicle environment. The proposed method uses a near-infrared (NIR) camera to minimize the changes in facial images that occur with illumination changes inside and outside the vehicle. To handle faces exposed to extreme light, normal face images are converted into simulated overexposed images, using their mean and variance, for training. Thus, facial classifiers are generated for both normal and extreme illumination conditions simultaneously. Our method identifies a face by detecting facial landmarks and aggregating the confidence scores of the individual landmarks for the final decision. The performance improvement is highest in the class where the driver wears glasses or sunglasses, owing to the robustness to partial occlusion gained by recognizing each landmark: the driver can be recognized from the scores of the remaining visible landmarks. We also propose a novel robust rejection scheme and a new evaluation method that considers the relations between registered and unregistered drivers. Experimental results on our dataset and on the PolyU and ORL datasets demonstrate the effectiveness of the proposed method.
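
A minimal sketch of the overexposure-simulation idea in the abstract: push a normal face image toward an overexposed one using its mean and standard deviation so classifiers can be trained under both conditions. The exact gain/offset rule below is an illustrative assumption, not the authors' formula.

```python
import numpy as np

def simulate_overexposure(face, gain=1.5, shift=1.0):
    """face: float array in [0, 1]. Returns a brightened copy."""
    mu, sigma = face.mean(), face.std()
    # Re-center on the mean, stretch contrast, then push toward white
    # by a multiple of the original standard deviation (assumed rule).
    over = mu + gain * (face - mu) + shift * sigma
    return np.clip(over, 0.0, 1.0)

face = np.random.rand(64, 64)
print(simulate_overexposure(face).mean() > face.mean())  # True (brighter)
```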

Text-to-Face Generation Using Multi-Scale Gradients Conditional Generative Adversarial Networks (다중 스케일 그라디언트 조건부 적대적 생성 신경망을 활용한 문장 기반 영상 생성 기법)

  • Bui, Nguyen P.; Le, Duc-Tai; Choo, Hyunseung
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.764-767 / 2021
  • While Generative Adversarial Networks (GANs) have seen huge success in image synthesis tasks, synthesizing high-quality images from text descriptions remains a challenging problem in computer vision. This paper proposes a method named Text-to-Face Generation Using Multi-Scale Gradients for Conditional Generative Adversarial Networks (T2F-MSGGANs) that combines GANs with a natural language processing model to create human faces that have the features described in the input text. The proposed method addresses two problems of GANs, mode collapse and training instability, by investigating how gradients at multiple scales can be used to generate high-resolution images. We show that T2F-MSGGANs converge stably and generate good-quality images.
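
A minimal PyTorch sketch of the multi-scale-gradients idea underlying T2F-MSGGANs: the generator exposes intermediate RGB outputs at several resolutions so the discriminator can pass gradients back at every scale. Layer sizes are illustrative assumptions, and the text-conditioning path is omitted.

```python
import torch
import torch.nn as nn

class MSGGenerator(nn.Module):
    def __init__(self, z_dim=128):
        super().__init__()
        self.init = nn.Sequential(nn.Linear(z_dim, 64 * 4 * 4), nn.ReLU())
        self.block8 = nn.Sequential(nn.Upsample(scale_factor=2),
                                    nn.Conv2d(64, 32, 3, padding=1), nn.ReLU())
        self.block16 = nn.Sequential(nn.Upsample(scale_factor=2),
                                     nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        # One to-RGB head per resolution; all of them feed the discriminator.
        self.rgb4 = nn.Conv2d(64, 3, 1)
        self.rgb8 = nn.Conv2d(32, 3, 1)
        self.rgb16 = nn.Conv2d(16, 3, 1)

    def forward(self, z):
        h4 = self.init(z).view(-1, 64, 4, 4)
        h8 = self.block8(h4)
        h16 = self.block16(h8)
        return self.rgb4(h4), self.rgb8(h8), self.rgb16(h16)

g = MSGGenerator()
outs = g(torch.randn(2, 128))
print([o.shape[-1] for o in outs])  # [4, 8, 16]
```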

Extreme Learning Machine Ensemble Using Bagging for Facial Expression Recognition

  • Ghimire, Deepak; Lee, Joonwhoan
    • Journal of Information Processing Systems / v.10 no.3 / pp.443-458 / 2014
  • An extreme learning machine (ELM) is a recently proposed learning algorithm for single-hidden-layer feedforward neural networks. In this paper, we study an ensemble of ELMs built with a bagging algorithm for facial expression recognition (FER). Facial expression analysis is widely used in interpreting emotional behavior, in cognitive science, and in social interaction. This paper presents a method for FER based on histogram of oriented gradients (HOG) features using an ELM ensemble. First, HOG features are extracted from the face image by dividing it into a number of small cells. A bagging algorithm is then used to construct many different bags of training data, and a separate ELM is trained on each bag. To recognize the expression of an input face image, its HOG features are fed to each trained ELM and the results are combined by a majority voting scheme. The bagged ELM ensemble significantly improves the generalization capability of the network. Two available facial expression datasets (JAFFE and CK+) were used to evaluate the performance of the proposed classification system. Even though the performance of an individual ELM was lower, the ELM ensemble using bagging improved recognition performance significantly.
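
A minimal NumPy sketch of an ELM (random fixed input weights; output weights solved in closed form via the pseudo-inverse) and a bagged ensemble with majority voting, the classifier setup the paper applies to HOG features. The hidden-layer size and the random toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class ELM:
    def __init__(self, n_in, n_hidden, n_classes):
        # Random, fixed input weights; only the output weights are learned.
        self.W = rng.normal(size=(n_in, n_hidden))
        self.b = rng.normal(size=n_hidden)
        self.n_classes = n_classes

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        T = np.eye(self.n_classes)[y]  # one-hot targets
        # Output weights via the Moore-Penrose pseudo-inverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta).argmax(axis=1)

def bagged_predict(elms, X):
    votes = np.stack([e.predict(X) for e in elms])  # (n_elms, n_samples)
    # Majority vote across ensemble members for each sample.
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

X, y = rng.normal(size=(200, 50)), rng.integers(0, 7, 200)
elms = []
for _ in range(5):  # one bootstrap bag per ELM
    idx = rng.integers(0, len(X), len(X))
    elms.append(ELM(50, 100, 7).fit(X[idx], y[idx]))
print(bagged_predict(elms, X[:3]))
```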

Patch based Semi-supervised Linear Regression for Face Recognition

  • Ding, Yuhua; Liu, Fan; Rui, Ting; Tang, Zhenmin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.8 / pp.3962-3980 / 2019
  • To deal with single-sample face recognition, this paper presents a patch-based semi-supervised linear regression (PSLR) algorithm, which draws facial variation information from unlabeled samples. Each facial image is divided into overlapping patches, and a regression model with a mapping matrix is constructed on each patch. These matrices are then adjusted by mapping unlabeled patches to $[1,1,\cdots,1]^T$. The solutions of all the mapping matrices are integrated into an overall objective function, which uses $\ell_{2,1}$-norm minimization constraints to improve the discrimination ability of the mapping matrices and reduce the impact of noise. After the mapping matrices are computed, a majority-voting strategy is adopted to classify the probe samples. To further learn the discriminative information between probe samples and obtain more robust mapping matrices, we also propose a multistage PSLR (MPSLR) algorithm, which iteratively updates the training dataset by adding reliably labeled probe samples to it. The effectiveness of our approaches is evaluated on three public facial databases. Experimental results show that our approaches are robust to illumination, expression, and occlusion.
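
A minimal NumPy sketch of the supervised core that PSLR builds on: each patch of a probe image is regressed against each class's training patches, each patch votes for the class with the smallest residual, and a majority vote decides the identity. The semi-supervised mapping to $[1,1,\cdots,1]^T$ and the $\ell_{2,1}$-norm refinement are omitted; all shapes are illustrative assumptions.

```python
import numpy as np

def classify_by_patches(probe_patches, class_patches):
    """probe_patches: (P, d); class_patches: dict class -> (P, n_c, d)."""
    votes = []
    for p, x in enumerate(probe_patches):
        residuals = {}
        for c, patches in class_patches.items():
            X = patches[p].T  # (d, n_c) dictionary for this class/patch
            beta, *_ = np.linalg.lstsq(X, x, rcond=None)
            residuals[c] = np.linalg.norm(x - X @ beta)
        votes.append(min(residuals, key=residuals.get))
    # Majority vote over all patches decides the identity.
    return max(set(votes), key=votes.count)

rng = np.random.default_rng(1)
class_patches = {c: rng.normal(size=(6, 4, 32)) for c in range(3)}
probe = class_patches[1][:, 0] + 0.01 * rng.normal(size=(6, 32))
print(classify_by_patches(probe, class_patches))  # likely 1
```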