A study on age distortion reduction in facial expression image generation using StyleGAN Encoder

  • Hee-Yeol Lee (Dept. of Electronic Engineering, Hanbat National University);
  • Seung-Ho Lee (Dept. of Electronic Engineering, Hanbat National University)
  • Received : 2023.11.24
  • Accepted : 2023.12.15
  • Published : 2023.12.31

Abstract

In this paper, we propose a method to reduce age distortion in facial expression image generation using StyleGAN Encoder. In the facial expression generation process, a face image is first reconstructed with StyleGAN Encoder, and the expression is changed by applying a boundary, learned with an SVM, to the latent vector. However, when the boundary for a smiling expression is learned, age distortion occurs as the expression changes. The smile boundary obtained from SVM training includes the wrinkles caused by the expression change as a learned feature, so age characteristics are judged to have been learned along with it. To solve this problem, the proposed method computes the correlation coefficient between the smile boundary and the age boundary, and subtracts the age boundary from the smile boundary in proportion to that coefficient. To verify the effectiveness of the proposed method, experiments were conducted on FFHQ, a publicly available standard face dataset, and FID scores were measured, with the following results. For smile images, the FID score between the ground truth and the smile images generated by the proposed method improved by about 0.46 over the existing method, and the FID score between the images generated by StyleGAN Encoder and the smile images generated by the proposed method improved by about 1.031. For non-smile images, the FID score between the ground truth and the non-smile images generated by the proposed method improved by about 2.25 over the existing method, and the FID score between the images generated by StyleGAN Encoder and the non-smile images generated by the proposed method improved by about 1.908.
Meanwhile, the age of each generated expression image was estimated, and the MSE between that estimate and the estimated age of the image generated by StyleGAN Encoder was measured. Compared with the existing method, the proposed method improved this estimated-age MSE by about 1.5 for smile images and about 1.63 for non-smile images, demonstrating its effectiveness.
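The core idea described above can be sketched in a few lines of NumPy: treat each SVM boundary as a direction vector in the latent space, measure the correlation between the smile and age directions as their cosine similarity, and subtract the age component from the smile boundary in proportion to it. This is only an illustrative sketch under assumed names (`smile_boundary`, `age_boundary`, 512-dimensional latent vectors); the paper's exact formulation may differ.

```python
import numpy as np

def adjust_smile_boundary(smile_boundary, age_boundary):
    """Remove the age component from the smile boundary, scaled by
    their correlation (cosine similarity) in the latent space."""
    s = smile_boundary / np.linalg.norm(smile_boundary)
    a = age_boundary / np.linalg.norm(age_boundary)
    corr = float(np.dot(s, a))       # correlation between the two directions
    adjusted = s - corr * a          # subtract age direction proportionally
    return adjusted / np.linalg.norm(adjusted), corr

# Toy example with two correlated 512-d boundary vectors.
rng = np.random.default_rng(0)
smile = rng.normal(size=512)
age = 0.3 * smile + rng.normal(size=512)
adjusted, corr = adjust_smile_boundary(smile, age)
# Expression editing would then move the latent code along `adjusted`,
# e.g. w_edited = w + alpha * adjusted, without shifting apparent age.
```

By construction the adjusted boundary is orthogonal to the age direction, so moving a latent code along it changes the smile attribute while leaving the learned age attribute unchanged to first order.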

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2022R1F1A1066371). It was also supported by the "Regional Innovation Strategy (RIS)" program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (MOE) (2021RIS-004), and by the MSIT (Ministry of Science and ICT), Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) program (IITP-2022-RS-2022-00156212) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation).
