
Integral Regression Network for Facial Landmark Detection

  • Kim, Do Yeop (Department of Electronics and Communications Engineering, Kwangwoon University) ;
  • Chang, Ju Yong (Department of Electronics and Communications Engineering, Kwangwoon University)
  • Received : 2019.05.07
  • Accepted : 2019.07.05
  • Published : 2019.07.30

Abstract

With the development of deep learning, the performance of facial landmark detection methods has improved greatly. Heat map regression, a representative facial landmark detection method, is widely used because it is efficient and robust. However, the landmark coordinates cannot be obtained directly from a single network, and accuracy is lost when the coordinates are decoded from the heat map. To address these problems, we propose combining integral regression with the existing heat map regression method. Through experiments on various datasets, we show that the proposed integral regression network significantly improves facial landmark detection performance.
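The core idea of integral regression is to replace the non-differentiable argmax used to decode a heat map with an expectation: the heat map is normalized into a probability map, and the landmark coordinate is taken as the probability-weighted average of pixel positions, yielding a sub-pixel, differentiable output. The sketch below is a minimal NumPy illustration of this operation (a "soft-argmax"), not the authors' network or implementation; the heat map values are made up for the example.

```python
import numpy as np

def soft_argmax_2d(heatmap):
    """Integral regression over one heat map: softmax-normalize,
    then take the expected (x, y) pixel coordinate.

    Unlike a hard argmax, the result is continuous (sub-pixel)
    and differentiable with respect to the heat map values.
    """
    h, w = heatmap.shape
    # Softmax over all pixels; subtract the max for numerical stability.
    p = np.exp(heatmap - heatmap.max())
    p /= p.sum()
    # Coordinate grids: ys[i, j] = i (row), xs[i, j] = j (column).
    ys, xs = np.mgrid[0:h, 0:w]
    # Expected coordinate under the normalized heat map.
    x = float((p * xs).sum())
    y = float((p * ys).sum())
    return x, y

# Toy 64x64 heat map with a sharp peak at column 12, row 5.
hm = np.full((64, 64), -10.0)
hm[5, 12] = 10.0
x, y = soft_argmax_2d(hm)  # close to (12.0, 5.0)
```

Because the whole decoding step is a weighted sum, the coordinate loss (e.g., L1 distance to the ground-truth landmark) can be backpropagated through it, which is what allows a single network to be trained end-to-end on coordinates while still predicting heat maps internally.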


Keywords

Fig. 1. Overview of the proposed facial landmark detection network

Fig. 2. NME curves of the proposed method (ResNet50-Int) and baseline models

Fig. 3. Examples of facial landmarks detected by the proposed method: (a) success cases, (b) failure cases

Table 1. Details of the proposed network

Table 2. AUC (%) of each model

Table 3. Processing time per frame (ms) of each model
