Multi-Scale, Multi-Object and Real-Time Face Detection and Head Pose Estimation Using Deep Neural Networks

Deep Neural Networks for Multi-Scale, Multi-Object, Real-Time Face Detection and Head Pose Estimation

  • Received : 2017.03.10
  • Accepted : 2017.07.28
  • Published : 2017.08.31

Abstract

Face-related applications such as face recognition, facial expression recognition, driver state monitoring, and gaze estimation are among the most frequently performed tasks in human-robot interaction (HRI), intelligent vehicles, and security systems. Accurate head pose estimation is an important component of these applications. However, conventional methods lack the accuracy, robustness, or processing speed required for practical use. In this paper, we propose a novel method for estimating head pose with a monocular camera. The proposed algorithm is based on a deep neural network trained with multi-task learning on small grayscale images. The network jointly detects multi-view faces and estimates head pose under difficult conditions such as illumination changes and large pose changes. The proposed framework outperforms the state-of-the-art method both quantitatively and qualitatively, achieving an average head pose error of less than $4.5^{\circ}$ while running in real time.
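The abstract does not spell out the network or its training objective, so the following is only a minimal sketch of how a joint face-detection and head-pose objective could be written. All names (`multitask_loss`, `mean_abs_pose_error`, `pose_weight`) are hypothetical. The choice of binary cross-entropy for detection, squared error for pose, and angles expressed as (yaw, pitch, roll) in degrees are assumptions made for illustration, not the authors' formulation.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def multitask_loss(face_logit, is_face, pose_pred, pose_true, pose_weight=1.0):
    """Joint loss for one image patch (illustrative assumption only).

    Detection term: binary cross-entropy on face / non-face.
    Pose term: mean squared error over the pose angles, counted only
    for face patches, since non-face patches have no pose label.
    """
    eps = 1e-12  # guards against log(0)
    p = sigmoid(face_logit)
    detection = -math.log(p + eps) if is_face else -math.log(1.0 - p + eps)
    if not is_face:
        return detection
    pose = sum((a - b) ** 2 for a, b in zip(pose_pred, pose_true)) / len(pose_true)
    return detection + pose_weight * pose


def mean_abs_pose_error(preds, truths):
    """Per-angle mean absolute error and its average over all angles."""
    n = len(preds)
    dims = len(preds[0])
    per_angle = [
        sum(abs(p[i] - t[i]) for p, t in zip(preds, truths)) / n
        for i in range(dims)
    ]
    return per_angle, sum(per_angle) / dims


if __name__ == "__main__":
    # Hypothetical predictions and labels as (yaw, pitch, roll) in degrees.
    preds = [(10.0, 5.0, 0.0), (-10.0, 5.0, 2.0)]
    truths = [(8.0, 5.0, 1.0), (-6.0, 2.0, 2.0)]
    print(mean_abs_pose_error(preds, truths))
    print(multitask_loss(0.0, True, preds[0], truths[0], pose_weight=0.5))
```

Averaging the per-angle mean absolute errors in this way is one plausible reading of the "average head pose error" quoted in the abstract. The weight between the two terms is a free parameter in this sketch.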
