http://dx.doi.org/10.7746/jkros.2017.12.3.313

Multi-Scale, Multi-Object and Real-Time Face Detection and Head Pose Estimation Using Deep Neural Networks  

Ahn, Byungtae (Robotics Program, KAIST)
Choi, Dong-Geol (Department of Electrical Engineering, KAIST)
Kweon, In So (Department of Electrical Engineering, KAIST)
Publication Information
The Journal of Korea Robotics Society / v.12, no.3, 2017, pp. 313-321
Abstract
Face-related applications such as face recognition, facial expression recognition, driver state monitoring, and gaze estimation are among the most frequently performed tasks in human-robot interaction (HRI), intelligent vehicles, and security systems. In these applications, accurate head pose estimation is important. However, conventional methods lack the accuracy, robustness, or processing speed needed for practical use. In this paper, we propose a novel method for estimating head pose with a monocular camera. The proposed algorithm is based on a deep neural network that performs multi-task learning on a small grayscale image. This network jointly detects multi-view faces and estimates head pose under challenging conditions such as illumination changes and large pose changes. The proposed framework quantitatively and qualitatively outperforms the state-of-the-art method, achieving a mean head pose error of less than $4.5^{\circ}$ while running in real time.
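As an illustration of how a mean head pose error like the one reported above might be computed, the following is a minimal sketch. It assumes the error is the mean absolute angular difference per rotation axis, averaged over the axes; the abstract does not state the exact definition, and the function names and the yaw/pitch/roll ordering are hypothetical.

```python
def wrap_deg(d):
    """Wrap an angle difference in degrees into the range [-180, 180)."""
    return (d + 180.0) % 360.0 - 180.0


def mean_pose_error(pred, true):
    """Return (per-axis mean absolute error, average over axes), in degrees.

    pred and true are equal-length lists of angle tuples, e.g. (yaw, pitch, roll).
    This metric definition is an assumption, not taken from the paper.
    """
    n = len(pred)
    axes = len(pred[0])
    per_axis = [
        sum(abs(wrap_deg(p[a] - t[a])) for p, t in zip(pred, true)) / n
        for a in range(axes)
    ]
    return per_axis, sum(per_axis) / axes


# Hypothetical predictions and ground truth for two faces (degrees).
pred = [(10.0, -5.0, 0.0), (20.0, 5.0, 2.0)]
true = [(12.0, -5.0, 1.0), (17.0, 9.0, 2.0)]
per_axis, overall = mean_pose_error(pred, true)
```

Wrapping the difference keeps angles near the ±180° boundary from being counted as large errors, which matters mainly for yaw under large pose changes.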
Keywords
Head Pose; Deep Learning; Convolutional Neural Network