http://dx.doi.org/10.6109/jicce.2013.11.3.207

3D Facial Landmark Tracking and Facial Expression Recognition  

Medioni, Gerard (Computer Vision Lab, Institute for Robotics and Intelligent Systems, University of Southern California)
Choi, Jongmoo (Computer Vision Lab, Institute for Robotics and Intelligent Systems, University of Southern California)
Labeau, Matthieu (Computer Vision Lab, Institute for Robotics and Intelligent Systems, University of Southern California)
Leksut, Jatuporn Toy (Computer Vision Lab, Institute for Robotics and Intelligent Systems, University of Southern California)
Meng, Lingchao (Computer Vision Lab, Institute for Robotics and Intelligent Systems, University of Southern California)
Abstract
In this paper, we address the challenging computer vision problem of obtaining a reliable facial expression analysis from a naturally interacting person. We propose a system that combines a 3D generic face model, 3D head tracking, and a 2D tracker to track facial landmarks and recognize expressions. First, we extract facial landmarks from a neutral frontal face, and then we deform a 3D generic face to fit the input face. Next, we use our real-time 3D head tracking module to track a person's head in 3D and predict facial landmark positions in 2D using the projection from the updated 3D face model. Finally, we use the tracked 2D landmarks to update the 3D landmarks. This integrated tracking loop enables efficient tracking of the non-rigid parts of a face in the presence of large 3D head motion. We conducted experiments for facial expression recognition using both frame-based and sequence-based approaches. Our method provides a 75.9% recognition rate on 8 subjects with 7 key expressions. Our approach provides a considerable step forward toward new applications including human-computer interaction, behavioral science, robotics, and game applications.
Keywords
Computer vision; Facial expression recognition; Facial landmark tracking; 3D-face tracking
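As a rough illustration of the prediction step described in the abstract (projecting the updated 3D face model into the image to predict 2D landmark positions), the following minimal sketch applies a rigid head pose to a few 3D landmarks and projects them with a pinhole camera. It is not the authors' implementation: the function names, the yaw-only rotation, the landmark coordinates, and the camera parameters (focal length, principal point) are all hypothetical values chosen for illustration.

```python
import math

def rotation_y(angle_rad):
    """3x3 rotation matrix about the vertical (y) axis, i.e. head yaw."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def project_landmarks(landmarks_3d, rotation, translation, focal, center):
    """Predict 2D landmark positions from a posed 3D face model.

    Each model point X is moved into camera coordinates (R X + t) and
    projected with a pinhole camera:
        u = focal * x / z + cx,  v = focal * y / z + cy
    """
    points_2d = []
    for X in landmarks_3d:
        cam = [sum(rotation[i][j] * X[j] for j in range(3)) + translation[i]
               for i in range(3)]
        if cam[2] <= 0:
            raise ValueError("landmark is behind the camera")
        u = focal * cam[0] / cam[2] + center[0]
        v = focal * cam[1] / cam[2] + center[1]
        points_2d.append((u, v))
    return points_2d

# Hypothetical landmarks in model coordinates (millimetres):
# two points at eye level and one point lower on the face.
model = [(-30.0, 0.0, 0.0), (30.0, 0.0, 0.0), (0.0, -40.0, 20.0)]

# Assumed camera: focal length 500 px, principal point (320, 240),
# head placed 600 mm in front of the camera.
t = (0.0, 0.0, 600.0)
frontal = project_landmarks(model, rotation_y(0.0), t, 500.0, (320.0, 240.0))
turned = project_landmarks(model, rotation_y(math.radians(30)), t,
                           500.0, (320.0, 240.0))
```

In a tracking loop of the kind the abstract outlines, predictions such as `turned` would serve as starting positions for the 2D tracker, and the refined 2D positions would then be used to update the 3D landmarks.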