http://dx.doi.org/10.3837/tiis.2020.10.009

Multi-Task FaceBoxes: A Lightweight Face Detector Based on Channel Attention and Context Information  

Qi, Shuaihui (National University of Defense Technology)
Yang, Jungang (National University of Defense Technology)
Song, Xiaofeng (National University of Defense Technology)
Jiang, Chen (National University of Defense Technology)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.14, no.10, 2020, pp. 4080-4097
Abstract
In recent years, convolutional neural networks (CNNs) have become the primary method for face detection. However, their shortcomings are obvious: high computational cost and large model size. This makes CNNs difficult to deploy on mobile devices, which have limited computing and storage capabilities. Therefore, designing lightweight CNNs for face detection is becoming increasingly important with the popularity of smartphones and the mobile Internet. Based on the CPU real-time face detector FaceBoxes, we propose a multi-task lightweight face detector with low computational cost and higher detection precision. First, to improve detection capability, squeeze-and-excitation modules are used to capture attention between channels. Then, texture and semantic information are extracted by shallow and deep networks, respectively, and combined to obtain rich features. Finally, a landmark detection module is used to improve detection performance on small faces and to provide landmark data for face alignment. Experiments on the AFW, FDDB, PASCAL, and WIDER FACE datasets show that our algorithm achieves a significant improvement in mean average precision. In particular, on the WIDER FACE hard validation set, the mean average precision of our algorithm exceeds that of FaceBoxes by 7.2%. For VGA-resolution images, our algorithm runs at 23 FPS on a CPU device.
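The abstract says squeeze-and-excitation (SE) modules are used to capture attention between channels. Below is a minimal NumPy sketch of a standard SE block in the form described by Hu et al. It has three steps: squeeze by global average pooling, excite with two fully connected layers followed by a sigmoid, and then rescale each channel. This is not the authors' implementation. The abstract does not say where the modules sit inside FaceBoxes or how they are configured, so the channel count, reduction ratio `r`, and random weights are hypothetical placeholders.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def squeeze_excitation(x, w1, b1, w2, b2):
    """Apply an SE block to a single feature map x of shape (C, H, W).

    w1: (C // r, C), b1: (C // r,)  -- reduction fully connected layer
    w2: (C, C // r), b2: (C,)       -- expansion fully connected layer
    Returns the recalibrated feature map (C, H, W) and the channel weights (C,).
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one weight in (0, 1) per channel
    hidden = np.maximum(0.0, w1 @ z + b1)
    s = sigmoid(w2 @ hidden + b2)
    # Scale: multiply every spatial position of channel c by s[c]
    return x * s[:, None, None], s


# Hypothetical configuration: 32 channels, reduction ratio r = 4, random weights
rng = np.random.default_rng(0)
C, H, W, r = 32, 8, 8, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

y, s = squeeze_excitation(x, w1, b1, w2, b2)
print(y.shape, s.shape)  # (32, 8, 8) (32,)
```

The block adds only two small fully connected layers per stage, which is why SE-style attention is a common choice for lightweight detectors. The learned weights `s` let the network emphasize informative channels and suppress less useful ones at little extra computational cost.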
Keywords
Multi-Task FaceBoxes; Feature Fusion; Attention; Landmark Detection