http://dx.doi.org/10.15701/kcgs.2019.25.5.11

Fingertip Detection through Atrous Convolution and Grad-CAM  

Noh, Dae-Cheol (School of Computer Engineering, Seokyeong University)
Kim, Tae-Young (School of Computer Engineering, Seokyeong University)
Abstract
With the development of deep learning technology, user-friendly interfaces suited to virtual reality and augmented reality applications are being actively researched. To support hand-based interfaces, this paper proposes a deep learning-based fingertip detection method that tracks fingertip coordinates so that users can select virtual objects or write and draw in the air. The method first crops the approximate fingertip region from the input image using Grad-CAM, then applies a convolutional neural network with atrous convolution to the cropped image to detect the fingertip location. It is simpler and easier to implement than existing object detection algorithms and requires no pre-processing to annotate objects. To verify the method, we implemented an air-writing application; with a recognition rate of 81% and a speed of 76 ms, users could write smoothly in the air without delay, which makes the application usable in real time.
Keywords
Deep Learning; Atrous Convolution; Grad-CAM; Object Detection
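
The two stages named in the abstract (a Grad-CAM crop followed by an atrous convolution over the cropped image) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the network architecture, crop size, or dilation rate, so the function names `grad_cam`, `crop_around_peak`, and `atrous_conv2d`, the 32-pixel crop, and the rate of 2 are all assumptions chosen for illustration. The feature maps and gradients are synthetic stand-ins for what a trained classifier would provide.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from (K, H, W) feature maps and their gradients."""
    # Channel weights: global average of the class-score gradients.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = np.maximum(np.tensordot(weights, feature_maps, axes=1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def crop_around_peak(image, cam, size):
    """Crop a size x size window centered on the heatmap peak.

    Assumes size is no larger than either image dimension.
    """
    h, w = image.shape[:2]
    r, c = np.unravel_index(np.argmax(cam), cam.shape)
    # Map the center of the peak heatmap cell to image pixel coordinates.
    cy = int((r + 0.5) * h / cam.shape[0])
    cx = int((c + 0.5) * w / cam.shape[1])
    # Clamp the window so it stays inside the image.
    top = min(max(cy - size // 2, 0), h - size)
    left = min(max(cx - size // 2, 0), w - size)
    return image[top:top + size, left:left + size], (top, left)

def atrous_conv2d(x, kernel, rate):
    """Single-channel 'valid' atrous (dilated) cross-correlation."""
    kh, kw = kernel.shape
    # Effective kernel extent once (rate - 1) gaps are inserted between taps.
    eh, ew = (kh - 1) * rate + 1, (kw - 1) * rate + 1
    out_h, out_w = x.shape[0] - eh + 1, x.shape[1] - ew + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(kh):
        for j in range(kw):
            # Each kernel tap reads the input shifted by a multiple of rate.
            out += kernel[i, j] * x[i * rate:i * rate + out_h,
                                    j * rate:j * rate + out_w]
    return out

# Synthetic stand-ins (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
image = rng.random((64, 64))
fmaps = np.zeros((2, 8, 8))
fmaps[0, 2, 5] = 1.0                 # one strongly activated cell
grads = np.ones((2, 8, 8))

cam = grad_cam(fmaps, grads)
patch, (top, left) = crop_around_peak(image, cam, size=32)
response = atrous_conv2d(patch, np.ones((3, 3)) / 9.0, rate=2)
```

With a rate of 2, the 3x3 kernel covers a 5x5 input window without adding weights, which is the usual motivation for atrous convolution: a larger receptive field at the same parameter count. In a real pipeline the response map would come from learned kernels, and the detected position would be shifted by `(top, left)` to return to full-image coordinates.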