http://dx.doi.org/10.3745/KTSDE.2022.11.1.29

Single Shot Detector for Detecting Clickable Object in Mobile Device Screen  

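Jo, Min-Seok (Department of Electrical and Electronic Engineering, Korea University)
Chun, Hye-won (Department of Electrical and Electronic Engineering, Korea University)
Han, Seong-Soo (Division of Liberal Studies, Kangwon National University)
Jeong, Chang-Sung (Department of Electronic Engineering, Korea University)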
Publication Information
KIPS Transactions on Software and Data Engineering / v.11, no.1, 2022, pp.29-34
Abstract
We propose a novel network architecture and build a dataset for recognizing clickable objects on mobile device screens. The data were collected from clickable objects on mobile device screens of various resolutions, and a total of 24,937 annotations were divided into seven categories: text, edit text, image, button, region, status bar, and navigation bar. Using the Deconvolutional Single Shot Detector (DSSD) as the baseline, our network combines a backbone with Squeeze-and-Excitation blocks, a Feature Pyramid Network structure, and the Single Shot Detector layer structure, which produces the inference results. In addition, we extract features more efficiently by changing the network input from the conventional 1:1 aspect ratio to a 1:2 ratio, which is closer to the shape of a mobile device screen. In experiments on the constructed dataset, the proposed network improved the mean average precision (mAP) by up to 101% compared with the baseline.
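The backbone described above uses Squeeze-and-Excitation (SE) blocks, introduced by Hu et al. (reference 8). As a rough illustration of what such a block computes, the pure-Python sketch below follows the three SE steps: squeeze (global average pooling per channel), excitation (a bottleneck fully connected layer with ReLU, then a fully connected layer with sigmoid), and scale (reweighting each channel). The function names, the list-of-lists tensor layout, and the toy weights are assumptions made for illustration; this is not the authors' implementation, and a real backbone would use a deep-learning framework with learned weights.

```python
import math
from typing import List

# Hypothetical tensor layout for this sketch: a feature map is a list of
# C channels, and each channel is an H x W list of rows.
Channel = List[List[float]]
FeatureMap = List[Channel]


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: out[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def se_block(feature: FeatureMap,
             w1: List[List[float]], b1: List[float],
             w2: List[List[float]], b2: List[float]) -> FeatureMap:
    """Squeeze-and-Excitation reweighting of a C x H x W feature map.

    w1 has shape (C // r) x C and w2 has shape C x (C // r), where r is the
    reduction ratio. The weights are supplied by the caller (toy values here).
    """
    # Squeeze: global average pooling collapses each channel to one number.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid -> per-channel gates in (0, 1).
    hidden = [relu(v) for v in dense(z, w1, b1)]
    gates = [sigmoid(v) for v in dense(hidden, w2, b2)]
    # Scale: multiply every spatial position of channel c by gates[c].
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feature, gates)]


if __name__ == "__main__":
    # Toy input (hypothetical): C = 2 channels of size 2 x 2, reduction ratio r = 2.
    fmap = [[[1.0, 3.0], [1.0, 3.0]],   # channel 0, mean 2.0
            [[0.0, 0.0], [0.0, 0.0]]]   # channel 1, mean 0.0
    out = se_block(fmap, w1=[[1.0, 1.0]], b1=[0.0], w2=[[1.0], [-1.0]], b2=[0.0, 0.0])
    # Channel 0 is scaled by sigmoid(2.0), about 0.881; channel 1 stays all zeros.
    print(out)
```

Because the sigmoid gates lie in (0, 1), the block only rescales channels relative to one another; it does not change the spatial size of the feature map. In SE-ResNet-style backbones, this reweighting is applied to the output of a residual branch before it is added back to the identity path.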
Keywords
Test Automation; Android Object Detection; Mobile Screen Detection; Computer Vision; Deep Learning
References
1 W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in Proceedings of the European Conference on Computer Vision, Amsterdam, pp.21-37, 2016.
2 K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Nevada, pp.770-778, 2016.
3 A. Newell, K. Yang, and J. Deng, "Stacked hourglass networks for human pose estimation," in Proceedings of the European Conference on Computer Vision, Amsterdam, pp.483-499, 2016.
4 Y. Gao, O. Beijbom, N. Zhang, and T. Darrell, "Compact bilinear pooling," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Nevada, pp.317-326, 2016.
5 M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The pascal visual object classes (VOC) challenge," International Journal of Computer Vision, Vol.88, No.2, pp.303-338, 2010.
6 S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Proceedings of the Advances in Neural Information Processing Systems, Quebec, pp.91-99, 2015.
7 I. A. Salihu, R. Ibrahim, B. S. Ahmed, K. Z. Zamli, and A. Usman, "AMOGA: A static-dynamic model generation strategy for mobile apps testing," IEEE Access, Vol.7, pp.17158-17173, 2019.
8 J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Utah, pp.7132-7141, 2018.
9 T. Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Honolulu, pp.2117-2125, 2017.
10 Y. Baek and D. Bae, "Automated model-based android GUI testing using multi-level GUI comparison criteria," in Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, Singapore, pp.238-249, 2016.
11 A. Usman, N. Ibrahim, and I. A. Salihu, "TEGDroid: Test case generation approach for android apps considering context and GUI events," International Journal on Advanced Science, Engineering and Information Technology, Vol.10, No.1, pp.16-23, 2020.
12 R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, pp.580-587, 2014.
13 J. R. R. Uijlings, K. E. A. Van De Sande, T. Gevers, and A. W. M. Smeulders, "Selective search for object recognition," International Journal of Computer Vision, Vol.104, No.2, pp.154-171, 2013.
14 R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, Santiago, pp.1440-1448, 2015.
15 K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556v6 [cs.CV] 10 Apr. 2015.
16 J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Nevada, pp.779-788, 2016.
17 C. Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, "DSSD: Deconvolutional single shot detector," arXiv:1701.06659v1 [cs.CV] 23 Jan. 2017.