Recognition of GUI Widgets Utilizing Translational Embeddings based on Relational Learning

  • Park, Min-Su (Dept. of Computer and Communication Engineering, Kangwon National University) ;
  • Seok, Ho-Sik (Dept. of Computer and Communication Engineering, Kangwon National University)
  • Received : 2018.09.05
  • Accepted : 2018.09.13
  • Published : 2018.09.30

Abstract

CNN-based object recognition has reported splendid results. However, recognizing the GUI widgets of mobile apps raises an interesting challenge: recognition performance on visually similar widgets is inconsistent. To improve performance, we propose a novel method that exploits relations between input widgets. The recognition process flows from initial Faster R-CNN detection to enhancement by a relation recognizer. Relations are represented as vector translations between objects in a relation space. Experiments on 323 apps show that our method significantly outperforms the Faster R-CNN-only approach.

Although CNN-based object recognition is reported to perform very well, applying it to settings one would expect to be low-noise and easy to recognize, such as the GUIs of mobile apps, reveals a surprising problem: input widgets that look very similar to a human observer are often not recognized reliably. To improve the input-widget recognition performance of a CNN, this paper proposes a method that exploits the relations among the objects composing a mobile app's GUI. The proposed method (1) first recognizes the input widgets of a mobile app using Faster R-CNN, a CNN-based object detector, and then (2) refines those recognitions with a method that exploits the relations between objects. Relations between objects are represented as vector translations in a representation space. Applied to data generated from a total of 323 apps, the combined method considerably improves widget recognition compared with using Faster R-CNN alone.
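The abstract's relation recognizer represents a relation as a vector translation in an embedding space, in the style of TransE (Bordes et al., ref. 16): for a valid triple, head + relation ≈ tail. A minimal sketch of how such embeddings could re-rank widget-label candidates is shown below; the widget classes, relation names, and the `rerank` helper are all illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Hypothetical widget classes and spatial relations (illustrative only;
# in practice the embeddings would be learned from labeled GUI triples).
ent_emb = {w: rng.normal(size=dim) for w in ("EditText", "Button", "CheckBox")}
rel_emb = {r: rng.normal(size=dim) for r in ("above", "left_of")}

def score(head, relation, tail):
    """TransE-style score: small when head + relation is close to tail (L2)."""
    return float(np.linalg.norm(ent_emb[head] + rel_emb[relation] - ent_emb[tail]))

def rerank(candidates, relation, tail):
    """Pick the candidate label for a detected widget that best satisfies a
    known relation to an already-recognized neighbouring widget."""
    return min(candidates, key=lambda c: score(c, relation, tail))
```

With learned embeddings, an ambiguous Faster R-CNN detection (e.g. EditText vs. CheckBox) could be resolved by choosing the label whose translated embedding lands nearest to a confidently recognized neighbour.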

References

  1. R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), pp. 580-587, 2014.
  2. R. Girshick, "Fast R-CNN," in Proc. of the International Conference on Computer Vision (ICCV 2015), pp. 1440-1448, 2015.
  3. S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Trans. Pattern Anal. Mach. Intell., vol.39, no.6, pp. 1137-1149, 2017. DOI:10.1109/TPAMI.2016.2577031
  4. J. Johnson, R. Krishna, M. Stark, L.-J. Li, D. A. Shamma, M. S. Bernstein, and L. Fei-Fei, "Image retrieval using scene graphs," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 3668-3678, 2015.
  5. C. Galleguillos, A. Rabinovich, and S. Belongie, "Object categorization using co-occurrence, location and appearance," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), pp. 1-8, 2008. DOI:10.1109/CVPR.2008.4587799
  6. W. Choi, Y.-W. Chao, C. Pantofaru, and S. Savarese, "Understanding indoor scenes using 3D geometric phrases," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), pp. 33-40, 2013.
  7. C. Lu, R. Krishna, M. Bernstein, and L. Fei-Fei, "Visual relationship detection with language priors," in Proc. of the European Conference on Computer Vision (ECCV 2016), pp. 852-869, 2016.
  8. G. Gkioxari, R. Girshick, and J. Malik, "Contextual action recognition with R*CNN," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 1080-1088, 2015.
  9. V. Ramanathan, C. Li, J. Deng, W. Han, Z. Li, K. Gu, Y. Song, S. Bengio, C. Rosenberg, and L. Fei-Fei, "Learning semantic relationships for better action retrieval in images," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 1100-1109, 2015.
  10. B. Dai, Y. Zhang, and D. Lin, "Detecting visual relationships with deep relational networks," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), pp. 3076-3086, 2017.
  11. L.-C. Chen, A. G. Schwing, A. L. Yuille, and R. Urtasun, "Learning deep structured models," in Proc. of the International Conference on Machine Learning (ICML 2015), pp. 1785-1794, 2015.
  12. H. Wang, X. Shi, and D.-Y. Yeung, "Relational deep learning: a deep latent variable model for link prediction," in Proc. of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), pp. 2688-2694, 2017.
  13. M. Long, Z. Cao, J. Wang, and P. S. Yu, "Learning multiple tasks with multilinear relationship networks," in Proc. of the Thirty-first Conference on Neural Information Processing Systems (NIPS 2017), pp. 1593-1602, 2017.
  14. J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, "Selective search for object recognition," International Journal of Computer Vision, vol.104, no.2, pp. 154-171, 2013. DOI:10.1007/s11263-013-0620-5
  15. K. Mao, M. Harman, and Y. Jia, "Sapienz: multi-objective automated testing for Android applications," in Proc. of 2016 International Symposium on Software Testing and Analysis, pp. 94-105, 2016.
  16. A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, "Translating embeddings for modeling multi-relational data," in Proc. of the Twenty-seventh Conference on Neural Information Processing Systems (NIPS 2013), pp. 2787-2795, 2013.
  17. H. Zhang, Z. Kyaw, S.-F. Chang, and T.-S. Chua, "Visual translation embedding network for visual relation detection," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), pp. 5532-5540, 2017.