Small Marker Detection with Attention Model in Robotic Applications

Object Detection Attention Model for Recognizing Small Markers in Robotic Systems

  • Kim, Minjae (Mechanical Engineering, Sungkyunkwan University)
  • Moon, Hyungpil (Mechanical Engineering, Sungkyunkwan University)
  • Received : 2022.09.07
  • Accepted : 2022.10.31
  • Published : 2022.11.30

Abstract

As robots become a mainstream part of digital transformation, robots with machine vision have become a major area of study, since vision gives a robot the ability to check what it sees and make decisions based on it. However, a small object in an image is difficult to find, mainly because of a weakness shared by most visual recognition networks: they are mostly convolutional neural networks, which usually consider only local features. We therefore build a model that considers global features as well as local features. In this paper, we propose a deep-learning method for detecting a small marker on an object, together with an algorithm that captures global features by combining the Transformer's self-attention technique with a convolutional neural network. We propose a self-attention model with a new definition of Query, Key, and Value that lets the model learn global features, and we simplify the equation by removing the position vector and the classification token, which make the model heavy and slow. Finally, we show that our model achieves a higher mAP than the state-of-the-art model YOLOR.
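The idea of applying self-attention to convolutional features without a position vector or classification token can be sketched as follows. This is only a minimal illustration, not the authors' model: the abstract does not state the new definition of Query, Key, and Value, so the sketch assumes the standard scaled dot-product attention of the Transformer, and the helper names (`flatten_feature_map`, `self_attention`) and the identity projection matrices are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def flatten_feature_map(fmap):
    """Turn a C x H x W feature map into H*W tokens of length C.

    No positional vector and no classification token are added
    (assumption: this mirrors the simplification the abstract describes).
    """
    channels = len(fmap)
    height = len(fmap[0])
    width = len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(channels)]
            for i in range(height) for j in range(width)]


def self_attention(tokens, w_q, w_k, w_v):
    """Standard scaled dot-product self-attention (assumed stand-in)."""
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    # Every token attends to every other token, so each output row
    # mixes information from the whole feature map (global context).
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy 2-channel, 2x2 feature map standing in for a CNN output.
fmap = [[[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [0.5, 0.5]]]
tokens = flatten_feature_map(fmap)
identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical projection weights
out, weights = self_attention(tokens, identity, identity, identity)
```

In this toy input the top-left and bottom-right positions carry the same features, so they attend to each other as strongly as to themselves even though they are not neighbors. A convolution with a small kernel cannot relate such distant positions in a single layer.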

Acknowledgement

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2022-2020-0-01460) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation).

References

  1. J. Ruiz-del-Solar, P. Loncomilla, and N. Soto, "A survey on deep learning methods for robot vision," Computer Vision and Pattern Recognition, 2018, DOI: 10.48550/arXiv.1803.10862.
  2. Y. Wu and D. Ge, "Key technologies of warehousing robot for intelligent logistics," The First International Symposium on Management and Social Sciences (ISMSS 2019), Atlantis Press, 2019, DOI: 10.2991/ismss-19.2019.16.
  3. H. Liang, X. Ma, S. Li, M. Gorner, S. Tang, B. Fang, F. Sun, and J. Zhang, "Pointnetgpd: Detecting grasp configurations from point sets," 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, pp. 3629-3635, 2019, DOI: 10.1109/icra.2019.8794435.
  4. A. Zeng, S. Song, K. T. Yu, E. Donlon, F. R. Hogan, M. Bauza, D. Ma, O. Taylor, M. Liu, E. Romo, N. Fazeli, F. Alet, N. C. Dafle, R. Holladay, I. Morona, P. O. Nair, D. Green, I. Taylor, W. Liu, T. Funkhouser, and A. Rodriguez, "Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching," 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 2018, DOI: 10.1109/icra.2018.8461044.
  5. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541-551, 1989, DOI: 10.1162/neco.1989.1.4.541.
  6. A. Bochkovskiy, C. Y. Wang, and H. Y. M. Liao, "Yolov4: Optimal speed and accuracy of object detection," Computer Vision and Pattern Recognition, 2020, DOI: 10.48550/arXiv.2004.10934.
  7. J. Redmon and A. Farhadi, "Yolov3: An incremental improvement," Computer Vision and Pattern Recognition, 2018, DOI: 10.48550/arXiv.1804.02767.
  8. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg, "Ssd: Single shot multibox detector," European Conference on Computer Vision, Springer, Cham, pp. 21-37, 2016, DOI: 10.1007/978-3-319-10578-9_23.
  9. L. Deng, M. Yang, T. Li, Y. He, and C. Wang, "RFBNet: deep multimodal networks with residual fusion blocks for RGB-D semantic segmentation," Computer Vision and Pattern Recognition, 2019, DOI: 10.48550/arXiv.1907.00135.
  10. B. Koonce, "Efficientnet," Convolutional Neural Networks with Swift for Tensorflow, Apress, Berkeley, CA, USA, pp. 109-123, 2021, DOI: 10.1007/978-1-4842-6168-2_10.
  11. A. Ajaz, A. Salar, T. Jamal, and A. U. Khan, "Small Object Detection using Deep Learning," Computer Vision and Pattern Recognition, 2022, DOI: 10.48550/arXiv.2201.03243.
  12. F. O. Unel, B. O. Ozkalayci, and C. Cigla, "The power of tiling for small object detection," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 2019, DOI: 10.1109/cvprw.2019.00084.
  13. T. Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, DOI: 10.1109/cvpr.2017.106.
  14. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Computation and Language, 2017, DOI: 10.48550/arXiv.1706.03762.
  15. H. Wu, B. Xiao, N. Codella, M. Liu, X. Dai, L. Yuan, and L. Zhang, "Cvt: Introducing convolutions to vision transformers," Computer Vision and Pattern Recognition, 2021, DOI: 10.48550/arXiv.2103.15808.
  16. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An image is worth 16x16 words: Transformers for image recognition at scale," Computer Vision and Pattern Recognition, 2020, DOI: 10.48550/arXiv.2010.11929.
  17. C. Y. Wang, I. H. Yeh, and H. Y. M. Liao, "You only learn one representation: Unified network for multiple tasks," Computer Vision and Pattern Recognition, 2021, DOI: 10.48550/arXiv.2105.04206.
  18. K. Wang, J. H. Liew, Y. Zou, D. Zhou, and J. Feng, "Panet: Fewshot image semantic segmentation with prototype alignment," 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019, DOI: 10.1109/iccv.2019.00929.
  19. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, DOI: 10.1109/cvpr.2016.91.
  20. J. Redmon and A. Farhadi, "YOLO9000: better, faster, stronger," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, DOI: 10.1109/cvpr.2017.690.
  21. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-cam: Visual explanations from deep networks via gradient-based localization," 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017, DOI: 10.1109/iccv.2017.74.
  22. S. M. Lundberg and S. I. Lee, "A unified approach to interpreting model predictions," Artificial Intelligence, 2017, DOI: 10.48550/arXiv.1705.07874.