http://dx.doi.org/10.7746/jkros.2022.17.4.425

Small Marker Detection with Attention Model in Robotic Applications  

Kim, Minjae (Mechanical Engineering, Sungkyunkwan University)
Moon, Hyungpil (Mechanical Engineering, Sungkyunkwan University)
Publication Information
The Journal of Korea Robotics Society / v.17, no.4, 2022, pp. 425-430
Abstract
As robots become a mainstream part of digital transformation, machine vision for robots has become a major area of study, giving robots the ability to interpret what they see and make decisions based on it. However, finding a small object in an image remains difficult, mainly because of a weakness shared by most visual recognition networks: they are mostly convolutional neural networks, which usually consider only local features. We therefore build a model that considers global features as well as local ones. In this paper, we propose a deep learning method for detecting a small marker on an object, together with an algorithm that captures global features by combining the Transformer's self-attention technique with a convolutional neural network. We propose a self-attention model with a new definition of Query, Key, and Value that lets the model learn global features, and we simplify the equation by removing the position vector and the classification token, which make the model heavy and slow. Finally, we show that our model achieves a higher mAP than the state-of-the-art model YOLOR.
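The idea of applying self-attention to convolutional features without a position vector or classification token can be illustrated with a minimal sketch. The abstract does not give the paper's new definition of Query, Key, and Value, so the code below assumes standard scaled dot-product attention with plain linear projections; the function name `feature_map_self_attention`, the weight matrices, and the 8x8x16 feature map are all hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def feature_map_self_attention(feat, w_q, w_k, w_v):
    """Self-attention over a CNN feature map of shape (H, W, C).

    Assumption: standard scaled dot-product attention, not the paper's
    own Query/Key/Value definition. No positional vector is added and
    no classification token is prepended.
    """
    h, w, c = feat.shape
    tokens = feat.reshape(h * w, c)          # one token per spatial location
    q = tokens @ w_q                         # (N, D) queries
    k = tokens @ w_k                         # (N, D) keys
    v = tokens @ w_v                         # (N, D) values
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, N) pairwise similarities
    attn = softmax(scores, axis=-1)          # each row sums to 1
    out = attn @ v                           # every location mixes all others
    return out.reshape(h, w, -1), attn


# Hypothetical input: an 8x8 feature map with 16 channels.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))
w_q, w_k, w_v = (rng.standard_normal((16, 16)) * 0.1 for _ in range(3))

out, attn = feature_map_self_attention(feat, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (8, 8, 16) (64, 64)
```

Each output location is a weighted sum over all 64 input locations, which is how attention supplies the global context that a convolution's local receptive field lacks. The trade-off is the (N, N) attention matrix, which grows quadratically with the number of spatial locations.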
Keywords
Unloading Robot; Object Detection; Deep Learning
Citations & Related Records
Times Cited By KSCI: 10
1 F. O. Unel, B. O. Ozkalayci, and C. Cigla, "The power of tiling for small object detection," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 2019, DOI: 10.1109/cvprw.2019.00084.
2 C. Y. Wang, I. H. Yeh, and H. Y. M. Liao, "You only learn one representation: Unified network for multiple tasks," Computer Vision and Pattern Recognition, 2021, DOI: 10.48550/arXiv.2105.04206.
3 K. Wang, J. H. Liew, Y. Zou, D. Zhou, and J. Feng, "Panet: Fewshot image semantic segmentation with prototype alignment," 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019, DOI: 10.1109/iccv.2019.00929.
4 J. Ruiz-del-Solar, P. Loncomilla, and S. Naiomi, "A survey on deep learning methods for robot vision," Computer Vision and Pattern Recognition, 2018, DOI: 10.48550/arXiv.1803.10862.
5 A. Zeng, S. Song, K. T. Yu, E. Donlon, F. R. Hogan, M. Bauza, D. Ma, O. Taylor, M. Liu, E. Romo, N. Fazeli, F. Alet, N. C. Dafle, R. Holladay, I. Morona, P. O. Nair, D. Green, I. Taylor, W. Liu, T. Funkhouser, and A. Rodriguez, "Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching," 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 2018, DOI: 10.1109/icra.2018.8461044.
6 J. Redmon and A. Farhadi, "Yolov3: An incremental improvement," Computer Vision and Pattern Recognition, 2018, DOI: 10.48550/arXiv.1804.02767.
7 B. Koonce, "Efficientnet," Convolutional Neural Networks with Swift for Tensorflow, Apress, Berkeley, CA, USA, 2021, pp. 109-123, DOI: 10.1007/978-1-4842-6168-2_10.
8 T. Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, DOI: 10.1109/cvpr.2017.106.
9 H. Wu, B. Xiao, N. Codella, M. Liu, X. Dai, L. Yuan, and L. Zhang, "Cvt: Introducing convolutions to vision transformers," Computer Vision and Pattern Recognition, 2021, DOI: 10.48550/arXiv.2103.15808.
10 A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An image is worth 16x16 words: Transformers for image recognition at scale," Computer Vision and Pattern Recognition, 2020, DOI: 10.48550/arXiv.2010.11929.
11 J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, DOI: 10.1109/cvpr.2016.91.
12 R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-cam: Visual explanations from deep networks via gradient-based localization," 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017, DOI: 10.1109/iccv.2017.74.
13 S. M. Lundberg and S. I. Lee, "A unified approach to interpreting model predictions," Artificial Intelligence, 2017, DOI: 10.48550/arXiv.1705.07874.
14 Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541-551, 1989, DOI: 10.1162/neco.1989.1.4.541.
15 A. Bochkovskiy, C. Y. Wang, and H. Y. M. Liao, "Yolov4: Optimal speed and accuracy of object detection," Computer Vision and Pattern Recognition, 2020, DOI: 10.48550/arXiv.2004.10934.
16 W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg, "Ssd: Single shot multibox detector," European Conference on Computer Vision, pp. 21-37, Springer, Cham, 2016, DOI: 10.1007/978-3-319-10578-9_23.
17 L. Deng, M. Yang, T. Li, Y. He, and C. Wang, "RFBNet: deep multimodal networks with residual fusion blocks for RGB-D semantic segmentation," Computer Vision and Pattern Recognition, 2019, DOI: 10.48550/arXiv.1907.00135.
18 A. Ajaz, A. Salar, T. Jamal, and A. U. Khan, "Small Object Detection using Deep Learning," Computer Vision and Pattern Recognition, 2022, DOI: 10.48550/arXiv.2201.03243.
19 A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Computation and Language, 2017, DOI: 10.48550/arXiv.1706.03762.
20 J. Redmon and A. Farhadi, "YOLO9000: better, faster, stronger," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, DOI: 10.1109/cvpr.2017.690.
21 H. Liang, X. Ma, S. Li, M. Gorner, S. Tang, B. Fang, F. Sun, and J. Zhang, "Pointnetgpd: Detecting grasp configurations from point sets," 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, pp. 3629-3635, 2019, DOI: 10.1109/icra.2019.8794435.
22 Y. Wu and D. Ge, "Key technologies of warehousing robot for intelligent logistics," The First International Symposium on Management and Social Sciences (ISMSS 2019), Atlantis Press, 2019, DOI: 10.2991/ismss-19.2019.16.