Small Marker Detection with Attention Model in Robotic Applications

Kim, Minjae;Moon, Hyungpil;

doi:10.7746/jkros.2022.17.4.425

The Journal of Korea Robotics Society

Vol. 17, No. 4 / pp. 425-430 / 2022 / pISSN 1975-6291 / eISSN 2287-3961

Korea Robotics Society

Korean title (translated): Object Detection Attention Model for Small Marker Recognition in Robot Systems

김민재 ;
문형필

Kim, Minjae (Mechanical Engineering, Sungkyunkwan University) ;
Moon, Hyungpil (Mechanical Engineering, Sungkyunkwan University)

Received: 2022.09.07
Reviewed: 2022.10.31
Published: 2022.11.30

https://doi.org/10.7746/jkros.2022.17.4.425

Abstract

As robots become a mainstream part of digital transformation, machine vision for robots has become a major area of study, because it makes it possible to check what a robot sees and to make decisions based on it. However, finding a small object in an image remains difficult, mainly because of a limitation shared by most visual recognition networks: they are mostly convolutional neural networks, which usually consider only local features. We therefore build a model that considers not only local features but also global features. In this paper, we propose a deep-learning method for detecting a small marker on an object, together with an algorithm that captures global features by combining the Transformer's self-attention technique with a convolutional neural network. We propose a self-attention model with a new definition of Query, Key, and Value that lets the model learn global features, and a simplified equation that removes the position vector and the classification token, both of which make the model heavy and slow. Finally, we show that our model achieves a higher mAP than the state-of-the-art model YOLOR.
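The idea of applying self-attention to a convolutional feature map without a position vector or a classification token can be illustrated with a minimal sketch. The abstract does not give the paper's own definition of Query, Key, and Value, so the code below uses the standard scaled dot-product form from the Transformer as a stand-in. The function names, the 2 x 2 feature map, and the identity projection weights are all hypothetical placeholders.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def flatten_feature_map(feature_map):
    """Turn an H x W x C nested list into H*W tokens of length C."""
    return [pixel for row in feature_map for pixel in row]


def self_attention(tokens, w_q, w_k, w_v):
    """Standard scaled dot-product self-attention (an assumed stand-in).

    No position vector is added and no classification token is prepended:
    the tokens are used exactly as they come from the feature map.
    """
    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    scale = math.sqrt(len(keys[0]))

    outputs, weights = [], []
    for q in queries:
        # Similarity of this position to every position in the map.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        # Weighted sum over all values, so each output sees the whole map.
        out = [sum(a * v[d] for a, v in zip(attn, values))
               for d in range(len(values[0]))]
        outputs.append(out)
    return outputs, weights


# Hypothetical 2 x 2 feature map with 2 channels (stand-in for CNN output).
feature_map = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [0.5, 0.5]],
]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder projection weights

tokens = flatten_feature_map(feature_map)
outputs, weights = self_attention(tokens, identity, identity, identity)
```

Each row of `weights` spans every spatial position, which is how the attention step mixes in global context that a local convolution kernel does not see. In a trained model the projection matrices would be learned instead of fixed to the identity.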

Funding Information

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2022-2020-0-01460) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation).
