Domain Adaptive Fruit Detection Method based on a Vision-Language Model for Harvest Automation

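Korean title: 작물 수확 자동화를 위한 시각 언어 모델 기반의 환경적응형 과수 검출 기술 (Environment-Adaptive Fruit Detection Technology Based on a Vision-Language Model for Crop Harvest Automation)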

  • Received : 2023.12.25
  • Accepted : 2024.02.05
  • Published : 2024.04.30

Abstract

Recently, mobile manipulators have been used in the agricultural industry for weed removal and harvest automation. This paper proposes a domain-adaptive fruit detection method for harvest automation based on OWL-ViT, an open-vocabulary object detection model. Because this vision-language model detects objects from text prompts, it can be extended to object categories that were not defined in advance. When developing deep learning models for real-world problems, constructing a large-scale labeled dataset is time-consuming and relies heavily on human effort. To reduce this labor-intensive workload, we used a large-scale public dataset as source domain data and employed a domain adaptation method. Adversarial learning was conducted between a domain discriminator and the feature extractor to reduce the gap between the distributions of feature vectors from the source domain and our target domain. We collected a target domain dataset in a realistic environment and conducted experiments to demonstrate the effectiveness of the proposed method. In the experiments, domain adaptation improved the AP50 metric from 38.88% to 78.59% for detecting objects within a range of 2 m, and we achieved a manipulation success rate of 81.7%.
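The adversarial objective mentioned in the abstract can be illustrated with a minimal sketch. It assumes a gradient-reversal style formulation, in which the discriminator minimizes a binary cross-entropy over domain labels and the feature extractor receives the negated loss. The abstract does not state which adversarial variant, loss weighting, or label convention the authors use, so the function names, the `lam` weight, and the choice of source = 1 and target = 0 are all hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(source_logits, target_logits):
    """Mean binary cross-entropy on domain logits.

    Assumed label convention: source features are labeled 1,
    target features are labeled 0.
    """
    # -log(sigmoid(z)) == softplus(-z) for source samples (label 1)
    src = [softplus(-z) for z in source_logits]
    # -log(1 - sigmoid(z)) == softplus(z) for target samples (label 0)
    tgt = [softplus(z) for z in target_logits]
    return (sum(src) + sum(tgt)) / (len(src) + len(tgt))


def extractor_adversarial_loss(source_logits, target_logits, lam=1.0):
    """Loss seen by the feature extractor under gradient reversal.

    The extractor is rewarded when the discriminator is confused,
    so it minimizes the negated discriminator loss scaled by `lam`
    (a hypothetical trade-off weight).
    """
    return -lam * discriminator_loss(source_logits, target_logits)


# A discriminator that separates the domains well has a low loss ...
separated = discriminator_loss([4.0, 3.0], [-4.0, -3.0])
# ... while domain-invariant features leave it at chance level (log 2).
confused = discriminator_loss([0.0, 0.0], [0.0, 0.0])
print(separated, confused)
```

In this sketch, the feature extractor's loss is lowest when the discriminator cannot tell the two domains apart, which is the sense in which adversarial training reduces the gap between the source and target feature distributions.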

Acknowledgement

This work was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government [23ZD1130, Regional Industry ICT Convergence Technology Advancement and Support Project in Daegu-GyeongBuk (Robot)]. This research was also supported by 2023 research funding from the Ministry of Trade, Industry and Energy and the Korea Evaluation Institute of Industrial Technology (KEIT) (20023305).
