Agricultural Applicability of AI based Image Generation

  • Seungri Yoon (Department of Agriculture, Forestry and Bioresources (Horticultural Science and Biotechnology), Seoul National University)
  • Yeyeong Lee (Department of Agriculture, Forestry and Bioresources (Horticultural Science and Biotechnology), Seoul National University)
  • Eunkyu Jung (Department of Agriculture, Forestry and Bioresources (Horticultural Science and Biotechnology), Seoul National University)
  • Tae In Ahn (Department of Agriculture, Forestry and Bioresources (Horticultural Science and Biotechnology), Seoul National University)
  • Received : 2024.04.16
  • Accepted : 2024.04.29
  • Published : 2024.04.30

Abstract

Since the release of ChatGPT in 2022, the generative artificial intelligence (AI) industry has grown rapidly and is expected to bring significant innovations to cognitive tasks. AI-based image generation in particular is driving major changes in the digital world. This study examines the technical foundations of three notable AI image generation tools, Midjourney, Stable Diffusion, and Firefly, and evaluates their usefulness by comparing the images they produce. The results show that these tools can generate realistic fruit images of tomatoes, strawberries, paprikas, and cucumbers, which are typical greenhouse crops. Firefly in particular stood out for its ability to produce highly realistic images of greenhouse-grown crops. However, all three tools struggled to fully capture the environmental context of the greenhouses in which these crops grow. Refining prompts and using reference images proved effective for generating more accurate images of strawberry fruits and strawberry cultivation systems, which indicates that the output of AI image generation can be finely adjusted. For cucumber fruit images, the tools produced images very close to real ones, and their image generation scores (CLIP score) showed no statistically significant differences. This study explores how AI-based image generation technology can be applied in agriculture and takes a positive view of the prospects for generative AI in this field.
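
The abstract reports that the tools were compared using a CLIP score. As a rough illustration of what such a score measures, the sketch below computes a scaled cosine similarity between an image embedding and a text embedding and averages it per tool. It is not the authors' evaluation pipeline. The function names, the scaling constant of 100 (published definitions differ in this constant), and the tiny embedding vectors are all assumptions made for illustration. In practice the embeddings would come from a CLIP image encoder and text encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_style_score(image_emb, text_emb, scale=100.0):
    """Scaled cosine similarity, clipped at zero.

    `scale` is an assumed constant; it is not taken from the paper.
    """
    return scale * max(cosine_similarity(image_emb, text_emb), 0.0)


def mean_score(image_embs, text_emb):
    """Average score of several generated images against one prompt."""
    scores = [clip_style_score(emb, text_emb) for emb in image_embs]
    return sum(scores) / len(scores)


# Hypothetical toy embeddings (real CLIP embeddings have hundreds of dimensions).
prompt_emb = [0.6, 0.8, 0.0]
images_by_tool = {
    "tool_a": [[0.5, 0.9, 0.1], [0.7, 0.7, 0.0]],
    "tool_b": [[0.6, 0.7, 0.2], [0.4, 0.9, 0.1]],
}

for tool, embs in images_by_tool.items():
    print(tool, round(mean_score(embs, prompt_emb), 2))
```

Whether the per-tool means differ meaningfully would still need a statistical test on the individual scores; the abstract does not say which test was used, so none is sketched here.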

Acknowledgement

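This work was supported by the Korea Institute of Planning and Evaluation for Technology in Food, Agriculture and Forestry (IPET) and the Korea Smart Farm R&D Foundation (KosFarm) through the Smart Farm Multi-Ministry Package Innovation Technology Development Program, funded by the Ministry of Agriculture, Food and Rural Affairs (MAFRA), the Ministry of Science and ICT (MSIT), and the Rural Development Administration (RDA) (423001-02).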
