Injection of Cultural-based Subjects into Stable Diffusion Image Generative Model

  • Amirah Alharbi (Department of Computer Science and Artificial Intelligence, College of Computing, Umm Alqura University) ;
  • Reem Alluhibi (Department of Computer Science and Artificial Intelligence, College of Computing, Umm Alqura University) ;
  • Maryam Saif (Department of Computer Science and Artificial Intelligence, College of Computing, Umm Alqura University) ;
  • Nada Altalhi (Department of Computer Science and Artificial Intelligence, College of Computing, Umm Alqura University) ;
  • Yara Alharthi (Department of Computer Science and Artificial Intelligence, College of Computing, Umm Alqura University)
  • Received : 2024.02.05
  • Published : 2024.02.29

Abstract

While text-to-image models have made remarkable progress in image synthesis, certain models, particularly generative diffusion models, have exhibited a noticeable bias towards generating images related to the culture of some developing countries. This paper presents an empirical investigation aimed at mitigating the bias of an image generative model. We achieve this by incorporating symbols representing Saudi culture into a Stable Diffusion model using the Dreambooth technique. The CLIP score metric is used to assess the outcomes of this study. The paper also explores the impact of varying training parameters, such as the number of training images and the learning rate. The findings reveal a substantial reduction in bias-related concerns and propose an innovative metric for evaluating cultural relevance.
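The paper evaluates the fine-tuned model with the CLIP score but does not reproduce its evaluation code here. The snippet below is a minimal sketch of how a CLIP score between a generated image and its prompt can be computed; the use of the Hugging Face transformers library, the openai/clip-vit-base-patch32 checkpoint, and the conventional scaling by 100 are assumptions rather than details taken from the paper.

```python
# Minimal sketch (assumption: Hugging Face transformers CLIP; not the authors' code)
# of computing a CLIP score between a generated image and its text prompt.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image: Image.Image, prompt: str) -> float:
    """Cosine similarity between CLIP image and text embeddings, scaled by 100."""
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # image_embeds and text_embeds are the projected CLIP embeddings; re-normalise before the dot product.
    image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    return float(100.0 * (image_emb * text_emb).sum())

# Example usage: score an image produced by the Dreambooth-tuned Stable Diffusion model
# against the cultural prompt it was generated from (the prompt below is a hypothetical example).
# score = clip_score(Image.open("generated.png"), "a photo of a traditional Saudi dallah coffee pot")
```

A higher score indicates closer semantic alignment between the generated image and its prompt.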

Keywords

Acknowledgements

Dr. Alharbi would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work through Grant Code (23UQU43101400DSR005). She would also like to express her gratitude for the support of this research (ID: 4401095348).
