Performance Improvement of Image-to-Image Translation with RAPGAN and RRDB

  • Dongsik Yoon (Department of Electronics and Electrical Engineering, Korea University)
  • Noyoon Kwak (Division of Computer Engineering, Baekseok University)
  • Received : 2022.11.16
  • Accepted : 2022.12.30
  • Published : 2023.02.28

Abstract

This paper concerns performance improvement of Image-to-Image translation using a Relativistic Average Patch GAN (RAPGAN) and Residual in Residual Dense Blocks (RRDB). Its purpose is to improve performance through technical refinements in three respects, compensating for the shortcomings of the original pix2pix, a representative Image-to-Image translation model. First, unlike the original pix2pix generator, the proposed generator uses RRDBs in the part that encodes the input image, enabling deeper learning. Second, because the RAPGAN-based loss function predicts how much more realistic the original image is than the generated image, both images influence adversarial training. Finally, the generator is pre-trained to keep the discriminator from converging prematurely. With the proposed method, the generated images outperformed those of the original pix2pix by more than 13% on average in terms of FID.
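The Residual in Residual Dense Block mentioned above can be illustrated with a small sketch. This is a simplified, hypothetical rendition of the dense connectivity and scaled residuals that RRDB takes from ESRGAN: per-pixel 1x1 convolutions are reduced to matrix multiplies, and the channel widths and residual scale `beta` are assumptions, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_weights(c=8, g=4, n_layers=3):
    """Random 1x1-conv weights for one dense block: hidden layers add g
    growth channels each; the last layer fuses back to c channels."""
    ws, in_ch = [], c
    for _ in range(n_layers - 1):
        ws.append(rng.normal(0.0, 0.1, (in_ch, g)))
        in_ch += g
    ws.append(rng.normal(0.0, 0.1, (in_ch, c)))
    return ws

def dense_block(x, weights, beta=0.2):
    """Dense block: every layer sees the concatenation of all earlier
    features; the block output is a residual scaled by beta."""
    feats = [x]
    for w in weights:
        inp = np.concatenate(feats, axis=-1)    # dense connectivity
        feats.append(np.maximum(inp @ w, 0.0))  # conv + ReLU stand-in
    return x + beta * feats[-1]

def rrdb(x, block_weights, beta=0.2):
    """Residual in Residual Dense Block: stacked dense blocks wrapped
    in an outer residual connection, again scaled by beta."""
    out = x
    for ws in block_weights:
        out = dense_block(out, ws, beta)
    return x + beta * out
```

Because every residual path carries the identity through, gradients reach the early encoder layers directly, which is what makes the deeper encoder trainable.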

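The relativistic average patch loss described in the abstract can be sketched numerically. This is a minimal sketch assuming the sigmoid/BCE form of the relativistic average discriminator applied to PatchGAN logit maps; the function names, array shapes, and `eps` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rapgan_d_loss(real_logits, fake_logits, eps=1e-12):
    """Discriminator loss: each real patch logit is compared against the
    average fake patch logit (and vice versa) before the sigmoid/BCE."""
    d_real = _sigmoid(real_logits - fake_logits.mean())  # more real than avg fake?
    d_fake = _sigmoid(fake_logits - real_logits.mean())  # more real than avg real?
    return -(np.log(d_real + eps).mean() + np.log(1.0 - d_fake + eps).mean())

def rapgan_g_loss(real_logits, fake_logits, eps=1e-12):
    """Generator loss: the symmetric objective -- generated patches should
    look more realistic than the average real patch."""
    d_real = _sigmoid(real_logits - fake_logits.mean())
    d_fake = _sigmoid(fake_logits - real_logits.mean())
    return -(np.log(d_fake + eps).mean() + np.log(1.0 - d_real + eps).mean())
```

Because both terms in each loss depend on the real and the generated logit maps, gradients from the original image reach the generator as well, which is the property the abstract highlights.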
Acknowledgement

This paper was carried out as a research project (2021RIS-004) of the Regional Innovation Strategy (RIS) program based on local government-university cooperation, supported by the National Research Foundation of Korea (NRF) with funding from the Ministry of Education in 2022.

References

  1. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative Adversarial Nets," Proceedings of Conference on Neural Information Processing Systems, 2014.
  2. P. Isola, J. Zhu, T. Zhou, and A. A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks," Proceedings of Conference on Computer Vision and Pattern Recognition, 2017.
  3. M. Mirza and S. Osindero, "Conditional Generative Adversarial Nets," arXiv: 1411.1784, Nov. 2014.
  4. T. Wang, M. Liu, J. Zhu, A. Tao, J. Kautz, and B. Catanzaro, "High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs," Proceedings of Conference on Computer Vision and Pattern Recognition, 2018.
  5. X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy, "ESRGAN: Enhanced Super-resolution Generative Adversarial Networks," Proceedings of European Conference on Computer Vision, pp.63-79, 2018.
  6. D. Bashkirova, B. Usman, and K. Saenko, "Adversarial Self-Defense for Cycle-Consistent GANs," Proceedings of Conference on Neural Information Processing Systems, 2019.
  7. T. Bepler, E. D. Zhong, K. Kelley, E. Brignole, and B. Berger, "Explicitly Disentangling Image Content from Translation and Rotation with Spatial-VAE," Proceedings of Conference on Neural Information Processing Systems, 2019.
  8. Y. Alharbi, N. Smith, and P. Wonka, "Latent Filter Scaling for Multimodal Unsupervised Image-to-Image Translation," Proceedings of Conference on Computer Vision and Pattern Recognition, 2019.
  9. T. F. A. van der Ouderaa and D. E. Worrall, "Reversible GANs for Memory-efficient Image-to-Image Translation," Proceedings of Conference on Computer Vision and Pattern Recognition, 2019.
  10. K. Frans and C. Cheng, "Unsupervised Image to Sequence Translation with Canvas-Drawer Networks," arXiv preprint arXiv:1809.08340, 2018.
  11. Z. Murez, S. Kolouri, D. Kriegman, R. Ramamoorthi, and K. Kim, "Image to Image Translation for Domain Adaptation," Proceedings of Conference on Computer Vision and Pattern Recognition, 2018.
  12. J. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman, "Toward Multimodal Image-to-Image Translation," Proceedings of Conference on Neural Information Processing Systems, 2017.
  13. M. Liu, T. Breuel, and J. Kautz, "Unsupervised Image-to-Image Translation Networks," Proceedings of Conference on Neural Information Processing Systems, 2017.
  14. B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, "Enhanced Deep Residual Networks for Single Image Super-resolution," Proceedings of Conference on Computer Vision and Pattern Recognition, 2017.
  15. Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, "Residual Dense Network for Image Super-resolution," Proceedings of Conference on Computer Vision and Pattern Recognition, 2018.
  16. A. Jolicoeur-Martineau, "The Relativistic Discriminator: A Key Element Missing from Standard GAN," arXiv preprint arXiv:1807.00734, 2018.