Lightweight Attention-Guided Network with Frequency Domain Reconstruction for High Dynamic Range Image Fusion

  • Park, Jae Hyun (Dept. of Electrical and Computer Eng., Interdisciplinary Program in Artificial Intelligence, Institute of New Media and Communications, Seoul National University) ;
  • Lee, Keuntek (Dept. of Electrical and Computer Eng., Interdisciplinary Program in Artificial Intelligence, Institute of New Media and Communications, Seoul National University) ;
  • Cho, Nam Ik (Dept. of Electrical and Computer Eng., Interdisciplinary Program in Artificial Intelligence, Institute of New Media and Communications, Seoul National University)
  • Published : 2022.06.20

Abstract

Multi-exposure high dynamic range (HDR) image reconstruction, the task of reconstructing an HDR image from multiple low dynamic range (LDR) images of a dynamic scene, often produces ghosting artifacts caused by camera motion and moving objects, and it also struggles with regions washed out by over- or under-exposure. While there have been many deep-learning-based methods that use motion estimation to alleviate these problems, they still fail on scenes with severe motion. They also require large parameter counts, especially the state-of-the-art methods that employ attention modules. To address these issues, we propose a frequency-domain approach based on the observation that transform-domain coefficients inherently aggregate global information from all image pixels, which helps cope with large motions. Specifically, we adopt Residual Fast Fourier Transform (RFFT) blocks, which allow global interactions among pixels. Moreover, we employ Depthwise Over-parameterized convolution (DO-conv) blocks, in which a conventional convolution kernel is composed with an additional trainable depthwise kernel that folds back into a single convolution at inference, for faster convergence and performance gains. We call the resulting model LFFNet (Lightweight Frequency Fusion Network). Experiments on the benchmarks show reduced ghosting artifacts and gains of up to 0.6 dB in tonemapped PSNR over recent state-of-the-art methods, while our architecture requires fewer parameters and converges faster in training.
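
As a rough illustration of the frequency-domain idea (a minimal sketch, not the authors' exact architecture; the module name, layer widths, and the choice of 1x1 frequency convolutions below are assumptions), a residual FFT block can transform features with a real 2D FFT, mix the stacked real and imaginary parts, invert the transform, and add a spatial residual path, so every output pixel depends on all input pixels:

```python
import torch
import torch.nn as nn

class ResidualFFTBlock(nn.Module):
    """Sketch of a residual FFT block: channel mixing in the frequency
    domain gives every output pixel a global receptive field."""
    def __init__(self, channels: int):
        super().__init__()
        # Frequency branch: 1x1 convs over concatenated real/imag parts.
        self.freq = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
        )
        # Spatial branch: ordinary local 3x3 convs.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        # Real 2D FFT; rfft2 keeps only the non-redundant frequencies.
        f = torch.fft.rfft2(x, norm="ortho")
        f = torch.cat([f.real, f.imag], dim=1)
        f = self.freq(f)
        real, imag = f.chunk(2, dim=1)
        # Back to the spatial domain, then fuse with the local branch.
        y = torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")
        return x + y + self.spatial(x)
```

Because a single frequency coefficient is a weighted sum over the whole image, even a pointwise operation in this domain mixes information across arbitrarily distant pixels, which is the property the abstract relies on for handling large motions.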
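Similarly, a minimal DO-conv sketch (following the general idea of depthwise over-parameterization; the kernel shapes and initialization here are illustrative assumptions, not the paper's exact configuration) composes a per-channel depthwise kernel D with a conventional kernel W before convolving, so the extra parameters help training but collapse into one ordinary convolution at inference:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DOConv2d(nn.Module):
    """Sketch of a depthwise over-parameterized convolution (DO-conv):
    a depthwise kernel D is composed with a conventional kernel W, so
    the layer folds into a single plain convolution at inference."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, padding: int = 1):
        super().__init__()
        self.k, self.padding = k, padding
        d_mul = k * k  # depthwise multiplicity (minimum choice, for brevity)
        # Conventional kernel W: (out_ch, in_ch, d_mul).
        self.W = nn.Parameter(torch.randn(out_ch, in_ch, d_mul) * 0.02)
        # Depthwise kernel D: (in_ch, d_mul, k*k), identity-initialized
        # so the layer starts out equivalent to a plain convolution.
        self.D = nn.Parameter(torch.eye(d_mul).repeat(in_ch, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compose the kernels: W'[o, c, :] = sum_d W[o, c, d] * D[c, d, :].
        composed = torch.einsum("ocd,cdk->ock", self.W, self.D)
        weight = composed.reshape(self.W.shape[0], self.W.shape[1],
                                  self.k, self.k)
        return F.conv2d(x, weight, padding=self.padding)
```

Since the composition is a fixed linear map, the trained (W, D) pair can be multiplied out once after training, so the deployed layer has exactly the cost of a standard convolution.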

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2021R1A2C2007220). Additionally, this work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [No. 2021-0-01343, Artificial Intelligence Graduate School Program (Seoul National University)]. Finally, this work was supported by the BK21 FOUR program of the Education and Research Program for Future ICT Pioneers, Seoul National University, in 2022.