CAttNet: A Compound Attention Network for Depth Estimation of Light Field Images

Dingkang Hua;Qian Zhang;Wan Liao;Bin Wang;Tao Yan;

doi:10.3745/JIPS.02.0201

Journal of Information Processing Systems

Volume 19 Issue 4 / Pages 483-497 / 2023 / 1976-913X (pISSN) / 2092-805X (eISSN)

Korea Information Processing Society (한국정보처리학회)


Dingkang Hua (School of Information, Mechanical and Electrical Engineering, Shanghai Normal University)
Qian Zhang (School of Information, Mechanical and Electrical Engineering, Shanghai Normal University)
Wan Liao (School of Information, Mechanical and Electrical Engineering, Shanghai Normal University)
Bin Wang (School of Information, Mechanical and Electrical Engineering, Shanghai Normal University)
Tao Yan (School of Mechanical, Electrical & Information Engineering, Putian University)

Received : 2022.08.10
Accepted : 2023.02.26
Published : 2023.08.31


Abstract

Depth estimation is one of the most challenging problems in light field processing. In this paper, we propose a compound attention convolutional neural network (CAttNet) that extracts depth maps from light field images. To use the sub-aperture images (SAIs) of a light field more effectively and to reduce the redundancy among them, a compound attention mechanism weights the feature map along its channel and spatial dimensions after the primary features have been extracted, so that the network can more efficiently select the required views and the important areas within each view. We modified several feature extraction layers so that they extract features more efficiently without adding parameters. By exploring the characteristics of light fields, we increased the network depth and optimized the network structure to reduce the adverse impact of this change. CAttNet efficiently exploits the correlations and features of different SAIs to generate a high-quality light field depth map. The experimental results show that CAttNet has advantages in both accuracy and running time.
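
The abstract does not specify how the compound attention is computed, so the following is only a minimal illustrative sketch of the general idea of weighting a feature map first per channel and then per spatial position. Everything in it is an assumption made for illustration: the function names, the nested-list feature map layout (channels x height x width), and the parameter-free sigmoid gating. A trained network such as CAttNet would learn these weights with additional layers rather than derive them directly from averages.

```python
import math


def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Return one weight per channel from its global average (hypothetical gating)."""
    weights = []
    for channel in fmap:
        count = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / count
        weights.append(sigmoid(mean))
    return weights


def spatial_attention(fmap):
    """Return an H x W weight map from the per-pixel mean across channels."""
    c = len(fmap)
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [sigmoid(sum(fmap[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]


def compound_attention(fmap):
    """Apply channel weighting, then spatial weighting, to a C x H x W feature map."""
    cw = channel_attention(fmap)
    # Step 1: scale every value in a channel by that channel's weight.
    scaled = [
        [[value * cw[k] for value in row] for row in channel]
        for k, channel in enumerate(fmap)
    ]
    # Step 2: scale every pixel position by its spatial weight, shared by all channels.
    sw = spatial_attention(scaled)
    return [
        [[scaled[k][i][j] * sw[i][j] for j in range(len(sw[0]))]
         for i in range(len(sw))]
        for k in range(len(scaled))
    ]
```

In this sketch, each channel can be read as the features of one view (or group of views) and each spatial position as an area within a view, which mirrors the stated goal of selecting the required views and the important areas within them. The output has the same shape as the input, so such a block could in principle be placed between existing layers.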

Acknowledgement

This research was jointly sponsored by the Natural Science Foundation of Fujian Province (No. 2019J01816), the Putian Science and Technology Bureau (No. 2021G2001-8) and New Century Excellent Talents in Fujian Province University (No. 2018JY7RC(PU), Yantao).

