An Efficient Monocular Depth Prediction Network Using Coordinate Attention and Feature Fusion

  • Huihui Xu (School of Computer Science and Technology, Shandong Jianzhu University)
  • Fei Li (School of Information and Electric Engineering, Shandong Jianzhu University)
  • Received : 2022.05.30
  • Accepted : 2022.08.28
  • Published : 2022.12.31

Abstract

Recovering reasonable depth information from diverse scenes is a popular topic in computer vision. To generate depth maps with finer details, we present an effective monocular depth prediction framework with coordinate attention and feature fusion. Specifically, the proposed framework contains attention, multi-scale, and feature fusion modules. The attention module refines features with coordinate attention to improve prediction quality, while the multi-scale module integrates useful low- and high-level contextual features at higher resolution. Moreover, we developed a feature fusion module that combines these heterogeneous features into high-quality depth outputs. We also designed a hybrid loss function that measures prediction errors in terms of both depth values and scale-invariant gradients, which helps preserve rich details. We conducted experiments on public RGB-D datasets, and the evaluation results show that the proposed scheme considerably improves the accuracy of depth prediction, achieving 0.051 for log10 error and 0.992 for the δ < 1.25³ threshold accuracy on the NYUv2 dataset.
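
To make two of the ingredients above concrete, the PyTorch sketches below show one plausible realization; they are illustrative, not the authors' released code. The first follows the coordinate attention design of Hou et al. (CVPR 2021), which factorizes channel attention into two 1D pooled branches along height and width so the resulting attention maps retain positional information. The reduction ratio and the use of ReLU (the original module uses h-swish) are simplifying assumptions.

    import torch
    import torch.nn as nn

    class CoordinateAttention(nn.Module):
        """Coordinate attention: pool along H and W separately, encode the
        two pooled maps jointly, then split them into per-axis attention."""
        def __init__(self, channels, reduction=32):  # reduction is assumed
            super().__init__()
            mid = max(8, channels // reduction)
            self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
            self.bn = nn.BatchNorm2d(mid)
            self.act = nn.ReLU(inplace=True)  # original uses h-swish
            self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
            self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

        def forward(self, x):
            b, c, h, w = x.shape
            x_h = x.mean(dim=3, keepdim=True)                      # (b, c, h, 1)
            x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (b, c, w, 1)
            y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
            y_h, y_w = torch.split(y, [h, w], dim=2)
            a_h = torch.sigmoid(self.conv_h(y_h))                          # (b, c, h, 1)
            a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))      # (b, c, 1, w)
            return x * a_h * a_w  # broadcast per-row and per-column attention

A feature map passes through unchanged in shape, e.g. CoordinateAttention(64)(torch.randn(2, 64, 30, 40)) returns a (2, 64, 30, 40) tensor. The second sketch illustrates the hybrid loss described in the abstract: a pointwise depth term plus a scale-invariant gradient term built from normalized finite differences at several pixel spacings. The spacings (1, 2, 4, 8) and the weight lam are assumed hyperparameters, not values reported in the paper.

    import torch
    import torch.nn.functional as F

    def scale_invariant_gradient(d, h, eps=1e-6):
        # Normalized forward differences at pixel spacing h, so the gradient
        # term is insensitive to the absolute scale of the depth values.
        gx = (d[:, :, :, h:] - d[:, :, :, :-h]) / (
            d[:, :, :, h:].abs() + d[:, :, :, :-h].abs() + eps)
        gy = (d[:, :, h:, :] - d[:, :, :-h, :]) / (
            d[:, :, h:, :].abs() + d[:, :, :-h, :].abs() + eps)
        return gx, gy

    def hybrid_loss(pred, gt, spacings=(1, 2, 4, 8), lam=0.5):
        # Depth term: pointwise L1 between predicted and ground-truth depth.
        loss = F.l1_loss(pred, gt)
        # Gradient term: compare scale-invariant gradients at each spacing,
        # which penalizes blurred edges and helps preserve fine detail.
        for h in spacings:
            px, py = scale_invariant_gradient(pred, h)
            gx, gy = scale_invariant_gradient(gt, h)
            loss = loss + lam * (F.l1_loss(px, gx) + F.l1_loss(py, gy))
        return loss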

Acknowledgement

This research was supported in part by the Opening Fund of Shandong Provincial Key Laboratory of Network based Intelligent Computing.
