http://dx.doi.org/10.3745/JIPS.02.0187

An Efficient Monocular Depth Prediction Network Using Coordinate Attention and Feature Fusion  

Huihui Xu (School of Computer Science and Technology, Shandong Jianzhu University)
Fei Li (School of Information and Electric Engineering, Shandong Jianzhu University)
Publication Information
Journal of Information Processing Systems / v.18, no.6, 2022, pp. 794-802
Abstract
The recovery of reasonable depth information from different scenes is a popular topic in computer vision. To generate depth maps with better detail, we present an effective monocular depth prediction framework with coordinate attention and feature fusion. Specifically, the proposed framework contains attention, multi-scale, and feature fusion modules. The attention module refines features with coordinate attention to improve prediction quality, whereas the multi-scale module integrates useful low- and high-level contextual features at higher resolution. Moreover, we developed a feature fusion module that combines these heterogeneous features to generate high-quality depth outputs. We also designed a hybrid loss function that measures prediction error in terms of both depth and scale-invariant gradients, which helps preserve rich details. We conducted experiments on public RGB-D datasets, and the evaluation results show that the proposed scheme considerably enhances depth prediction accuracy, achieving 0.051 for log10 error and 0.992 for δ < 1.25³ on the NYUv2 dataset.
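The abstract describes a hybrid loss combining a pixel-wise depth term with a scale-invariant gradient term. The exact formulation is not given here, so the following is a minimal NumPy sketch of one common form of such a loss (L1 depth error in log space plus an L1 penalty on log-depth gradient differences); the function name, the weighting parameter `lam`, and the specific terms are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def hybrid_depth_loss(pred, gt, lam=1.0):
    """Sketch of a hybrid loss: pixel-wise log-depth error plus a
    scale-invariant gradient term (form assumed, not from the paper)."""
    # Pixel-wise depth term: L1 error in log space.
    d = np.log(pred) - np.log(gt)
    depth_term = np.mean(np.abs(d))
    # Gradient term: penalize differences of log-depth gradients so that
    # relative depth edges (object boundaries) are preserved.
    gx = np.mean(np.abs(np.diff(d, axis=1)))
    gy = np.mean(np.abs(np.diff(d, axis=0)))
    grad_term = gx + gy
    return depth_term + lam * grad_term
```

For a perfect prediction the log-difference map is zero everywhere, so both terms vanish; mismatched depth edges inflate the gradient term even when the mean depth error is small, which is what encourages sharper boundaries.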
Keywords
Attention Mechanism; Depth Prediction; Feature Fusion; Multi-Scale Features