This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A2C200686411).
Multiview stereo (MVS) 3D reconstruction of a scene from images is a fundamental computer vision problem that has been studied extensively in recent years. Traditional MVS approaches establish dense correspondences using hand-crafted similarity metrics and regularization. Although these techniques achieve excellent results under ideal Lambertian conditions, traditional MVS algorithms still produce many artifacts. Therefore, in this study, we propose using a transformer network to accelerate MVS reconstruction. The network extracts dense features with 3D consistency and global context, both of which are necessary for accurate matching in MVS.