Neural network with occlusion-resistant and reduced parameters in stereo images

  • Received : 2024.03.18
  • Accepted : 2024.03.25
  • Published : 2024.03.31

Abstract

This paper proposes a neural network for stereo matching that reduces the number of parameters while also reducing matching errors in occluded regions, thereby increasing the accuracy of the resulting depth map. Stereo matching-based object recognition is used in many fields to recognize scenes more accurately from images. When a complex image contains many objects, occluded regions arise from overlap between objects and occlusion by the background, lowering the accuracy of the depth map. Existing methods address this either by generating context information and combining it with the cost volume, or by selecting RoIs in the occluded regions; both approaches increase the complexity of the neural network, making it difficult to train and expensive to implement. In this paper, we construct a depthwise separable neural network that strengthens local feature extraction before cost volume generation, reducing the number of parameters while remaining robust to occlusion errors. Compared to PSMNet, the proposed network reduces the number of parameters by 30% while improving occlusion error by 5.3% and test loss by 3.6%.
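The parameter savings from a depthwise separable design can be seen by comparing weight counts directly. The sketch below is illustrative only: the channel and kernel sizes are hypothetical and do not correspond to the layers of the proposed network or PSMNet; it merely shows why factoring a standard convolution into a depthwise and a pointwise stage shrinks the parameter count.

```python
# Illustrative comparison of parameter counts (biases omitted).
# Layer sizes are hypothetical, not taken from the paper.

def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution: one k x k filter
    per (input channel, output channel) pair."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution mixing channels."""
    return c_in * k * k + c_in * c_out

standard = conv_params(32, 64, 3)                   # 32*64*9  = 18432
separable = depthwise_separable_params(32, 64, 3)   # 288+2048 = 2336
saved = 1 - separable / standard
print(f"standard={standard}, separable={separable}, saved={saved:.1%}")
```

For this hypothetical layer the factored form uses roughly an eighth of the weights; in the full network the overall reduction depends on how many layers are replaced, which is how a whole-network figure like the paper's 30% can emerge.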


Acknowledgement

This Research was supported by Seokyeong University in 2024.

References

  1. Hyun-Jae Bae, Jin-Pyung Kim, Jee-Hyong Lee, "Semantic Occlusion Augmentation for Effective Human Pose Estimation," KIPS Trans. Softw. and Data Eng., Vol.11, No.12, pp.517-524, 2022. DOI: 10.3745/KTSDE.2022.11.12.517
  2. Kyoung-Ho Bae, Hong-Gi Park, "A Study on the Applicability of Deep Learning Algorithm for Detection and Resolving of Occlusion Area," Journal of the Korea Academia-Industrial cooperation Society, Vol.20, No.11, pp.305-313, 2019. DOI: 10.5762/KAIS.2019.20.11.305
  3. Patricio Loncomilla, Javier Ruiz del Solar, "Improving SIFT-based object recognition for robot applications," ICIAP 2005, pp.1084-1092, 2005. DOI: 10.1007/11553595_133
  4. H. Bay, T. Tuytelaars, L. V. Gool, "SURF: Speeded Up Robust Features," 9th European Conf. Computer Vision, pp.404-417, 2006. DOI: 10.1007/11744023_32
  5. E. Rublee, V. Rabaud, K. Konolige, "ORB: An efficient alternative to SIFT or SURF," 2011 IEEE International Conference on Computer Vision (ICCV), pp.2564-2571, 2011. DOI: 10.1109/ICCV.2011.6126544
  6. Wade S. Fife, James K. Archibald, "Improved Census Transforms for Resource-Optimized Stereo Vision," IEEE Trans. on Circuits and Systems for Video Technology, Vol.23, Issue 1, pp.60-73, 2013. DOI: 10.1109/TCSVT.2012.2203197
  7. J. Zbontar and Y. LeCun, "Computing the stereo matching cost with a convolutional neural network," in IEEE CVPR, 2015, pp.1592-1599. DOI: 10.1109/CVPR.2015.7298767
  8. Gregory Koch, Richard Zemel, Ruslan Salakhutdinov, "Siamese Neural Networks for One-shot Image Recognition," ICML deep learning workshop, 2015.
  9. Sergey Zagoruyko, Nikos Komodakis, "Learning to Compare Image Patches via Convolutional Neural Networks," Proc. of the IEEE CVPR 2015, pp.4353-4361, 2015. DOI: 10.48550/arXiv.1504.03641
  10. Jia-Ren Chang, Yong-Sheng Chen, "Pyramid Stereo Matching Network," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.5410-5418, 2018. DOI: 10.1109/CVPR.2018.00567
  11. Jaesung Choe, Kyungdon Joo, Francois Rameau, In So Kweon, "Stereo Object Matching Network," 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021. DOI: 10.1109/ICRA48506.2021.9562027
  12. Ang Li, Zejian Yuan, "Occlusion Aware Stereo Matching via Cooperative Unsupervised Learning," 2018 Asian Conference on Computer Vision, Springer, pp.197-213, 2018. DOI: 10.1007/978-3-030-20876-9_13
  13. Zihua Liu, Yizhou Li, Masatoshi Okutomi, "Global Occlusion-Aware Transformer for Robust Stereo Matching," Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp.3535-3544, 2024. DOI: 10.48550/arXiv.2312.14650
  14. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580-587, 2014. DOI: 10.1109/CVPR.2014.81
  15. https://www.cvlibs.net/datasets/kitti/
  16. https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html