CRFNet: Context ReFinement Network used for semantic segmentation

  • Taeghyun An (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • Jungyu Kang (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • Dooseop Choi (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • Kyoung-Wook Min (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute)
  • Received : 2023.01.18
  • Accepted : 2023.08.09
  • Published : 2023.10.20

Abstract

Recent semantic segmentation frameworks usually combine low-level and high-level context information to achieve improved performance; post-level context information is also considered. In this study, we present a Context ReFinement Network (CRFNet) and its training method to improve the semantic predictions of segmentation models with an encoder-decoder structure. Our work builds on postprocessing approaches that directly model the relationship between spatially neighboring pixels of a label map, such as Markov random fields and conditional random fields. CRFNet comprises two modules: a refiner, which refines the context information in the output features of a conventional semantic segmentation model, and a combiner, which merges the refined features with intermediate features from the model's decoding process to produce the final output. To train CRFNet to refine the semantic predictions more accurately, we propose a sequential training scheme. Using various backbone networks (ENet, ERFNet, and HyperSeg), we extensively evaluate our model on three large-scale, real-world datasets to demonstrate the effectiveness of our approach.
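
To make the two-module description above concrete, the following is a minimal PyTorch sketch of a refiner/combiner pair operating on a segmentation model's output scores and intermediate decoder features. All layer choices, channel sizes, and the concatenation-based fusion are illustrative assumptions only; they do not reproduce the authors' architecture, and the sequential training scheme mentioned in the abstract is not represented here.

```python
# Conceptual sketch (not the authors' implementation) of a refiner that refines
# context in the segmentation model's output features and a combiner that fuses
# the refined features with intermediate decoder features.
import torch
import torch.nn as nn


class Refiner(nn.Module):
    """Refines context in the backbone's output (class-score) features."""

    def __init__(self, num_classes: int, hidden: int = 64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(num_classes, hidden, kernel_size=3, padding=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
        )

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        return self.refine(logits)


class Combiner(nn.Module):
    """Combines refined features with intermediate decoder features."""

    def __init__(self, hidden: int, decoder_channels: int, num_classes: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(hidden + decoder_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, num_classes, kernel_size=1),
        )

    def forward(self, refined: torch.Tensor, decoder_feat: torch.Tensor) -> torch.Tensor:
        # Assumes both inputs share the same spatial resolution; channel-wise
        # concatenation is one plausible fusion choice, not necessarily the paper's.
        return self.fuse(torch.cat([refined, decoder_feat], dim=1))


if __name__ == "__main__":
    num_classes, decoder_channels = 19, 128
    refiner = Refiner(num_classes)
    combiner = Combiner(64, decoder_channels, num_classes)

    logits = torch.randn(1, num_classes, 64, 128)              # backbone output scores
    decoder_feat = torch.randn(1, decoder_channels, 64, 128)   # intermediate decoder features

    final_logits = combiner(refiner(logits), decoder_feat)
    print(final_logits.shape)  # torch.Size([1, 19, 64, 128])
```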

Acknowledgement

This research work was partly supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2020-0-00002, Development of standard SW platform-based autonomous driving technology to solve social problems of mobility and safety for public transport-marginalized communities, contribution rate: 50%) and by an IITP grant funded by the Korean government (MSIT) (No. 2021-0-00891, Development of AI Service Integrated Framework for Autonomous Driving, contribution rate: 50%).
