Acknowledgement
This research work was partly supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2020-0-00002, Development of standard SW platform-based autonomous driving technology to solve social problems of mobility and safety for public transport-marginalized communities, contribution rate: 50%) and by an IITP grant funded by the Korean government (MSIT) (No. 2021-0-00891, Development of AI Service Integrated Framework for Autonomous Driving, contribution rate: 50%).
References
- J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, (Proc. IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA), 2015. https://doi.org/10.1109/CVPR.2015.7298965
- H. Noh, S. Hong, and B. Han, Learning deconvolution network for semantic segmentation, (IEEE International Conference on Computer Vision, Santiago, Chile), 2015. https://doi.org/10.1109/ICCV.2015.178
- C. Farabet, C. Couprie, L. Najman, and Y. LeCun, Learning hierarchical features for scene labeling, IEEE Trans. Pattern Anal. Mach. Intell. 35 (2012), no. 8, 1915-1929.
- H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, Pyramid scene parsing network, (Proc. IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA), 2017. https://doi.org/10.1109/CVPR.2017.660
- M. Oršić and S. Šegvić, Efficient semantic segmentation with pyramidal fusion, Pattern Recognit. 110 (2021), 107611. https://doi.org/10.1016/j.patcog.2020.107611
- Y. Liu, K. Chen, C. Liu, Z. Qin, Z. Luo, and J. Wang, Structured knowledge distillation for semantic segmentation, (IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA), 2019. https://doi.org/10.1109/CVPR.2019.00271
- R. Sun, X. Zhu, C. Wu, C. Huang, J. Shi, and L. Ma, Not all areas are equal: Transfer learning for semantic segmentation via hierarchical region selection, (IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA), 2019. https://doi.org/10.1109/CVPR.2019.00449
- A. Kirillov, K. He, R. Girshick, C. Rother, and P. Dollar, Panoptic segmentation, (IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA), 2019. https://doi.org/10.1109/CVPR.2019.00963
- Y. Zeng, Y. Zhuge, H. Lu, and L. Zhang, Joint learning of saliency detection and weakly supervised semantic segmentation, (IEEE/CVF International Conference on Computer Vision, Seoul, Rep. of Korea), 2019. https://doi.org/10.1109/ICCV.2019.00732
- L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Trans. Pattern Anal. Mach. Intell. 40 (2017), no. 4, 834-848.
- F. Yu and V. Koltun, Multi-scale context aggregation by dilated convolutions, arXiv preprint, 2015. https://doi.org/10.48550/arXiv.1511.07122
- G. Ghiasi and C. C. Fowlkes, Laplacian pyramidal reconstruction and refinement for semantic segmentation, (European Conference on Computer Vision, Amsterdam, The Netherlands), 2016, pp. 519-534.
- K. He, X. Zhang, S. Ren, and J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell. 37 (2015), no. 9, 1904-1916. https://doi.org/10.1109/TPAMI.2015.2389824
- J. Lafferty, A. McCallum, and F. C. N. Pereira, Conditional random fields: probabilistic models for segmenting and labeling sequence data, (ICML '01: Proceedings of the Eighteenth International Conference on Machine Learning), 2001, pp. 282-289.
- J. Ji, R. Shi, S. Li, P. Chen, and Q. Miao, Encoder-decoder with cascaded CRFs for semantic segmentation, IEEE Trans. Circ. Syst. Video Technol. 31 (2020), no. 5, 1926-1938.
- M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, The cityscapes dataset for semantic urban scene understanding, (Proc. IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA), 2016. https://doi.org/10.1109/CVPR.2016.350
- G. J. Brostow, J. Fauqueur, and R. Cipolla, Semantic object classes in video: high-definition ground-truth database, Pattern Recognit. Lett. 30 (2009), no. 2, 88-97.
- Y. Liao, J. Xie, and A. Geiger, KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D, IEEE Trans. Pattern Anal. Mach. Intell. 45 (2022), no. 3, 3292-3310.
- O. Ronneberger, P. Fischer, and T. Brox, U-net: convolutional networks for biomedical image segmentation, (International Conference on Medical Image Computing and Computer-Assisted Interventions, Munich, Germany), 2015, pp. 234-241.
- L. C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, (European Conference on Computer Vision (ECCV), Munich, Germany), 2018, pp. 833-851.
- L.-C. Chen, Y. Yang, J. Wang, W. Xu, and A. L. Yuille, Attention to scale: scale-aware semantic image segmentation, (Proc. IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA), 2016. https://doi.org/10.1109/CVPR.2016.396
- F. Yu, V. Koltun, and T. Funkhouser, Dilated residual networks, (Proc. IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA), 2017. https://doi.org/10.1109/CVPR.2017.75
- R. Mohan and A. Valada, EfficientPS: efficient panoptic segmentation, Int. J. Comput. Vision 129 (2021), no. 5, 1551-1579. https://doi.org/10.1007/s11263-021-01445-z
- Y. Yuan, X. Chen, and J. Wang, Object-contextual representations for semantic segmentation, (European Conference on Computer Vision, Glasgow, UK), 2020, pp. 173-190.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Attention is all you need, Adv. Neural Inf. Process. Syst. 30 (2017).
- J. Hu, L. Shen, and G. Sun, Squeeze-and-excitation networks, (Proc. IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA), 2018. https://doi.org/10.1109/CVPR.2018.00745
- B. Cheng, L. C. Chen, Y. Wei, Y. Zhu, Z. Huang, J. Xiong, T. S. Huang, W. M. Hwu, and H. Shi, SPGNet: semantic prediction guidance for scene parsing, (IEEE/CVF International Conference on Computer Vision, Seoul, Rep. of Korea), 2019. https://doi.org/10.1109/ICCV.2019.00532
- Y. Nirkin, L. Wolf, and T. Hassner, Patch-wise hypernetwork for real-time semantic segmentation, (IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA), 2021. https://doi.org/10.1109/CVPR46437.2021.00405
- E. Romera, J. M. Alvarez, L. M. Bergasa, and R. Arroyo, ERFNet: efficient residual factorized ConvNet for real-time semantic segmentation, IEEE Trans. Intell. Transport. Syst. 19 (2017), no. 1, 263-272.
- C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, and N. Sang, BiSeNet: bilateral segmentation network for real-time semantic segmentation, (European Conference on Computer Vision (ECCV)), 2018, pp. 334-349.
- S. Gould, R. Fulton, and D. Koller, Decomposing a scene into geometrically and semantically consistent regions, (IEEE 12th International Conference on Computer Vision, Kyoto, Japan), 2009. https://doi.org/10.1109/ICCV.2009.5459211
- L. Ladický, C. Russell, P. Kohli, and P. H. Torr, Associative hierarchical CRFs for object class image segmentation, (IEEE 12th International Conference on Computer Vision, Kyoto, Japan), 2009. https://doi.org/10.1109/ICCV.2009.5459248
- Z. Liu, X. Li, P. Luo, C. C. Loy, and X. Tang, Deep learning Markov random field for semantic segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 40 (2017), no. 8, 1814-1828.
- S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. Torr, Conditional random fields as recurrent neural networks, (IEEE International Conference on Computer Vision, Santiago, Chile), 2015. https://doi.org/10.1109/ICCV.2015.179
- T. Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, Feature pyramid networks for object detection, (Proc. IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA), 2017. https://doi.org/10.1109/CVPR.2017.106
- G. Lin, A. Milan, C. Shen, and I. Reid, RefineNet: multi-path refinement networks for high-resolution semantic segmentation, (Proc. IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA), 2017. https://doi.org/10.1109/CVPR.2017.549
- F. Visin, M. Ciccone, A. Romero, K. Kastner, K. Cho, Y. Bengio, M. Matteucci, and A. Courville, ReSeg: a recurrent neural network-based model for semantic segmentation, (IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA), 2016. https://doi.org/10.1109/CVPRW.2016.60
- W. Byeon, T. M. Breuel, F. Raue, and M. Liwicki, Scene labeling with LSTM recurrent neural networks, (Proc. IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA), 2015. https://doi.org/10.1109/CVPR.2015.7298977
- A. Newell, K. Yang, and J. Deng, Stacked hourglass networks for human pose estimation, (European Conference on Computer Vision, Amsterdam, The Netherlands), 2016, pp. 483-499.
- L. Ke, M. C. Chang, H. Qi, and S. Lyu, Multi-scale structure-aware network for human pose estimation, (Proc. European Conference on Computer Vision (ECCV), Munich, Germany), 2018, pp. 731-746.
- J. Fu, J. Liu, Y. Wang, J. Zhou, C. Wang, and H. Lu, Stacked deconvolutional network for semantic segmentation, IEEE Trans. Image Process. (2019), 1-1. https://doi.org/10.1109/TIP.2019.2895460
- M. Berman, A. R. Triki, and M. B. Blaschko, The Lovasz-Softmax Loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks, (IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA), 2018. https://doi.org/10.1109/CVPR.2018.00464
- H. Pan, Y. Hong, W. Sun, and Y. Jia, Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes, IEEE Trans. Intell. Transport. Syst. 24 (2022), no. 3, 3448-3460.
- A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, ENet: a deep neural network architecture for real-time semantic segmentation, arXiv preprint, 2016. https://doi.org/10.48550/arXiv.1606.02147
- J. Kang, S. J. Han, N. Kim, and K. W. Min, ETLi: efficiently annotated traffic LiDAR dataset using incremental and suggestive annotations, ETRI J. 43 (2021), no. 4, 630-639. https://doi.org/10.4218/etrij.2021-0055
- J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall, SemanticKITTI: a dataset for semantic scene understanding of LiDAR sequences, (IEEE/CVF International Conference on Computer Vision, Seoul, Rep. of Korea), 2019. https://doi.org/10.1109/ICCV.2019.00939
- D. P. Kingma and J. Ba, Adam: a method for stochastic optimization, arXiv preprint, 2014. https://doi.org/10.48550/arXiv.1412.6980