http://dx.doi.org/10.3837/tiis.2022.11.002

Parallel Dense Merging Network with Dilated Convolutions for Semantic Segmentation of Sports Movement Scene

Huang, Dongya (Department of Physical Education, Nanjing Vocational Institute of Railway Technology)
Zhang, Li (Sports College, Nanchang Institute of Science and Technology)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS) / v.16, no.11, 2022, pp. 3493-3506

Abstract

In scene segmentation, precisely segmenting object boundaries in images of sports movement scenes remains a major challenge. Geometric and spatial information in the image is essential, but many models easily lose it, which degrades their performance. To alleviate this problem, we propose a parallel dense dilated convolution merging network (PDDCM-Net). PDDCM-Net consists of a feature extractor, parallel dilated convolutions, and dense dilated convolutions merged with different dilation rates. Different combinations of dilated convolutions expand the receptive field of the model with fewer parameters than other advanced methods. Importantly, PDDCM-Net fuses low-level and high-level information, which helps it segment object edges and localize objects more accurately. Experimental results on the COCO-Stuff data set show that PDDCM-Net achieves a substantial improvement over several representative models.
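The claim that dilated convolutions enlarge the receptive field without adding parameters can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the 3x3 kernel, the dilation rates (1, 2, 4, 8), and the 64-channel width are assumed values, and the function names are hypothetical.

```python
from typing import Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span along one axis covered by a single dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field along one axis of stride-1 dilated convolutions in sequence."""
    field = 1
    for dilation in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation pixels.
        field += (kernel_size - 1) * dilation
    return field


def conv_weight_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a square 2D convolution (bias ignored); independent of dilation."""
    return kernel_size * kernel_size * in_channels * out_channels


# Assumed, illustrative settings (not the configuration reported in the paper).
rates = (1, 2, 4, 8)
for rate in rates:
    print(f"dilation {rate}: kernel spans {effective_kernel_size(3, rate)} pixels")

print("stacked receptive field:", stacked_receptive_field(3, rates))
print("weights per 3x3 layer, 64 -> 64 channels:", conv_weight_count(3, 64, 64))
print("weights for a dense 17x17 layer, 64 -> 64 channels:", conv_weight_count(17, 64, 64))
```

Under these assumptions, a 3x3 kernel with dilation 8 spans the same 17 pixels as a dense 17x17 kernel while keeping the weight count of an ordinary 3x3 layer, which is the trade-off the abstract refers to.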

Keywords

Sports movement scene; convolutional neural network; semantic segmentation
