[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3837/tiis.2021.09.007

Higher-Order Conditional Random Field established with CNNs for Video Object Segmentation

Hao, Chuanyan (School of Education Science and Technology, Nanjing University of Posts and Telecommunications)
Wang, Yuqi (School of Education Science and Technology, Nanjing University of Posts and Telecommunications)
Jiang, Bo (School of Education Science and Technology, Nanjing University of Posts and Telecommunications)
Liu, Sijiang (School of Education Science and Technology, Nanjing University of Posts and Telecommunications)
Yang, Zhi-Xin (State Key Laboratory of Internet of Things for Smart City, Department of Electromechanical Engineering, University of Macau)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS) / v.15, no.9, 2021, pp. 3204-3220

Abstract

We address video object segmentation by combining a conditional random field (CRF) with convolutional neural networks (CNNs). Most existing methods use a CRF to refine the coarse output of a fully convolutional network. Others treat CRF inference as a recurrent neural network and combine the CNNs and the CRF into an end-to-end model for video object segmentation. In contrast to these methods, we propose a novel higher-order CRF model for video object segmentation. Specifically, we use CNNs to establish a higher-order dependence among pixels; this dependence supplies global information that improves the global consistency of the segmentation. Optimizing a higher-order energy is, in general, extremely difficult. To make the problem tractable, we introduce auxiliary variables that decompose the higher-order energy into two parts, and we then solve it with an iterative process. We conduct quantitative and qualitative analyses on multiple datasets, and the proposed method achieves competitive results.
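
The decomposition idea in the abstract can be illustrated with a toy sketch. This is not the authors' model: it assumes a 1-D binary labeling problem, a quadratic coupling between the labels and the auxiliary variables, and a hypothetical `majority_filter` function that stands in for the CNN-based higher-order term. The names `alternate_minimize`, `coupling`, and `radius` are likewise assumptions made for illustration. The sketch only shows how auxiliary variables split a hard joint problem into two easy alternating steps.

```python
def majority_filter(labels, radius=2):
    """Hypothetical stand-in for the CNN: relabel each position by a
    majority vote over a window, injecting non-local context."""
    n = len(labels)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = labels[lo:hi]
        out.append(1 if 2 * sum(window) > len(window) else 0)
    return out


def alternate_minimize(unary, coupling=1.0, iters=5):
    """Alternate between the two parts of the decomposed energy.

    unary[i] is a pair (cost of label 0, cost of label 1) for position i.
    The auxiliary labeling y carries the higher-order information; the
    quadratic term coupling * (x_i - y_i)**2 ties it to the labeling x.
    """
    # Start from the labeling that minimizes the unary costs alone.
    x = [0 if c0 <= c1 else 1 for c0, c1 in unary]
    for _ in range(iters):
        # Step 1 (x fixed): update the auxiliary variables with the
        # global operator (assumed here to be a simple majority vote).
        y = majority_filter(x)
        # Step 2 (y fixed): the energy separates over positions, so each
        # label is chosen independently in closed form.
        new_x = []
        for (c0, c1), yi in zip(unary, y):
            e0 = c0 + coupling * (0 - yi) ** 2
            e1 = c1 + coupling * (1 - yi) ** 2
            new_x.append(0 if e0 <= e1 else 1)
        if new_x == x:  # stop once the labeling no longer changes
            break
        x = new_x
    return x


# A weakly supported outlier is overruled by its neighborhood ...
weak = [(0.0, 1.0)] * 4 + [(0.6, 0.4)] + [(0.0, 1.0)] * 4
print(alternate_minimize(weak))    # [0, 0, 0, 0, 0, 0, 0, 0, 0]

# ... while a strongly supported one survives the coupling.
strong = [(0.0, 1.0)] * 4 + [(5.0, 0.0)] + [(0.0, 1.0)] * 4
print(alternate_minimize(strong))  # [0, 0, 0, 0, 1, 0, 0, 0, 0]
```

In this sketch the `coupling` weight controls how strongly the global term can override local evidence; how the paper balances the two parts is not stated in the abstract.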

Keywords

Video object segmentation; Conditional random field; Convolutional neural networks; Higher-order potential
