http://dx.doi.org/10.3837/tiis.2019.03.018

Video Object Segmentation with Weakly Temporal Information

Zhang, Yikun (School of Computer Science and Technology, CUMT)
Yao, Rui (School of Computer Science and Technology, CUMT)
Jiang, Qingnan (School of Computer Science and Technology, CUMT)
Zhang, Changbin (School of Computer Science and Technology, CUMT)
Wang, Shi (School of Computer Science and Technology, CUMT)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS) / v.13, no.3, 2019, pp. 1434-1449

Abstract

Video object segmentation is an important task in computer vision, but the performance of existing methods remains unsatisfactory. This paper presents a video object segmentation method that uses weakly temporal information. In real video sequences, object motion is continuous and smooth, and object appearance changes little between adjacent frames. Motivated by this, we use a feed-forward architecture with motion estimation to predict the mask of the current frame. We extend the input with an additional mask channel for the segmentation result of the previous frame: after processing, the previous frame's mask is fed into this channel, and we then extract the temporal feature of the object and fuse it with the other feature maps to generate the final mask. In addition, we introduce multi-mask guidance to improve the stability of the model, and we further improve segmentation performance by continuing to train with the masks already obtained. Experiments show that our method achieves results competitive with several state-of-the-art algorithms for single-object segmentation on DAVIS-2016.
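The mask-channel idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the mask "processing" step mentioned in the abstract is omitted because the record does not describe it, and `toy_segment_fn` is a hand-written placeholder standing in for the trained network. The sketch only shows how the previous frame's mask can be appended to the RGB input as a fourth channel and fed forward frame by frame.

```python
import numpy as np


def stack_with_prev_mask(frame_rgb, prev_mask):
    """Append the previous frame's mask as a fourth input channel.

    frame_rgb: float array of shape (H, W, 3)
    prev_mask: array of shape (H, W), 1 for object and 0 for background
    """
    if frame_rgb.shape[:2] != prev_mask.shape:
        raise ValueError("frame and mask must share height and width")
    mask_channel = prev_mask.astype(frame_rgb.dtype)[..., np.newaxis]
    return np.concatenate([frame_rgb, mask_channel], axis=-1)


def propagate_masks(frames, first_mask, segment_fn):
    """Segment a sequence frame by frame, feeding each mask forward.

    segment_fn is a hypothetical stand-in for the network: it maps an
    (H, W, 4) input to an (H, W) binary mask.
    """
    masks = [first_mask]
    for frame in frames[1:]:
        net_input = stack_with_prev_mask(frame, masks[-1])
        masks.append(segment_fn(net_input))
    return masks


def dilate(mask, radius=1):
    """Binary dilation with a square window, using zero padding."""
    padded = np.pad(mask, radius)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out


def toy_segment_fn(net_input):
    # Placeholder only (assumption): keep bright red pixels that lie
    # within one pixel of the previous mask. A real model would be a
    # learned network, not a threshold.
    bright = net_input[..., 0] > 0.5
    near_prev = dilate(net_input[..., 3]) > 0.5
    return (bright & near_prev).astype(np.uint8)


# Toy sequence: a 2x2 object moving right by one pixel per frame, plus a
# static bright distractor far from the object.
frames = []
for t in range(3):
    frame = np.zeros((4, 6, 3))
    frame[1:3, t:t + 2, 0] = 1.0
    frame[3, 5, 0] = 1.0
    frames.append(frame)

first_mask = np.zeros((4, 6), dtype=np.uint8)
first_mask[1:3, 0:2] = 1

masks = propagate_masks(frames, first_mask, toy_segment_fn)
```

In this toy sequence the distractor pixel never enters the mask, because it is never adjacent to the previous frame's mask. This mirrors the assumption stated in the abstract that object motion is smooth between adjacent frames.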

Keywords

Video object segmentation; temporal feature; feed-forward architecture; further training
