http://dx.doi.org/10.3837/tiis.2020.03.012

Visual Object Tracking Fusing CNN and Color Histogram based Tracker and Depth Estimation for Automatic Immersive Audio Mixing  

Park, Sung-Jun (School of Electronics and Information Engineering Korea Aerospace University)
Islam, Md. Mahbubul (School of Electronics and Information Engineering Korea Aerospace University)
Baek, Joong-Hwan (School of Electronics and Information Engineering Korea Aerospace University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.14, no.3, 2020, pp. 1121-1141
Abstract
We propose a robust visual object tracking algorithm that fuses a convolutional neural network (CNN) tracker, trained offline on a large number of video repositories, with a color histogram based tracker to track objects for immersive audio mixing. Our algorithm addresses the occlusion and large-movement problems of the CNN based GOTURN generic object tracker. The key idea is to train a binary classifier offline on the color histogram similarity values estimated by both trackers, use it to select the appropriate tracker for the target, and update both trackers with the predicted bounding box position of the target to continue tracking. Furthermore, a histogram similarity constraint is applied before updating the trackers to maximize tracking accuracy. Finally, we compute the depth (z) of the target object with a prominent unsupervised monocular depth estimation algorithm to obtain the 3D position of the tracked object needed to mix the immersive audio onto that object. Our proposed algorithm demonstrates about 2% higher accuracy than the GOTURN algorithm on the VOT2014 tracking benchmark. Additionally, our tracker also tracks multiple objects by applying the single object tracker concept to each target, although we do not demonstrate this on any MOT benchmark.
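The selection step described in the abstract can be illustrated with a minimal sketch: compare the color histogram of each tracker's predicted box against the target's color model and keep the better-matching prediction. This is not the paper's implementation; the function names, the Bhattacharyya coefficient as the similarity measure, and the simple margin rule standing in for the offline-trained binary classifier are all illustrative assumptions.

```python
import numpy as np

def color_histogram(patch, bins=16):
    """Normalized per-channel color histogram of an RGB image patch
    (a simple stand-in for the paper's color model)."""
    hists = [np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
             for c in range(patch.shape[-1])]
    h = np.concatenate(hists).astype(float)
    return h / (h.sum() + 1e-12)

def bhattacharyya(h1, h2):
    """Histogram similarity in [0, 1]; 1 means identical distributions."""
    return float(np.sum(np.sqrt(h1 * h2)))

def select_tracker(template_hist, cnn_patch, ms_patch, margin=0.0):
    """Pick the tracker whose predicted box better matches the target's
    color model. The margin rule is a placeholder for the offline-trained
    binary classifier used in the paper."""
    s_cnn = bhattacharyya(template_hist, color_histogram(cnn_patch))
    s_ms = bhattacharyya(template_hist, color_histogram(ms_patch))
    if s_cnn >= s_ms + margin:
        return "cnn", s_cnn
    return "mean_shift", s_ms
```

In the full method, the winning prediction would also have to satisfy the histogram similarity constraint before both trackers are updated with it; here that check would amount to requiring the returned score to exceed a threshold.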
Keywords
Immersive Audio; GOTURN; Mean-Shift; CNN; Color Histogram; Depth Estimation;