Video Representation via Fusion of Static and Motion Features Applied to Human Activity Recognition
Arif, Sheeraz (School of Information and Electronics, Beijing Institute of Technology)
Wang, Jing (School of Information and Electronics, Beijing Institute of Technology)
Fei, Zesong (School of Information and Electronics, Beijing Institute of Technology)
Hussain, Fida (School of Electrical and Information Engineering, Jiangsu University)