http://dx.doi.org/10.3837/tiis.2019.03.019

3D Res-Inception Network Transfer Learning for Multiple Label Crowd Behavior Recognition  

Nan, Hao (EE School, Shanghai University of Electric Power)
Li, Min (EE School, Shanghai University of Electric Power)
Fan, Lvyuan (EE School, Shanghai University of Electric Power)
Tong, Minglei (EE School, Shanghai University of Electric Power)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.13, no.3, 2019, pp. 1450-1463
Abstract
Crowd behavior recognition in severely cluttered scenes is extremely challenging because crowd scales vary and are non-uniform. This paper proposes a crowd behavior classification framework based on a transferred hybrid network that blends 3D ResNet with Inception-v3. First, the 3D res-inception network is introduced to learn augmented visual features from UCF 101. Then the target dataset is used to fine-tune the network parameters in order to classify behavior in densely crowded scenes. Finally, a transferred entropy function is used to calculate the probabilities of multiple labels from these features. Experimental results show that the proposed method greatly improves the accuracy of both crowd behavior recognition and multiple-label classification.
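The abstract does not define the entropy function used for the multiple-label step. As a rough illustration only, the sketch below assumes a common multi-label formulation: an independent sigmoid per label, trained with binary cross-entropy. This is a stand-in, not the authors' method; the logits, targets, function names, and the 0.5 threshold are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multilabel_probabilities(logits):
    """One independent probability per label (labels are not mutually exclusive)."""
    return [sigmoid(z) for z in logits]


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over labels; each target is 0 or 1."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def predict_labels(probs, threshold=0.5):
    """Indices of labels whose probability reaches the (assumed) threshold."""
    return [i for i, p in enumerate(probs) if p >= threshold]


# Hypothetical network outputs for four crowd-behavior labels.
logits = [2.0, -1.0, -0.5, 3.5]
targets = [1, 0, 0, 1]

probs = multilabel_probabilities(logits)
loss = binary_cross_entropy(probs, targets)
predicted = predict_labels(probs)
```

Unlike a softmax, the per-label sigmoid lets several behaviors be active in the same clip, which is what a multiple-label setting requires.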
Keywords
Densely crowded group; 3D Convolutional Neural Network (3D CNN); 3D Res-Inception; Transfer Learning