http://dx.doi.org/10.5909/JBE.2020.25.3.374

Preprocessing Technique for Improving Action Recognition Performance in ERP Video with Multiple Objects  

Park, Eun-Soo (Department of Computer Education, Sungkyunkwan University)
Kim, Seunghwan (Department of Computer Education, Sungkyunkwan University)
Ryu, Eun-Seok (Department of Computer Education, Sungkyunkwan University)
Publication Information
Journal of Broadcast Engineering, v.25, no.3, 2020, pp. 374-385
Abstract
In this paper, we propose a preprocessing technique to solve the problems of action recognition with Equirectangular Projection (ERP) video. The proposed technique treats the person object that is the subject of the action as the Object of Interest (OOI) and the area surrounding the OOI as the Region of Interest (ROI). It consists of three modules: I) detect person objects in the image with an object recognition model, II) generate a saliency map from the input image, and III) select the subject of the action using the detected person objects and the saliency map. The bounding box of the selected action subject is then fed to the action recognition model to improve action recognition performance. Compared with feeding the original ERP image directly to the action recognition model, the proposed preprocessing improves performance by up to 99.6%, and because only the OOI is detected, an action-related video summarization effect can also be obtained.
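For illustration, the sketch below gives one plausible, minimal reading of module III in Python: given person bounding boxes from a detector (module I) and a saliency map (module II), it selects the box with the highest mean saliency as the OOI and crops an enlarged ROI around it for the action recognition model. The mean-saliency selection rule, the 20% margin, and the names select_ooi/crop_roi are assumptions made for this sketch, not the authors' implementation; the detector and saliency-model outputs are replaced by stand-in arrays.

import numpy as np

def select_ooi(person_boxes, saliency_map):
    """Pick the person box with the highest mean saliency (assumed selection rule)."""
    best_box, best_score = None, -1.0
    for (x1, y1, x2, y2) in person_boxes:
        region = saliency_map[y1:y2, x1:x2]
        if region.size == 0:
            continue
        score = float(region.mean())          # average saliency inside the box
        if score > best_score:
            best_box, best_score = (x1, y1, x2, y2), score
    return best_box

def crop_roi(frame, box, margin=0.2):
    """Crop an ROI around the selected OOI, expanding the box by `margin` per side (assumed value)."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    dx, dy = int((x2 - x1) * margin), int((y2 - y1) * margin)
    x1, y1 = max(0, x1 - dx), max(0, y1 - dy)
    x2, y2 = min(w, x2 + dx), min(h, y2 + dy)
    return frame[y1:y2, x1:x2]

if __name__ == "__main__":
    frame = np.zeros((960, 1920, 3), dtype=np.uint8)          # dummy ERP frame (H x W x 3)
    saliency = np.random.rand(960, 1920)                      # stand-in saliency map (module II output)
    persons = [(100, 300, 260, 700), (1200, 280, 1350, 720)]  # stand-in person detections (x1, y1, x2, y2)
    box = select_ooi(persons, saliency)                       # module III: choose the OOI
    if box is not None:
        roi = crop_roi(frame, box)                            # ROI fed to the action recognition model

In practice, the person boxes and the saliency map would come from the object recognition and saliency models named in the paper (e.g., YOLOv3 and SAM in the reference list), but how those outputs are combined here is only an assumed reading of the abstract.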
Keywords
Action recognition; Equirectangular projection; Preprocessing