A Proposal of Shuffle Graph Convolutional Network for Skeleton-based Action Recognition

  • Received : 2021.08.13
  • Accepted : 2021.08.20
  • Published : 2021.08.30

Abstract

Skeleton-based action recognition has attracted considerable attention in human action recognition. Recent methods employ spatiotemporal graph convolutional networks (GCNs) and achieve remarkable performance. However, most of them incur heavy computational cost to attain robust recognition. To address this problem, we propose the shuffle graph convolutional network (SGCN), a lightweight GCN that replaces pointwise convolution with pointwise group convolution to reduce computational cost. SGCN is composed of a spatial and a temporal GCN. The spatial shuffle GCN combines pointwise group convolution with a part shuffle module that exchanges local and global information between correlated joints. The temporal shuffle GCN uses depthwise convolution to maintain a large receptive field. Our model achieves comparable accuracy at the lowest computational cost among the compared methods and exceeds the baseline by 0.3% and 1.2% on the NTU RGB+D and NTU RGB+D 120 datasets, respectively.
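The shuffle operation described above follows the same idea as ShuffleNet's channel shuffle: after a pointwise group convolution, channels are interleaved across groups so that the next grouped layer sees information from every group. The sketch below is a minimal, illustrative NumPy version of that operation applied to a skeleton feature tensor of shape (batch, channels, frames, joints); the function name and shapes are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave channels across groups, as in ShuffleNet-style shuffle.

    x: feature tensor of shape (N, C, T, V) -- batch, channels, frames, joints.
    """
    n, c, t, v = x.shape
    assert c % groups == 0, "channel count must be divisible by the group count"
    # Split channels into (groups, channels_per_group), swap those two axes,
    # then flatten back: channels from different groups become adjacent.
    x = x.reshape(n, groups, c // groups, t, v)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, t, v)

# Tiny example: 6 channels, 2 groups -> [0,1,2 | 3,4,5] shuffles to [0,3,1,4,2,5].
x = np.arange(6).reshape(1, 6, 1, 1)
print(channel_shuffle(x, 2).flatten().tolist())  # [0, 3, 1, 4, 2, 5]
```

Without this shuffle, stacked group convolutions would keep each group's channels isolated; interleaving them is what lets the cheaper grouped layers still mix information across all joints' feature channels.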
