Acknowledgement
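This work was supported in 2020 by the Institute of Information & Communications Technology Planning & Evaluation (IITP), funded by the Korea government (Ministry of Science and ICT) [No. B0101-15-0266, Development of a High-Performance Visual Discovery Platform for Real-Time Understanding and Prediction of Large-Scale Video Data; and No. 2020-0-00004, Development of Core Technology for Predictive Visual Intelligence Based on a Long-Term Visual Memory Network].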
References
- K. Soomro et al., "UCF101: A dataset of 101 human actions classes from videos in the wild," CoRR, abs/1212.0402, 2012.
- X. Peng et al., "Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice," CoRR, abs/1405.4506, 2014.
- K. Simonyan and A. Zisserman, "Two-stream convolutional networks for action recognition in videos," NIPS, 2014, pp. 568-576.
- J. Carreira and A. Zisserman, "Quo vadis, action recognition? a new model and the kinetics dataset," CVPR, 2017, pp. 4724-4733.
- S. Asghari-Esfeden et al., "Dynamic motion representation for human action recognition," WACV, 2020, pp. 557-566.
- L. Wang et al., "Action recognition and detection by combining motion and appearance features," ECCV THUMOS Workshop, 2014.
- D. Oneata et al., "The LEAR submission at THUMOS 2014," ECCV THUMOS Workshop, 2014.
- S. Karaman et al., "Fast saliency-based pooling of fisher encoded dense trajectories," ECCV THUMOS Workshop, 2014.
- Y.-G. Jiang et al., "Challenge: Action recognition with a large number of classes," ECCV THUMOS Workshop, http://crcv.ucf.edu/THUMOS14/, 2014.
- A. Montes et al., "Temporal activity detection in untrimmed videos with recurrent neural networks," the 1st NIPS Workshop on Large Scale Computer Vision Systems, 2016.
- S. Ma et al., "Learning activity progression in LSTMs for activity detection and early detection," CVPR, 2016, pp. 1942-1950.
- B. Singh et al., "A multi-stream bi-directional recurrent neural network for fine-grained action detection," CVPR, 2016, pp. 1961-1970.
- R. Girshick et al., "Rich feature hierarchies for accurate object detection and semantic segmentation," CVPR, 2014, pp. 580-587.
- R. Girshick, "Fast R-CNN," ICCV, 2015, pp. 1440-1448.
- S. Ren et al., "Faster R-CNN: Towards real-time object detection with region proposal networks," NIPS, 2015.
- Z. Shou et al., "Temporal action localization in untrimmed videos via multi-stage CNNs," CVPR, 2016, pp. 1049-1058.
- D. Tran et al., "Learning spatiotemporal features with 3D convolutional networks," ICCV, 2015, pp. 4489-4497.
- Y. Zhao et al., "Temporal action detection with structured segment networks," ICCV, 2017, pp. 2914-2923.
- K. He et al., "Spatial pyramid pooling in deep convolutional networks for visual recognition," ECCV, 2014, pp. 346-361.
- J. Gao et al., "Cascaded boundary regression for temporal action detection," BMVC, 2017.
- J. Gao et al., "TURN TAP: Temporal unit regression network for temporal action proposals," ICCV, 2017, pp. 3628-3636.
- H. Xu et al., "R-C3D: Region convolutional 3D network for temporal activity detection," ICCV, 2017, pp. 5783-5792.
- X. Dai et al., "Temporal context network for activity localization in videos," ICCV, 2017, pp. 5793-5802.
- J. Gao et al., "CTAP: complementary temporal action proposal generation," ECCV, 2018.
- T. Lin et al., "BSN: Boundary sensitive network for temporal action proposal generation," ECCV, 2018.
- Y. Liu et al., "Multi-granularity generator for temporal action proposal," CVPR, 2019, pp. 3604-3613.
- T. Lin et al., "BMN: Boundary-matching network for temporal action proposal generation," ICCV, 2019, pp. 3889-3898.
- H. Eun et al., "SRG: Snippet relatedness-based temporal action proposal generator," IEEE Trans. Circuits and Systems for Video Technology (TCSVT), Early Access, 2019.
- C. Lin et al., "Fast learning of temporal action proposal via dense boundary generator," AAAI, 2020.
- W. Liu et al., "SSD: Single shot multibox detector," ECCV, 2016.
- T. Lin et al., "Single shot temporal action detection," MM, 2017.
- D. Zhang et al., "S3D: single shot multi-span detector via fully 3D convolutional network," BMVC, 2018.
- T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," ICLR, 2017.
- S. Yan et al., "Spatial temporal graph convolutional networks for skeleton-based action recognition," AAAI, 2018.
- X. Wang and A. Gupta, "Videos as space-time region graphs," ECCV, 2018.
- R. Zeng et al., "Graph convolutional networks for temporal action localization," ICCV, 2019, pp. 7094-7103.
- C. Zhai et al., "Action co-localization in an untrimmed video by graph neural networks," MMM, 2020.
- http://activity-net.org/challenges/2019/challenge.html
- F. C. Heilbron et al., "ActivityNet: A large-scale video benchmark for human activity understanding," CVPR, 2015.
- L. Wang et al., "Untrimmednets for weakly supervised action recognition and detection," CVPR, 2017, pp. 4325-4334.