Extensible Hierarchical Method of Detecting Interactive Actions for Video Understanding
Moon, Jinyoung (SW & Contents Research Laboratory, ETRI)
Jin, Junho (Hyper-connected Communication Research Laboratory, ETRI)
Kwon, Yongjin (SW & Contents Research Laboratory, ETRI)
Kang, Kyuchang (School of IT Information and Control Engineering, Kunsan National University)
Park, Jongyoul (SW & Contents Research Laboratory, ETRI)
Park, Kyoung (Memory System Research Lab., SK Hynix)
1 | G. Lavee, E. Rivlin, and M. Rudzsky, "Understanding Video Events: A Survey of Methods for Automatic Interpretation of Semantic Occurrences in Video," IEEE Trans. Syst., Man, Cybern. C, vol. 39, no. 5, Sept. 2009, pp. 489-504.
2 | R. Poppe, "A Survey on Vision-Based Human Action Recognition," Image Vision Comput., vol. 28, no. 6, June 2010, pp. 976-990.
3 | J.K. Aggarwal and M.S. Ryoo, "Human Activity Analysis: A Review," ACM Comput. Surv., vol. 43, no. 3, Apr. 2011, pp. 1-43. |
4 | D. Weinland, R. Ronfard, and E. Boyer, "A Survey of Vision-Based Methods for Action Representation, Segmentation, and Recognition," Comput. Vis. Image Underst., vol. 115, no. 2, Feb. 2011, pp. 224-241.
5 | I. Laptev et al., "Learning Realistic Human Actions from Movies," IEEE Conf. Comput. Vis. Pattern Recogn., Anchorage, Alaska, June 23-28, 2008, pp. 1-8. |
6 | H. Wang and C. Schmid, "Action Recognition with Improved Trajectories," IEEE Int. Conf. Comput. Vision, Sydney, Australia, Dec. 1-8, 2013, pp. 3551-3558. |
7 | H. Wang and C. Schmid, "LEAR-INRIA Submission for the THUMOS Workshop," Int. Conf. Comput. Vision, Workshop Action Recogn. Large Number Classes, Sydney, Australia, Dec. 1-8, 2013. |
8 | X. Peng et al., "Bag of Visual Words and Fusion Methods for Action Recognition: Comprehensive Study and Good Practice," Comput. Vis. Image Underst., vol. 150, Sept. 2016, pp. 109-125.
9 | M. Baccouche et al., "Sequential Deep Learning for Human Action Recognition," Int. Workshop Human Behav. Underst., Amsterdam, Netherlands, Nov. 16, 2011, pp. 29-39.
10 | S. Ji et al., "3D Convolutional Neural Networks for Human Action Recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, Jan. 2013, pp. 221-231.
11 | A. Karpathy et al., "Large-Scale Video Classification with Convolutional Neural Networks," IEEE Conf. Comput. Vis. Pattern Recogn., Columbus, USA, June 24-27, 2014, pp. 1725-1732.
12 | K. Simonyan and A. Zisserman, "Two-Stream Convolutional Networks for Action Recognition in Videos," Int. Conf. Neural Inform. Process. Syst., Montreal, Canada, Dec. 8-13, 2014, pp. 568-576.
13 | J. Ng et al., "Beyond Short Snippets: Deep Networks for Video Classification," IEEE Conf. Comput. Vis. Pattern Recogn., Boston, USA, June 7-12, 2015, pp. 4694-4702. |
14 | J.M. Chaquet, E.J. Carmona, and A. Fernandez-Caballero, "A Survey of Video Datasets for Human Action and Activity Recognition," Comput. Vis. Image Underst., vol. 117, no. 6, June 2013, pp. 633-659.
15 | L. Wang, Y. Qiao, and X. Tang, "Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors," IEEE Conf. Comput. Vis. Pattern Recogn., Boston, USA, June 7-12, 2015, pp. 4305-4314.
16 | D. Tran et al., "Learning Spatiotemporal Features with 3D Convolutional Networks," IEEE Int. Conf. Comput. Vis., Santiago, Chile, Dec. 13-16, 2015, pp. 4489-4497.
17 | C. Feichtenhofer, A. Pinz, and A. Zisserman, "Convolutional Two-Stream Network Fusion for Video Action Recognition," IEEE Conf. Comput. Vis. Pattern Recogn., Las Vegas, USA, June 26-July 1, 2016, pp. 1933-1941.
18 | Z. Shou, D. Wang, and S.-F. Chang, "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs," IEEE Conf. Comput. Vis. Pattern Recogn., Las Vegas, USA, June 26-July 1, 2016, pp. 1049-1058. |
19 | S. Yeung et al., "Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos," Preprint, submitted July 31, 2015. http://arxiv.org/abs/1507.05738v2.
20 | G. Gkioxari and J. Malik, "Finding Action Tubes," IEEE Conf. Comput. Vis. Pattern Recogn., Boston, USA, June 7-12, 2015, pp. 759-768. |
21 | P. Weinzaepfel et al., "Learning to Track for Spatio-Temporal Action Localization," IEEE Int. Conf. Comput. Vis., Santiago, Chile, Dec. 13-16, 2015, pp. 3164-3172.
22 | J. Gall et al., "Hough Forests for Object Detection, Tracking, and Action Recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 11, Nov. 2011, pp. 2188-2202.
23 | S.-C. Cheng, K.-Y. Cheng, and Y.-P. Chen, "GHT-Based Associative Memory Learning and Its Application to Human Action Detection and Classification," Pattern Recogn., vol. 46, no. 11, Nov. 2013, pp. 3117-3128.
24 | J. Moon et al., "A Knowledge-Driven Approach to Interactive Event Recognition for Semantic Video Understanding," Int. Conf. IT Convergence Security, Prague, Czech Rep., Sept. 26-29, 2016, pp. 37-39. |
25 | S. Ma et al., "Action Recognition and Localization by Hierarchical Space-Time Segments," IEEE Int. Conf. Comput. Vis., Sydney, Australia, Dec. 3-6, 2013, pp. 2744-2751.
26 | T. Lan, Y. Wang, and G. Mori, "Discriminative Figure-Centric Models for Joint Action Localization and Recognition," IEEE Int. Conf. Comput. Vis., Barcelona, Spain, Nov. 6-13, 2011, pp. 2003-2010.
27 | Y.S. Sefidgar et al., "Discriminative Key-Component Models for Interaction Detection and Recognition," Comput. Vis. Image Underst., vol. 135, June 2015, pp. 16-30.
28 | C. Schuldt, I. Laptev, and B. Caputo, "Recognizing Human Actions: A Local SVM Approach," Int. Conf. Pattern Recogn., Cambridge, UK, Aug. 23-26, 2004, pp. 32-36. |
29 | M. Blank et al., "Actions as Space-Time Shapes," IEEE Int. Conf. Comput. Vis., Beijing, China, Oct. 17-21, 2005, pp. 1395-1402.
30 | H. Kuehne et al., "HMDB: A Large Video Database for Human Motion Recognition," IEEE Int. Conf. Comput. Vis., Barcelona, Spain, Nov. 6-13, 2011, pp. 2556-2563.
31 | K. Soomro, A.R. Zamir, and M. Shah, "UCF101: A Dataset of 101 Human Actions Classes from Videos in the Wild," Center for Research in Computer Vision, UCF, Orlando, Tech. Rep. CRCV-TR-12-01, Nov. 2012.
32 | K. Soomro and A.R. Zamir, "Action Recognition in Realistic Sports Videos," in Computer Vision in Sports, Advances in Comput. Vis. Pattern Recogn., Springer International Publishing, Jan. 2015, pp. 181-208.
33 | H. Jhuang et al., "Towards Understanding Action Recognition," IEEE Int. Conf. Comput. Vis., Sydney, Australia, Dec. 3-6, 2013, pp. 3192-3199.
34 | D. Riboni and C. Bettini, "OWL 2 Modeling and Reasoning with Complex Human Actions," Pervasive Mobile Comput., vol. 7, no. 3, 2011, pp. 379-395.
35 | Y.-G. Jiang et al., THUMOS Challenge 2014, Center for Research in Computer Vision, UCF, 2014, Accessed Aug. 8, 2016. http://crcv.ucf.edu/THUMOS14/ |
36 | A.B. James, "Activities of Daily Living and Instrumental Activities of Daily Living," in Willard and Spackman's Occupational Therapy, Philadelphia, USA: Wolters Kluwer Health/Lippincott Williams & Wilkins, 2014. |
37 | L. Chen, C.D. Nugent, and H. Wang, "A Knowledge-Driven Approach to Activity Recognition in Smart Homes," IEEE Trans. Knowl. Data Eng., vol. 24, no. 6, June 2012, pp. 961-974.
38 | I.H. Bae, "An Ontology-Based Approach to ADL Recognition in Smart Homes," Future Gener. Comput. Syst., vol. 33, Apr. 2014, pp. 32-41.
39 | G. Okeyo, L. Chen, and H. Wang, "Combining Ontological and Temporal Formalisms for Composite Activity Modelling and Recognition in Smart Homes," Future Gener. Comput. Syst., vol. 39, Oct. 2014, pp. 29-43.
40 | G. Meditskos, S. Dasiopoulou, and I. Kompatsiaris, "MetaQ: A Knowledge-Driven Framework for Context-Aware Activity Recognition Combining SPARQL and OWL 2 Activity Patterns," Pervasive Mobile Comput., vol. 25, Jan. 2016, pp. 104-124.
41 | S. Oh et al., Instruction for VIRAT Video Dataset Release 2.0, KITWARE, Sept. 30, 2011, Accessed Aug. 8, 2016. https://data.kitware.com/#collection/56f56db28d777f753209ba9f/folder/56f581c78d777f753209c9c2
43 | G. Baryannis, P. Woznowski, and G. Antoniou, "Rule-Based Real-Time ADL Recognition in a Smart Home Environment," Rule Technol., Res., Tools, Applicat., Int. Web Rule Symp., June 28, 2016, pp. 325-340.
44 | L. Ballan et al., "Video Annotation and Retrieval Using Ontologies and Rule Learning," IEEE MultiMedia, vol. 17, no. 4, Oct. 2010, pp. 80-88.
45 | Y. Yildirim, A. Yazici, and T. Yilmaz, "Automatic Semantic Content Extraction in Videos Using a Fuzzy Ontology and Rule-Based Model," IEEE Trans. Knowl. Data Eng., vol. 25, no. 1, Jan. 2013, pp. 47-61.
46 | U. Akdemir, P. Turaga, and R. Chellappa, "An Ontology-Based Approach for Activity Recognition from Video," ACM Int. Conf. Multimedia, Vancouver, Canada, Oct. 27-Nov. 1, 2008, pp. 709-712.
47 | M. Bertini, A.D. Bimbo, and G. Serra, "Learning Ontology Rules for Semantic Video Annotation," ACM Int. Conf. Multimedia, Workshop Multimedia Semantics, Vancouver, Canada, Oct. 26-31, 2008.
48 | L. Ballan et al., "Event Detection and Recognition for Semantic Annotation of Video," Multimed. Tools. Appl., vol. 51, no. 1, Jan. 2011, pp. 279-302. DOI |
49 | G. Antoniou and F. van Harmelen, "Web Ontology Language: OWL," in Handbook on Ontologies, Heidelberg: Springer, 2004, pp. 67-92.
50 | W3C Member Submission, SWRL: A Semantic Web Rule Language Combining OWL and RuleML, May 2004.
51 | J. Moon et al., "ActionNet-VE Dataset: A Dataset for Describing Visual Events by Extending VIRAT Ground 2.0," Int. Conf. Signal Process. Image Process. Pattern Recogn., Nov. 2015, pp. 1-4.
52 | S. Oh et al., "A Large-Scale Benchmark Dataset for Event Recognition in Surveillance Video," IEEE Conf. Comput. Vis. Pattern Recogn., Colorado Springs, USA, June 20-25, 2011, pp. 3153-3160.
53 | X. Wang and Q. Ji, "Hierarchical Context Modeling for Video Event Recognition," IEEE Trans. Pattern Anal. Mach. Intell., Epub, Oct. 2016. |
54 | O. Russakovsky and J. Deng, ImageNet Large Scale Visual Recognition Challenge 2016 (ILSVRC2016), Accessed Feb. 1, 2017. http://image-net.org/challenges/LSVRC/2016/ |