1 |
A. Gupta, A. Kembhavi, and L. S. Davis, Observing human-object interactions: Using spatial and functional compatibility for recognition, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2009), no. 10, 1775-1789.
|
2 |
V. Delaitre, I. Laptev, and J. Sivic, Recognizing human actions in still images: a study of bag-of-features and part-based representations, in Proc. BMVC 2010-21st British Mach. Vision Conf., 2010, pp. 97:1-11.
|
3 |
B. Yao and L. Fei-Fei, Modeling mutual context of object and human pose in human-object interaction activities, in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recogn., San Francisco, CA, USA, June 2010, pp. 17-24.
|
4 |
J. W. Choi, D. Moon, and J. H. Yoo, Robust multi-person tracking for real-time intelligent video surveillance, ETRI J. 37 (2015), no. 3, 551-561.
|
5 |
C. Y. Chen and K. Grauman, Predicting the location of interactees in novel human-object interactions, Asian Conference on Computer Vision, Springer, Cham, Switzerland, 2014, pp. 351-367.
|
6 |
S. Gupta and J. Malik, Visual semantic role labeling, arXiv preprint arXiv:1505.04474, 2015.
|
7 |
L. Wang and D. Sng, Deep learning algorithms with applications to video analytics for a smart city: a survey, arXiv preprint arXiv:1512.03131, 2015.
|
8 |
J. Moon et al., Extensible hierarchical method of detecting interactive actions for video understanding, ETRI J. 39 (2017), no. 4, 502-513.
|
9 |
K. Yun et al., Vision-based garbage dumping action detection for real-world surveillance platform, ETRI J. 41 (2019), no. 4, 494-505.
|
10 |
L. Yu et al., Visual madlibs: fill in the blank image generation and question answering, arXiv preprint arXiv:1506.00278, 2015.
|
11 |
G. Gkioxari et al., Detecting and recognizing human-object interactions, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Salt Lake City, UT, USA, June 2018, pp. 8359-8367.
|
12 |
Y. W. Chao et al., Learning to detect human-object interactions, in Proc. IEEE Winter Conf. Applicat. Comput. Vision, Lake Tahoe, NV, USA, Mar. 2018, pp. 381-389.
|
13 |
L. Shen et al., Scaling human-object interaction recognition through zero-shot learning, in Proc. IEEE Winter Conf. Applicat. Comput. Vision, Lake Tahoe, NV, USA, Mar. 2018, pp. 1568-1576.
|
14 |
C. Lu et al., Visual relationship detection with language priors, European Conference on Computer Vision, Springer, Cham, Switzerland, 2016, pp. 852-869.
|
15 |
C. Gao, Y. Zou, and J. B. Huang, iCAN: Instance-centric attention network for human-object interaction detection, British Machine Vision Conference, 2018.
|
16 |
C. Peng et al., Large kernel matters-improve semantic segmentation by global convolutional network, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, 2017, pp. 4353-4361.
|
17 |
M. A. Sadeghi and A. Farhadi, Recognition using visual phrases, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Providence, RI, USA, 2011, pp. 1745-1752.
|
18 |
M. Yatskar, L. Zettlemoyer, and A. Farhadi, Situation recognition: Visual semantic role labeling for image understanding, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Las Vegas, NV, USA, 2016, pp. 5534-5542.
|
19 |
B. Dai, Y. Zhang, and D. Lin, Detecting visual relationships with deep relational networks, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, 2017, pp. 3076-3086.
|
20 |
H. Zhang et al., Visual translation embedding network for visual relation detection, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, 2017, pp. 5532-5540.
|
21 |
R. Hu et al., Modeling relationships in referential expressions with compositional modular networks, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, 2017, pp. 1115-1124.
|
22 |
J. Peyre et al., Weakly-supervised learning of visual relations, in Proc. IEEE Int. Conf. Comput. Vision, Venice, Italy, 2017, pp. 5179-5188.
|
23 |
A. Kolesnikov, C. H. Lampert, and V. Ferrari, Detecting visual relationships using box attention, arXiv preprint arXiv:1807.02136, 2018.
|
24 |
K. He et al., Deep residual learning for image recognition, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Las Vegas, NV, USA, June 2016, pp. 770-778.
|
25 |
M. Mostajabi, P. Yadollahpour, and G. Shakhnarovich, Feedforward semantic segmentation with zoom-out features, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Boston, MA, USA, 2015, pp. 3376-3385.
|
26 |
W. Liu, A. Rabinovich, and A. C. Berg, Parsenet: Looking wider to see better, arXiv preprint arXiv:1506.04579, 2015.
|
27 |
F. Yu and V. Koltun, Multi-scale context aggregation by dilated convolutions, arXiv preprint arXiv:1511.07122, 2015.
|
28 |
R. Girshick et al., Detectron, https://github.com/facebookresearch/detectron, 2018.
|
29 |
T. Y. Lin et al., Microsoft COCO: Common objects in context, in Proc. Computer Vision-ECCV, Zurich, Switzerland, Sept. 2014, pp. 740-755.
|
30 |
Y. W. Chao et al., HICO: A benchmark for recognizing human-object interactions in images, in Proc. IEEE Int. Conf. Comput. Vision, Santiago, Chile, 2015, pp. 1017-1025.
|
31 |
T. Y. Lin et al., Feature pyramid networks for object detection, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 2117-2125.
|
32 |
S. Qi et al., Learning human-object interactions by graph parsing neural networks, in Proc. Eur. Conf. Comput. Vision (ECCV), 2018, pp. 401-417.
|
33 |
B. Xu et al., Interact as you intend: Intention-driven human-object interaction detection, CoRR abs/1808.09796, 2018.
|