References
- K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014, CoRR abs/1409.1556.
- H. Kaiming et al., Deep residual learning for image recognition, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Las Vegas, NV, USA, June 2016, pp. 770-778.
- G. Ross et al., Region-based convolutional networks for accurate object detection and segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 38 (2016), no. 1, 142-158. https://doi.org/10.1109/TPAMI.2015.2437384
- R. Girshick, Fast R-CNN, in Proc. IEEE Int. Conf. Comput. Vision, Santiago, Chile, Dec. 2015, pp. 1440-1448.
- S. Ren et al., Faster R-CNN: Towards real-time object detection with region proposal networks, Adv. Neural Inf. Process. Syst., Montreal, Canada, Dec. 2015, pp. 91-99.
- K. He et al., Mask R-CNN, in IEEE Int. Conf. Comput. Vision (ICCV), Venice, Italy, Oct. 2017, pp. 2980-2988.
- Z. Ren et al., Deep reinforcement learning-based image captioning with embedding reward, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (CVPR), Honolulu, HI, USA, July 2017, pp. 1151-1159.
- S. Li et al., Person search with natural language description, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (CVPR), Honolulu, HI, USA, July 2017, pp. 5187-5196.
- J. Johnson et al., Image retrieval using scene graphs, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Boston, MA, USA, June 2015, pp. 3668-3678.
- Y. Li et al., Scene graph generation from objects, phrases and region captions, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Venice, Italy, Oct. 2017, pp. 1261-1270.
- X. Danfei et al., Scene graph generation by iterative message passing, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Honolulu, HI, USA, July 2017, pp. 3097-3106.
- Y. Goyal et al., Making the V in VQA matter: Elevating the role of image understanding in visual question answering, in IEEE Conf. Comput. Vision Pattern Recogn. (CVPR), Honolulu, HI, USA, July 2017, pp. 6325-6334.
- S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Comput 9 (1997), no. 8, 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
- M. A. Sadeghi and A. Farhadi, Recognition using visual phrases, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (CVPR), Providence, RI, USA, June 2011, pp. 1745-1752.
- L. Cewu et al., Visual relationship detection with language priors, in Proc. Eur. Conf. Comput. Vision, Amsterdam, Netherlands, Oct. 2016, pp. 852-869.
- R. Krishna et al., Visual genome: Connecting language and vision using crowdsourced dense image annotations, Int. J. Comput. Vision 123 (2017), no. 1, 32-73. https://doi.org/10.1007/s11263-016-0981-7
- T. Mikolov et al., Efficient estimation of word representations in vector space, in Proc. Int. Conf. Learn. Representations (ICLR) Workshop, Scottsdale, AZ, USA, 2013, pp. 1-12.
- Y. Ruichi et al., Visual relationship detection with internal and external linguistic knowledge distillation, in Proc. IEEE Int. Conf. Comput. Vision (ICCV), Venice, Italy, Oct. 2017, pp. 1068-1076.
- Y. Zhu, S. Jiang, and X. Li, Visual relationship detection with object spatial distribution, in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Venice, Italy, Oct. 2017, pp. 379-384.
- Y. W. Chao et al., Learning to detect human-object interactions, in Proc. IEEE Winter Conf. Applicat. Comput. Vision, Lake Tahoe, NV, USA, Mar. 2018, pp. 381-389.
- G. Gkioxari et al., Detecting and recognizing human-object interactions, in Proc. Conf. Vision Pattern Recong., Salt Lake City, UT, USA, June 2018, pp. 8359-8367.
- T. Y. Lin et al., Microsoft coco: Common objects in context, in Proc. Comput. Vision - ECCV, Zurich, Switzerland, Sept. 2014, pp. 740-755.
- B. Dai, Y. Zhang, and D. Lin, Detecting visual relationships with deep relational networks, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (CVPR), Honolulu, HI, USA, July 2017, pp. 3298-3308.
- Y. Li et al., ViP-CNN: Visual phrase guided convolutional neural network, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (CVPR), Honolulu, HI, USA, July 2017, pp. 7244-7253.
- H. Zhang et al., Visual translation embedding network for visual relation detection, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (CVPR), Honolulu, HI, USA, July 2017, pp. 3107-3115.
- B. A. Plummer et al., Phrase localization and visual relationship detection with comprehensive image-language cues, in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Venice, Italy, Oct. 2017, pp. 1928-1937.
- X. Liang, L. Lee, and E. P. Xing, Deep variation-structured reinforcement learning for visual relationship and attribute detection, 2017, CoRR abs/1703.03054.
- X. Shang et al., Video visual relation detection, in Proc. ACM Int. Conf. Multimedia, Mountain View, CA, USA, Oct. 2017, pp. 1300-1308.
- X. Liang, L. Lee, and E. P. Xing, Deep variation-structured reinforcement learning for visual relationship and attribute detection, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (CVPR), Honolulu, HI, USA, July 2017, pp. 4408-4417.
Cited by
- Automated optimization for memory-efficient high-performance deep neural network accelerators vol.42, pp.4, 2020, https://doi.org/10.4218/etrij.2020-0125