http://dx.doi.org/10.3745/KTSDE.2020.9.3.91

KG_VCR: A Visual Commonsense Reasoning Model Using Knowledge Graph  

Lee, JaeYun (Department of Computer Science, Kyonggi University)
Kim, Incheol (Department of Computer Science, Kyonggi University)
Publication Information
KIPS Transactions on Software and Data Engineering / v.9, no.3, 2020, pp. 91-100
Abstract
Unlike existing Visual Question Answering (VQA) problems, the new Visual Commonsense Reasoning (VCR) problems require deep commonsense reasoning to answer questions, such as recognizing a specific relationship between two objects in an image and presenting the rationale for an answer. In this paper, we propose a novel deep neural network model, KG_VCR, for VCR problems. In addition to making use of the visual relations and contextual information between objects extracted from the input data (images, natural language questions, and response lists), KG_VCR also utilizes commonsense knowledge embeddings extracted from an external knowledge base, ConceptNet. Specifically, the proposed model employs a Graph Convolutional Network (GCN) module to obtain commonsense knowledge embeddings from the retrieved ConceptNet knowledge graph. Through a series of experiments on the VCR benchmark dataset, we show that the proposed KG_VCR model outperforms both a state-of-the-art (SOTA) VQA model and the R2C VCR model.
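The sketch below illustrates the GCN step described in the abstract, namely turning a retrieved ConceptNet subgraph into a single commonsense knowledge embedding. It is a minimal sketch, assuming PyTorch and a Kipf-and-Welling-style graph convolution; the class names, feature dimensions, mean pooling, and toy graph are illustrative assumptions, not the paper's actual KG_VCR implementation.

# Minimal sketch (not the authors' code): embed a retrieved ConceptNet subgraph
# with a two-layer GCN and pool it into one commonsense knowledge embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    # One graph convolution: H' = ReLU(A_hat @ H @ W), with A_hat the
    # symmetrically normalized adjacency matrix (self-loops included).
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats, adj_norm):
        return F.relu(adj_norm @ self.linear(node_feats))

class KnowledgeGraphEncoder(nn.Module):
    # Stacks two GCN layers and mean-pools node states into a graph-level
    # embedding; all dimensions here are placeholders, not the paper's values.
    def __init__(self, in_dim=300, hidden_dim=512, out_dim=512):
        super().__init__()
        self.gcn1 = GCNLayer(in_dim, hidden_dim)
        self.gcn2 = GCNLayer(hidden_dim, out_dim)

    def forward(self, node_feats, adj):
        # A_hat = D^{-1/2} (A + I) D^{-1/2}
        a_tilde = adj + torch.eye(adj.size(0))
        deg_inv_sqrt = a_tilde.sum(dim=1).pow(-0.5)
        adj_norm = deg_inv_sqrt.unsqueeze(1) * a_tilde * deg_inv_sqrt.unsqueeze(0)
        h = self.gcn1(node_feats, adj_norm)
        h = self.gcn2(h, adj_norm)
        return h.mean(dim=0)

# Toy usage: 5 ConceptNet nodes with 300-d word-vector features and random edges.
nodes = torch.randn(5, 300)
edges = torch.triu((torch.rand(5, 5) > 0.5).float(), diagonal=1)
edges = edges + edges.t()  # symmetric adjacency, no self-loops yet
embedding = KnowledgeGraphEncoder()(nodes, edges)
print(embedding.shape)  # torch.Size([512])

In KG_VCR this graph-level embedding would then be fused with the image and text features; here it is simply printed to show the output shape.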
Keywords
Visual Commonsense Reasoning; Deep Neural Network; Graph Convolutional Network; Knowledge Graph Embedding;
References
1 Y. Cao, M. Fang, and D. Tao, et al., "BAG: Bi-directional Attention Entity Graph Convolutional Network for Multi-hop Reasoning Question Answering," arXiv preprint arXiv:1904.04969, 2019.
2 J. Devlin, M. Chang, and K. Lee, et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv preprint arXiv:1810.04805, 2018.
3 J. Zhou, G. Cui, and Z. Zhang, et al., "Graph Neural Networks: A Review of Methods and Applications," arXiv preprint arXiv:1812.08434, 2018.
4 S. Antol, A. Agrawal, and J. Lu, et al., "VQA: Visual Question Answering," in Proceedings of the International Conference on Computer Vision (ICCV), pp.2425-2433, 2015.
5 R. Zellers, Y. Bisk, and A. Farhadi, et al., "From Recognition to Cognition: Visual Commonsense Reasoning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.6720-6731, 2019.
6 P. Anderson, X. He, and C. Buehler, et al., "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.6077-6086, 2018.
7 P. Wang, Q. Wu, and C. Shen, et al., "FVQA: Fact-based Visual Question Answering," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol.40, pp.2413-2427, 2017.
8 S. Shah, A. Mishra, and N. Yadati, et al., "KVQA: Knowledge-aware Visual Question Answering," in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2019.
9 M. Narasimhan, S. Lazebnik, and A. G. Schwing, "Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering," in Proceedings of the Conference on Neural Information Processing Systems (NIPS), pp.2654-2665, 2018.
10 Z. Yang, X. He, J. Gao, L. Deng, and A. Smola, "Stacked Attention Networks for Image Question Answering," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.21-29, 2016.
11 J. Lu, J. Yang, and D. Batra, et al., "Hierarchical Question-Image Co-Attention for Visual Question Answering," in Proceedings of the Conference on Neural Information Processing Systems (NIPS), pp.289-297, 2016.
12 M. Lao, Y. Guo, H. Wang, and X. Zhang, "Cross-Modal Multistep Fusion Network With Co-Attention for Visual Question Answering," IEEE Access, Vol.6, pp.31516-31524, Jun. 2018.
13 C. Yang, M. Jiang, B. Jiang, W. Zhou, and K. Li, "Co-Attention Network with Question Type for Visual Question Answering," IEEE Access, Vol.7, pp.40771-40781, Mar. 2019.
14 P. Wang, Q. Wu, and C. Shen, et al., "Explicit Knowledge-based Reasoning for Visual Question Answering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
15 S. Auer, C. Bizer, and G. Kobilarov, et al., "DBpedia: A Nucleus for a Web of Open Data," in The Semantic Web, Springer, Berlin, Heidelberg, 2007.
16 K. Bollacker, C. Evans, and P. Paritosh, et al., "Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge," in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp.1247-1250, 2008.
17 H. Liu and P. Singh, "ConceptNet - A Practical Commonsense Reasoning Tool-kit," British Telecommunications (BT) Technology Journal, Vol.22, pp.211-226, 2004.
18 K. Marino, M. Rastegari, and A. Farhadi, et al., "OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3195-3204, 2019.
19 Y. LeCun, B. Boser, and J. Denker, et al., "Backpropagation Applied to Handwritten Zip Code Recognition," Neural Computation, Vol.1, Issue 4, pp.541-551, 1989.
20 S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, Vol.9, Issue 8, pp.1735-1780, 1997.
21 T. N. Kipf and M. Welling, "Semi-Supervised Classification with Graph Convolutional Networks," in Proceedings of the International Conference on Learning Representations (ICLR), 2017.
22 J. Yang, J. Lu, and S. Lee, et al., "Graph R-CNN for Scene Graph Generation," in Proceedings of the European Conference on Computer Vision (ECCV), pp.670-685, 2018.