http://dx.doi.org/10.3745/KTSDE.2019.8.5.205

Deep Neural Network-Based Scene Graph Generation for 3D Simulated Indoor Environments  

Shin, Donghyeop (Department of Computer Science, Kyonggi University)
Kim, Incheol (Department of Computer Science, Kyonggi University)
Publication Information
KIPS Transactions on Software and Data Engineering / v.8, no.5, 2019, pp. 205-212
Abstract
A scene graph is a kind of knowledge graph that represents the objects found in an image together with their relationships. This paper proposes a 3D scene graph generation model for three-dimensional indoor environments. A 3D scene graph includes not only object types, their positions, and their attributes, but also the three-dimensional spatial relationships between the objects. A 3D scene graph can be viewed as a prior knowledge base describing the environment in which an agent will later be deployed. 3D scene graphs can therefore be used in many applications, such as visual question answering (VQA) and service robots. The proposed 3D scene graph generation model consists of four sub-networks: an object detection network (ObjNet), an attribute prediction network (AttNet), a transfer network (TransNet), and a relationship prediction network (RelNet). Experiments conducted in 3D simulated indoor environments provided by AI2-THOR confirmed that the proposed model achieves high performance.
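To make the structure described in the abstract concrete, here is a minimal Python sketch of a 3D scene graph. Nodes carry an object type, a 3D position (assumed here to be an axis-aligned bounding box) and attributes, and edges are directed (subject, predicate, object) triples. Everything in it is hypothetical and chosen for illustration: the class names, the three predicates `inside`, `on` and `near`, the tolerance values, and the AI2-THOR-style type names. It also assumes y is the vertical axis, as in the Unity-based AI2-THOR. The paper predicts relationships with a learned network (RelNet). The hand-written geometric rules below are a swapped-in stand-in that only shows what such a graph holds, not how the authors compute it.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import Optional
import math

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned 3D box given by min/max corners (x, y, z).

    Assumption: y is the vertical axis (Unity convention used by AI2-THOR).
    """
    lo: Vec3
    hi: Vec3

    def center(self) -> Vec3:
        return tuple((a + b) / 2 for a, b in zip(self.lo, self.hi))


@dataclass
class SceneObject:
    """Graph node: object type, 3D position (box), and attributes."""
    obj_id: str
    obj_type: str
    box: Box3D
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class SceneGraph:
    """Nodes keyed by id; edges are (subject_id, predicate, object_id)."""
    objects: dict[str, SceneObject] = field(default_factory=dict)
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, obj: SceneObject) -> None:
        self.objects[obj.obj_id] = obj


def overlaps_xz(a: Box3D, b: Box3D) -> bool:
    # True if the two footprints overlap on the horizontal (x, z) plane.
    return all(a.lo[i] < b.hi[i] and b.lo[i] < a.hi[i] for i in (0, 2))


def contains(outer: Box3D, inner: Box3D) -> bool:
    # True if inner lies entirely within outer on all three axes.
    return all(outer.lo[i] <= inner.lo[i] and inner.hi[i] <= outer.hi[i]
               for i in range(3))


def spatial_relation(s: Box3D, o: Box3D, tol: float = 0.05,
                     near_dist: float = 1.0) -> Optional[str]:
    """Hypothetical rule-based stand-in for a learned relation classifier.

    Returns the first matching predicate for subject s relative to object o.
    """
    if contains(o, s):
        return "inside"
    if overlaps_xz(s, o) and abs(s.lo[1] - o.hi[1]) <= tol:
        return "on"  # bottom of s rests on the top surface of o
    if math.dist(s.center(), o.center()) <= near_dist:
        return "near"
    return None


def build_relations(graph: SceneGraph, **kw) -> list[tuple[str, str, str]]:
    # Relations are directional, so every ordered pair (a, b) is checked.
    graph.relations = []
    for a, b in permutations(graph.objects.values(), 2):
        rel = spatial_relation(a.box, b.box, **kw)
        if rel is not None:
            graph.relations.append((a.obj_id, rel, b.obj_id))
    return graph.relations


# Usage: a tiny hand-made scene (coordinates in metres, all hypothetical).
g = SceneGraph()
g.add(SceneObject("table", "DiningTable", Box3D((0.0, 0.0, 0.0), (1.2, 0.75, 0.8))))
g.add(SceneObject("mug", "Mug", Box3D((0.5, 0.75, 0.3), (0.6, 0.87, 0.4)),
                  {"color": "white"}))
g.add(SceneObject("fridge", "Fridge", Box3D((3.0, 0.0, 0.0), (3.8, 1.8, 0.7))))
g.add(SceneObject("apple", "Apple", Box3D((3.2, 0.9, 0.2), (3.3, 1.0, 0.3))))

rels = build_relations(g)
for triple in rels:
    print(triple)
```

For this scene, `rels` holds `("mug", "on", "table")` and `("apple", "inside", "fridge")`, plus two `near` edges (table to mug, fridge to apple). Hand-written rules like these are easy to inspect but coarse. For example, `near` is assigned to any close pair that matches no stronger predicate.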
Keywords
Scene Graph; 3D Indoor Environment; Deep Neural Network; AI2-THOR
References
1 Y. Guo, Y. Liu, and A. Oerlemans et al., "Deep Learning for Visual Understanding: A Review," Neurocomputing, Vol. 187, pp. 27-48, 2016.
2 S. Aditya, Y. Yang, and C. Baral et al., "Image Understanding using Vision and Reasoning through Scene Description Graph," Computer Vision and Image Understanding, In press, available online 18 December 2017.
3 E. Kolve, R. Mottaghi, and D. Gordon et al., "AI2-THOR: An Interactive 3D Environment for Visual AI," arXiv preprint arXiv:1712.05474, 2017.
4 D. Xu, Y. Zhu, and C. B. Choy et al., "Scene Graph Generation by Iterative Message Passing," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5410-5419, 2017.
5 Y. Li, W. Ouyang, and B. Zhou et al., "Scene Graph Generation from Objects, Phrases and Region Captions," Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1261-1270, 2017.
6 S. Ren, K. He, and R. Girshick et al., "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," Proceedings of the Neural Information Processing Systems (NIPS), pp. 91-99, 2015.
7 C. Lu, R. Krishna, and M. Bernstein et al., "Visual Relationship Detection with Language Priors," Proceedings of the European Conference on Computer Vision (ECCV), pp. 852-869, 2016.
8 B. Dai, Y. Zhang, and D. Lin, "Detecting Visual Relationships with Deep Relational Networks," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3298-3308, 2017.
9 P. Gay, J. Stuart, and A. Del Bue, "Visual Graphs from Motion (VGfM): Scene understanding with Object Geometry Reasoning," arXiv preprint arXiv:1807.05933, 2018.
10 S. Song and J. Xiao, "Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 808-816, 2016.
11 A. Dai, A. X. Chang, and M. Savva et al., "ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5828-5839, 2017.
12 D. Gordon, A. Kembhavi, and M. Rastegari et al., "IQA: Visual Question Answering in Interactive Environments," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4089-4098, 2018.
13 J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv preprint arXiv:1804.02767, 2018.