References
- A. Agrawal, J. and Lu, S. Antol, et al., "VQA: Visual Question Answering," in Proceedings of the International Conference on Computer Vision(ICCV), pp.2425-2433, 2015.
- A. Das, S. Kottur, and K. Gupta, et al., "Visual Dialog," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- A. Das, S. Kottur, and K. Gupta, et al., "Embodied Question Answering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol.5. 2018.
- D. Gordon, A. Kembhavi, and M. Rastegari, et al., "IQA: Visual Question Answering in Interactive Environments," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
- P. Anderson, Q. Wu, and D. Teney, et al., "Vision-and-Language Navigation: Interpreting Visually-grounded Navigation Instructions in Real Environments," in Proceedings of the Conference on Computer Vision and Pattern Recognition(CVPR), 2018.
- A. Chang, A. Dai, and T. Funkhouser, et al., "Matterport3D: Learning from RGB-D Data in Indoor Environments," in Proceedings of the International Conference on 3D Vision, Vol.5, 2017.
- X. Wang, W. Xiong, and H. Wang, et al., "Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation," in Proceedings of the European Conference on Computer Vision(ECCV), pp.696-711, 2018.
- X. Wang, Q. Huang, and A. Celikyilmaz, et al., "Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation," in Proceedings of the Conference on Computer Vision and Pattern Recognition(CVPR), 2019.
- D. Fried, R. Hu, and A. Rohrbach, et al., "Speaker-Follower Models for Vision-and-Language Navigation," in Proceedings of the Conference on Neural Information Processing Systems(NIPS), Vol.28, 2018.
- C. Ma, J. Lu, Z. and Z. wu, et al., "Self-Monitoring Navigation Agent via Auxiliary Progress Estimation," in Proceedings of the International Conference on Learning Representations (ICLR), 2019.
- C. Ma, Z. Wu, and G. Alregib, et al., "The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation," in Proceedings of the Conference on Computer Vision and Pattern Recognition(CVPR), 2019.
- L. Ke, X. Li, and Y. Bisk, et al., "Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation," in Proceedings of the Conference on Computer Vision and Pattern Recognition(CVPR), 2019.
- K. Wang, X. Long, and R. Li, et al., "A Discriminative Algorithm for Indoor Place Recognition based on Clustering of Features and Images," International Journal of Automation and Computing, Vol.14, pp.407-419, 2017. https://doi.org/10.1007/s11633-017-1081-z
- A. Hanni, S. Chickerur, and I. Bidari, "Deep learning Framework for Scene based Indoor Location Recognition," in Proceedings of the International Conference on Technological Advancements in Power and Energy (TAP Energy), IEEE, 2017.
- B. Zhou, A. Lapedriza and A. Khosla, et al., "Places: A 10 million Image Database for Scene Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.40, pp.1452-1464, 2017. https://doi.org/10.1109/tpami.2017.2723009
- C. Szegedy, W. Liu, and Y. Jia, et al., "Going Deeper with Convolutions," in Proceedings. of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR), pp.1-9, 2015.
- K. Simonyan, and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," in Proceedings of the International Conference on Learning Representations(ICLR), 2015.
- K. He, X. Zhang, and S. Ren, et al., "Deep Residual Learning for Image Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, 2016.
- J. Deng, W. Dong, and R. Socher, et al, "ImageNet:A Large-Scale Hierarchical Image Database," in Proceedings of the Conference on Neural Information Processing Systems(NIPS), 2009.
- N. Silberman, D. Hoiem, and P. Kohli, et al., "Indoor Segmentation and Support Inference from RGBD Images," in Proceedings of the European Conference on Computer Vision(ECCV), pp.746-760, 2012.
- R. Grishick, J. Donahue, and T. Darrell, et al., "Rich Feature Hierarchies for Accurate Oobject Detection and Semantic Segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2014.
- R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision(ICCV), 2015.
- S. Ren, K. He, and R. Girshick, et al., "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," in Proceedings of the Conference on Neural Information Processing Systems(NIPS), 2015.
- K. He, G. Gkioxari, and P. Dollar, "Mask R-CNN," in Proceedings of the IEEE International Conference on Computer Vision(ICCV), 2017.
- J. Redmon, S. Divvala, and R. Girshick, et al., "You Only Look Once: Unified, Real-Time Object Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2016.
- W. Liu, D. Anguelov, and D. Erhan, et al., "Ssd: Single Shot Multibox Detector," in Proceedings of European Conference on Ccomputer Vision(ECCV), pp.21-37, Springer, Cham. 2016.
- J. Redmon, and A. Farhadi, "YOLO9000: Better, Faster, Stronger," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR), 2017.
- J. Redmon, and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv preprint arXiv:1804.02767, 2018.
- T.-Y. Lin, M. Maire, and S. Belongie, et al., "Microsoft COCO: Common Objects in Context," in Proceedings of the European Conference on Computer Vision(ECCV). vol 13, pp.740-755, 2014.