[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KTSDE.2020.9.9.281

Hybrid Learning for Vision-and-Language Navigation Agents

Oh, Suntaek (Department of Computer Science, Kyonggi University)
Kim, Incheol (Department of Computer Science, Kyonggi University)

Publication Information

KIPS Transactions on Software and Data Engineering / v.9, no.9, 2020, pp. 281-290

Abstract

The Vision-and-Language Navigation (VLN) task is a complex intelligence problem that requires both visual and language comprehension skills. In this paper, we propose a new learning model for vision-and-language navigation agents. The model adopts a hybrid learning approach that combines imitation learning based on demonstration data with reinforcement learning based on action rewards. As a result, the model addresses both the tendency of imitation learning to become biased toward the demonstration data and the relatively low data efficiency of reinforcement learning. In addition, the proposed model uses a novel path-based reward function designed to address the shortcomings of existing goal-based reward functions. We demonstrate the high performance of the proposed model through various experiments using the Matterport3D simulation environment and the R2R benchmark dataset.
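
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea: a weighted sum of an imitation loss and a REINFORCE-style reinforcement loss, plus a toy reward that depends on closeness to a reference path rather than only on the goal. All function names, the mixing `weight`, the discount `gamma`, and the exponential reward shape are assumptions made for illustration and are not taken from the paper.

```python
import math


def imitation_loss(action_probs, demo_actions):
    """Mean negative log-likelihood of the demonstrated actions."""
    total = sum(math.log(p[a]) for p, a in zip(action_probs, demo_actions))
    return -total / len(demo_actions)


def policy_gradient_loss(action_probs, sampled_actions, rewards, gamma=0.9):
    """REINFORCE-style loss: log-probability of each sampled action,
    weighted by the discounted return from that step (gamma is assumed)."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    total = sum(
        math.log(p[a]) * ret
        for p, a, ret in zip(action_probs, sampled_actions, returns)
    )
    return -total / len(rewards)


def hybrid_loss(il_loss, rl_loss, weight=0.5):
    """Hypothetical mixing rule: weight on imitation, the rest on RL."""
    return weight * il_loss + (1.0 - weight) * rl_loss


def path_based_reward(position, reference_path):
    """Toy path-based reward (assumed form): close to 1 when the agent is
    on the reference path, decaying with distance to the nearest path point."""
    nearest = min(math.dist(position, node) for node in reference_path)
    return math.exp(-nearest)
```

In this sketch a goal-based reward would look only at the distance to the final node, whereas `path_based_reward` also penalizes trajectories that reach the goal by straying from the instructed route.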

Keywords

Vision-and-Language Navigation; Hybrid Learning; Path-Based Reward Function

