• Title/Summary/Keyword: Vision-and-Language Navigation

Hybrid Learning for Vision-and-Language Navigation Agents (시각-언어 이동 에이전트를 위한 복합 학습)

  • Oh, Suntaek;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering / v.9 no.9 / pp.281-290 / 2020
  • The Vision-and-Language Navigation (VLN) task is a complex intelligence problem that requires both visual and language comprehension skills. In this paper, we propose a new learning model for vision-and-language navigation agents. The model adopts hybrid learning, combining imitation learning based on demonstration data with reinforcement learning based on action rewards. It can therefore mitigate both the bias toward demonstration data that affects imitation learning and the relatively low data efficiency of reinforcement learning. In addition, the proposed model uses a novel path-based reward function designed to overcome the shortcomings of existing goal-based reward functions. We demonstrate the high performance of the proposed model through various experiments using the Matterport3D simulation environment and the R2R benchmark dataset.
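
A minimal PyTorch-style sketch of how such a hybrid objective can be formed is given below. It is an illustration under assumed names only (policy `logits`, `expert_action`, `sampled_action`, `path_reward`, mixing weight `alpha`), not the authors' implementation; the path-based reward is treated as an already-computed scalar per example.

```python
# Minimal sketch (not the authors' code): mixing an imitation-learning loss
# with a policy-gradient reinforcement-learning loss for a VLN agent.
import torch
import torch.nn.functional as F

def hybrid_loss(logits, expert_action, sampled_action, path_reward, alpha=0.5):
    """logits: (batch, num_actions) action scores from the agent's policy.
    expert_action: (batch,) ground-truth action from the demonstration.
    sampled_action: (batch,) action sampled from the policy during rollout.
    path_reward: (batch,) reward based on fidelity to the reference path
                 (a stand-in for the paper's path-based reward function).
    alpha: illustrative weight balancing the two learning signals."""
    # Imitation term: cross-entropy against the demonstrated action.
    il_loss = F.cross_entropy(logits, expert_action)

    # Reinforcement term: REINFORCE-style loss weighted by the reward.
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, sampled_action.unsqueeze(1)).squeeze(1)
    rl_loss = -(path_reward * chosen).mean()

    # Hybrid objective: trade off demo bias against sample efficiency.
    return alpha * il_loss + (1.0 - alpha) * rl_loss

# Hypothetical usage with random tensors (batch of 4, 6 candidate actions).
logits = torch.randn(4, 6)
loss = hybrid_loss(logits,
                   expert_action=torch.randint(0, 6, (4,)),
                   sampled_action=torch.randint(0, 6, (4,)),
                   path_reward=torch.rand(4))
```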

LVLN : A Landmark-Based Deep Neural Network Model for Vision-and-Language Navigation (LVLN: 시각-언어 이동을 위한 랜드마크 기반의 심층 신경망 모델)

  • Hwang, Jisu;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering / v.8 no.9 / pp.379-390 / 2019
  • In this paper, we propose a novel deep neural network model for Vision-and-Language Navigation (VLN) named LVLN (Landmark-based VLN). In addition to visual features extracted from input images and linguistic features extracted from natural language instructions, the model makes use of information about places and landmark objects detected in the images. It also applies a context-based attention mechanism in order to associate each entity mentioned in the instruction with the corresponding region of interest (ROI) in the image and the corresponding place and landmark object detected in the image. Moreover, in order to improve the rate of successfully arriving at the target goal, the model adopts a progress monitor module that checks how closely the agent is approaching the target. Conducting experiments with the Matterport3D simulator and the Room-to-Room (R2R) benchmark dataset, we demonstrate the high performance of the proposed model.
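
A minimal sketch of a context-based attention layer of the kind the abstract describes is shown below, assuming instruction, ROI, and landmark features are already extracted; all module names, dimensions, and shapes are illustrative assumptions rather than the LVLN architecture itself.

```python
# Minimal sketch (assumed shapes, not the LVLN implementation): attention that
# associates an instruction context with image ROI features and detected
# landmark/place features, then pools a grounded visual context vector.
import torch
import torch.nn as nn

class ContextAttention(nn.Module):
    def __init__(self, text_dim, vis_dim, lm_dim, hidden=256):
        super().__init__()
        self.q = nn.Linear(text_dim, hidden)           # instruction query
        self.k = nn.Linear(vis_dim + lm_dim, hidden)   # ROI + landmark key
        self.v = nn.Linear(vis_dim + lm_dim, hidden)   # ROI + landmark value

    def forward(self, text_feat, roi_feat, landmark_feat):
        """text_feat: (B, text_dim) encoded instruction context.
        roi_feat: (B, R, vis_dim) region-of-interest features.
        landmark_feat: (B, R, lm_dim) place/landmark features per region."""
        keys = self.k(torch.cat([roi_feat, landmark_feat], dim=-1))
        vals = self.v(torch.cat([roi_feat, landmark_feat], dim=-1))
        query = self.q(text_feat).unsqueeze(1)                   # (B, 1, H)
        scores = (query * keys).sum(-1) / keys.size(-1) ** 0.5   # (B, R)
        attn = torch.softmax(scores, dim=-1).unsqueeze(-1)
        return (attn * vals).sum(dim=1)                          # (B, H)

# Hypothetical usage: 2 samples, 36 image regions, assumed feature sizes.
layer = ContextAttention(text_dim=512, vis_dim=2048, lm_dim=300)
ctx = layer(torch.randn(2, 512), torch.randn(2, 36, 2048), torch.randn(2, 36, 300))
```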

Path finding via VRML and VISION overlay for Autonomous Robotic (로봇의 위치보정을 통한 경로계획)

  • Sohn, Eun-Ho;Park, Jong-Ho;Kim, Young-Chul;Chong, Kil-To
    • Proceedings of the KIEE Conference / 2006.10c / pp.527-529 / 2006
  • In this paper, we find a robot's path using the Virtual Reality Modeling Language (VRML) and a vision overlay. To obtain a correct path, we describe a method for localizing a mobile robot in its working environment using a vision system and VRML. The robot identifies landmarks in the environment using image processing and neural network pattern matching techniques, and then performs self-positioning with the vision system based on a well-known localization algorithm. After the self-positioning procedure, the 2-D scene from the camera is overlaid with the VRML scene. This paper describes how to realize the self-positioning and shows the overlap between the 2-D and VRML scenes. The method successfully defines the robot's path.

A Path tracking algorithm and a VRML image overlay method (VRML과 영상오버레이를 이용한 로봇의 경로추적)

  • Sohn, Eun-Ho;Zhang, Yuanliang;Kim, Young-Chul;Chong, Kil-To
    • Proceedings of the IEEK Conference / 2006.06a / pp.907-908 / 2006
  • We describe a method for localizing a mobile robot in its working environment using a vision system and the Virtual Reality Modeling Language (VRML). The robot identifies landmarks in the environment using image processing and neural network pattern matching techniques, and then performs self-positioning with the vision system based on a well-known localization algorithm. After the self-positioning procedure, the 2-D scene from the camera is overlaid with the VRML scene. This paper describes how to realize the self-positioning and shows the overlap between the 2-D and VRML scenes. The method successfully defines the robot's path.

Implementation of Path Finding Method using 3D Mapping for Autonomous Robotic (3차원 공간 맵핑을 통한 로봇의 경로 구현)

  • Son, Eun-Ho;Kim, Young-Chul;Chong, Kil-To
    • Journal of Institute of Control, Robotics and Systems / v.14 no.2 / pp.168-177 / 2008
  • Path finding is a key element in the navigation of a mobile robot. To find a path, the robot should know its position exactly, since position error exposes the robot to many dangerous conditions: it could make the robot move in a wrong direction and suffer damage from collisions with surrounding obstacles. We propose a method for obtaining an accurate robot position. The localization of a mobile robot in its working environment is performed using a vision system and the Virtual Reality Modeling Language (VRML). The robot identifies landmarks located in the environment; image processing and neural network pattern matching techniques are applied to find the location of the robot. After the self-positioning procedure, the 2-D scene from the camera is overlaid onto the VRML scene. This paper describes how to realize the self-positioning and shows the overlay between the 2-D and VRML scenes. The suggested method defines the robot's path successfully. An experiment applying the suggested algorithm to a mobile robot has been performed, and the result shows good path tracking.
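
The three VRML/vision entries above all rely on self-positioning from recognized landmarks with known map coordinates via a "well-known localization algorithm". As one plausible reading, the sketch below estimates a 2-D position from ranges to known landmarks by linearized least squares (trilateration); the landmark coordinates, the choice of algorithm, and the range measurements are assumptions for illustration, not taken from the papers.

```python
# Minimal sketch (illustrative only): estimating a robot's 2-D position from
# ranges to landmarks with known map coordinates.
import numpy as np

def localize(landmarks, ranges):
    """landmarks: (N, 2) known x,y map coordinates of recognized landmarks.
    ranges: (N,) measured distances from the robot to each landmark (N >= 3).
    Returns the least-squares estimate of the robot's (x, y) position."""
    x, y = landmarks[:, 0], landmarks[:, 1]
    # Subtract the last circle equation from the others to linearize the system.
    A = np.column_stack([2 * (x[:-1] - x[-1]), 2 * (y[:-1] - y[-1])])
    b = (ranges[-1] ** 2 - ranges[:-1] ** 2
         + x[:-1] ** 2 - x[-1] ** 2 + y[:-1] ** 2 - y[-1] ** 2)
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Hypothetical example: three landmarks and noiseless ranges from (1.0, 2.0).
lm = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
true_pos = np.array([1.0, 2.0])
r = np.linalg.norm(lm - true_pos, axis=1)
print(localize(lm, r))   # approximately [1.0, 2.0]
```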

Image Caption Generation using Recurrent Neural Network (Recurrent Neural Network를 이용한 이미지 캡션 생성)

  • Lee, Changki
    • Journal of KIISE / v.43 no.8 / pp.878-882 / 2016
  • Automatic generation of captions for an image is a very difficult task because it requires both computer vision and natural language processing technologies. However, the task has many important applications, such as early childhood education, image retrieval, and navigation for the blind. In this paper, we describe a Recurrent Neural Network (RNN) model for generating image captions, which takes as input image features extracted by a Convolutional Neural Network (CNN). We demonstrate that our model produces state-of-the-art results in image caption generation experiments on the Flickr 8K, Flickr 30K, and MS COCO datasets.
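
The CNN-feature-to-RNN pipeline the abstract describes can be sketched as below in PyTorch, with teacher forcing during training; the feature dimension, vocabulary size, GRU cell, and layer names are assumptions, not the paper's configuration.

```python
# Minimal sketch (assumed dimensions, not the paper's model): an RNN caption
# generator conditioned on CNN image features, trained with teacher forcing.
import torch
import torch.nn as nn

class CaptionRNN(nn.Module):
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=256, hidden=512):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, embed_dim)   # project CNN features
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, img_feat, captions):
        """img_feat: (B, feat_dim) features from a pretrained CNN.
        captions: (B, T) token ids of the target caption (teacher forcing)."""
        img_token = self.img_proj(img_feat).unsqueeze(1)     # (B, 1, E)
        word_emb = self.embed(captions[:, :-1])              # shift right
        inputs = torch.cat([img_token, word_emb], dim=1)     # image token first
        hidden, _ = self.rnn(inputs)
        return self.out(hidden)                              # (B, T, vocab)

# Hypothetical usage: 2 images, caption length 12, vocabulary of 10,000 words.
model = CaptionRNN(vocab_size=10000)
caps = torch.randint(0, 10000, (2, 12))
logits = model(torch.randn(2, 2048), caps)
loss = nn.functional.cross_entropy(logits.reshape(-1, 10000), caps.reshape(-1))
```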