• Title/Summary/Keyword: 멀티 모달 순환 신경망

Search Result 3, Processing Time 0.019 seconds

Learning and Transferring Deep Neural Network Models for Image Caption Generation (이미지 캡션 생성을 위한 심층 신경망 모델 학습과 전이)

  • Kim, Dong-Ha;Kim, Incheol
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2016.10a
    • /
    • pp.617-620
    • /
    • 2016
  • 본 논문에서는 이미지 캡션 생성과 모델 전이에 효과적인 심층 신경망 모델을 제시한다. 본 모델은 멀티 모달 순환 신경망 모델의 하나로서, 이미지로부터 시각 정보를 추출하는 컨볼루션 신경망 층, 각 단어를 저차원의 특징으로 변환하는 임베딩 층, 캡션 문장 구조를 학습하는 순환 신경망 층, 시각 정보와 언어 정보를 결합하는 멀티 모달 층 등 총 5 개의 계층들로 구성된다. 특히 본 모델에서는 시퀀스 패턴 학습과 모델 전이에 우수한 LSTM 유닛을 이용하여 순환 신경망 층을 구성하고, 컨볼루션 신경망 층의 출력을 임베딩 층뿐만 아니라 멀티 모달 층에도 연결함으로써, 캡션 문장 생성을 위한 매 단계마다 이미지의 시각 정보를 이용할 수 있는 연결 구조를 가진다. Flickr8k, Flickr30k, MSCOCO 등의 공개 데이터 집합들을 이용한 다양한 비교 실험을 통해, 캡션의 정확도와 모델 전이의 효과 면에서 본 논문에서 제시한 멀티 모달 순환 신경망 모델의 우수성을 입증하였다.

Design of a Deep Neural Network Model for Image Caption Generation (이미지 캡션 생성을 위한 심층 신경망 모델의 설계)

  • Kim, Dongha;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.4
    • /
    • pp.203-210
    • /
    • 2017
  • In this paper, we propose an effective neural network model for image caption generation and model transfer. This model is a kind of multi-modal recurrent neural network models. It consists of five distinct layers: a convolution neural network layer for extracting visual information from images, an embedding layer for converting each word into a low dimensional feature, a recurrent neural network layer for learning caption sentence structure, and a multi-modal layer for combining visual and language information. In this model, the recurrent neural network layer is constructed by LSTM units, which are well known to be effective for learning and transferring sequence patterns. Moreover, this model has a unique structure in which the output of the convolution neural network layer is linked not only to the input of the initial state of the recurrent neural network layer but also to the input of the multimodal layer, in order to make use of visual information extracted from the image at each recurrent step for generating the corresponding textual caption. Through various comparative experiments using open data sets such as Flickr8k, Flickr30k, and MSCOCO, we demonstrated the proposed multimodal recurrent neural network model has high performance in terms of caption accuracy and model transfer effect.

Multimodal Sentiment Analysis Using Review Data and Product Information (리뷰 데이터와 제품 정보를 이용한 멀티모달 감성분석)

  • Hwang, Hohyun;Lee, Kyeongchan;Yu, Jinyi;Lee, Younghoon
    • The Journal of Society for e-Business Studies
    • /
    • v.27 no.1
    • /
    • pp.15-28
    • /
    • 2022
  • Due to recent expansion of online market such as clothing, utilizing customer review has become a major marketing measure. User review has been used as a tool of analyzing sentiment of customers. Sentiment analysis can be largely classified with machine learning-based and lexicon-based method. Machine learning-based method is a learning classification model referring review and labels. As research of sentiment analysis has been developed, multi-modal models learned by images and video data in reviews has been studied. Characteristics of words in reviews are differentiated depending on products' and customers' categories. In this paper, sentiment is analyzed via considering review data and metadata of products and users. Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Self Attention-based Multi-head Attention models and Bidirectional Encoder Representation from Transformer (BERT) are used in this study. Same Multi-Layer Perceptron (MLP) model is used upon every products information. This paper suggests a multi-modal sentiment analysis model that simultaneously considers user reviews and product meta-information.