http://dx.doi.org/10.3745/KTSDE.2017.6.4.203

Design of a Deep Neural Network Model for Image Caption Generation  

Kim, Dongha (Dept. of Computer Science, Kyonggi University)
Kim, Incheol (Dept. of Computer Science, Kyonggi University)
Publication Information
KIPS Transactions on Software and Data Engineering / v.6, no.4, 2017, pp. 203-210
Abstract
In this paper, we propose an effective neural network model for image caption generation and model transfer. The model is a multi-modal recurrent neural network. It consists of five distinct layers, including a convolution neural network layer for extracting visual information from images, an embedding layer for converting each word into a low-dimensional feature, a recurrent neural network layer for learning caption sentence structure, and a multi-modal layer for combining visual and language information. The recurrent neural network layer is built from LSTM units, which are well known to be effective for learning and transferring sequence patterns. Moreover, the model has a unique structure in which the output of the convolution neural network layer is linked not only to the input of the initial state of the recurrent neural network layer but also to the input of the multi-modal layer, so that visual information extracted from the image is available at each recurrent step when generating the corresponding textual caption. Through comparative experiments on the open datasets Flickr8k, Flickr30k, and MSCOCO, we demonstrate that the proposed multi-modal recurrent neural network model achieves high caption accuracy and transfers well across datasets.
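The abstract's architecture, in particular the dual use of the CNN output (once to initialize the recurrent state, and again at every multi-modal fusion step), can be illustrated with a minimal NumPy sketch. All dimensions, weight matrices, and function names below are toy placeholders, not the authors' implementation; the CNN is replaced by a random feature vector standing in for extracted visual features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only)
vocab_size, embed_dim, hidden_dim, img_dim, mm_dim = 10, 8, 16, 12, 16

# Randomly initialized parameters for the sketch
W_embed = rng.normal(0, 0.1, (vocab_size, embed_dim))       # word embedding layer
W_img_init = rng.normal(0, 0.1, (img_dim, hidden_dim))      # image -> initial LSTM state
W_lstm = rng.normal(0, 0.1, (embed_dim + hidden_dim, 4 * hidden_dim))  # i, f, o, g gates
b_lstm = np.zeros(4 * hidden_dim)
W_mm_h = rng.normal(0, 0.1, (hidden_dim, mm_dim))           # multi-modal layer: hidden side
W_mm_v = rng.normal(0, 0.1, (img_dim, mm_dim))              # multi-modal layer: visual side
W_out = rng.normal(0, 0.1, (mm_dim, vocab_size))            # word-probability output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    """One LSTM update with a single packed gate matrix."""
    z = np.concatenate([x, h]) @ W_lstm + b_lstm
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def caption_step_probs(img_feat, word_ids):
    """Return a per-step vocabulary distribution. The image features enter
    twice: once to initialize the LSTM state, and again inside the
    multi-modal layer at every step."""
    h = np.tanh(img_feat @ W_img_init)   # image-conditioned initial state
    c = np.zeros(hidden_dim)
    probs = []
    for w in word_ids:
        h, c = lstm_step(W_embed[w], h, c)
        m = np.tanh(h @ W_mm_h + img_feat @ W_mm_v)  # multi-modal fusion
        logits = m @ W_out
        e = np.exp(logits - logits.max())            # softmax over the vocabulary
        probs.append(e / e.sum())
    return np.array(probs)

img = rng.normal(size=img_dim)           # stand-in for CNN visual features
p = caption_step_probs(img, [1, 3, 5])
print(p.shape)                           # (3, 10): one distribution per step
```

In a trained model, each step's distribution would be sampled or arg-maxed to emit the next caption word; here the weights are random, so only the shapes and the data flow are meaningful.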
Keywords
Image Caption Generation; Deep Neural Network Model; Model Transfer; Multi-Modal Recurrent Neural Network
Citations & Related Records
Times Cited By KSCI: 1
1 Lisa Anne Hendricks et al., "Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data," Proc. of IEEE Conf. on CVPR, 2016.
2 Oriol Vinyals, Alexander Toshev et al., "Show and Tell: A Neural Image Caption Generator," Proc. of IEEE Conf. on CVPR, 2015.
3 Kelvin Xu, Jimmy Lei Ba et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention," Proc. of ICML, 2015.
4 Junhua Mao, Wei Xu, Yi Yang et al., "Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)," Proc. of ICLR, 2015.
5 Changki Lee, "Image Caption Generation using Recurrent Neural Network," Journal of KIISE, Vol.43, No.8, pp.878-882, 2016.
6 Sepp Hochreiter and Jürgen Schmidhuber, "Long Short-Term Memory," Neural Computation, Vol.9, No.8, pp.1735-1780, 1997.
7 Junyoung Chung et al., "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling," arXiv preprint arXiv:1412.3555, 2014.
8 Christian Szegedy, Sergey Ioffe et al., "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning," arXiv preprint arXiv:1602.07261, 2016.
9 Kishore Papineni, Salim Roukos et al., "BLEU: a Method for Automatic Evaluation of Machine Translation," Proc. of ACL, pp.311-318, 2002.
10 Tsung-Yi Lin, Michael Maire et al., "Microsoft COCO: Common Objects in Context," Proc. of ECCV, 2014.