http://dx.doi.org/10.9717/kmms.2019.22.11.1223

Parallel Injection Method for Improving Descriptive Performance of Bi-GRU Image Captions  

Lee, Jun Hee (Dept. of Electrical and Electronics Engineering, Korea Maritime and Ocean University)
Lee, Soo Hwan (Dept. of Electrical and Electronics Engineering, Korea Maritime and Ocean University)
Tae, Soo Ho (Dept. of Electrical and Electronics Engineering, Korea Maritime and Ocean University)
Seo, Dong Hoan (Div. of Electronics and Electrical Information Engineering, Korea Maritime and Ocean University)
Abstract
Injection is the method of feeding the image feature vector from the encoder into the decoder. Since the image feature vector carries object details such as color and texture, it is essential for generating image captions. However, a bidirectional decoder built on the existing injection method receives the image feature vector only at the first step, so the image information vanishes along the backward sequence. This makes it difficult to describe the context in detail. In this paper, we therefore propose a parallel injection method to improve the descriptive performance of image captions. The proposed injection method fuses the image feature vector with every word embedding to preserve the context. We also build the decoder on a Bidirectional Gated Recurrent Unit (Bi-GRU) to reduce its computational cost. To validate the proposed model, experiments were conducted on established image caption benchmark datasets, and performance was compared with recent models using BLEU and METEOR scores. The proposed model improved the BLEU score by up to 20.2 points and the METEOR score by up to 3.65 points over the existing caption models.
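The following is a minimal PyTorch sketch (not the authors' released code) of the parallel injection idea described above: the encoder's image feature vector is concatenated with the word embedding at every decoding step, so both the forward and the backward pass of the Bi-GRU retain the image context. All layer names and dimensions (embed_dim, img_dim, hidden_dim) are illustrative assumptions.

import torch
import torch.nn as nn

class ParallelInjectionDecoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, img_dim=512, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Bi-GRU consumes [word embedding ; image feature] at every time step.
        self.bigru = nn.GRU(embed_dim + img_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, captions, img_feat):
        # captions: (B, T) token ids; img_feat: (B, img_dim) CNN feature vector
        emb = self.embed(captions)                               # (B, T, E)
        img = img_feat.unsqueeze(1).expand(-1, emb.size(1), -1)  # (B, T, I)
        fused = torch.cat([emb, img], dim=-1)   # parallel injection at each step
        states, _ = self.bigru(fused)                            # (B, T, 2H)
        return self.out(states)                 # per-step vocabulary logits

By contrast, the conventional init-injection scheme feeds the image vector only once, at the first step, which is where the backward sequence loses the image information.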
Keywords
Image Caption; Parallel Injection; Bi-GRU; Injection Method