http://dx.doi.org/10.9717/kmms.2019.22.11.1223

Parallel Injection Method for Improving Descriptive Performance of Bi-GRU Image Captions  

Lee, Jun Hee (Dept. of Electrical and Electronics Engineering, Korea Maritime and Ocean University)
Lee, Soo Hwan (Dept. of Electrical and Electronics Engineering, Korea Maritime and Ocean University)
Tae, Soo Ho (Dept. of Electrical and Electronics Engineering, Korea Maritime and Ocean University)
Seo, Dong Hoan (Div. of Electronics and Electrical Information Engineering, Korea Maritime and Ocean University)
Abstract
Injection is the method of feeding the image feature vector from the encoder into the decoder. Since the image feature vector carries object details such as color and texture, it is essential for generating image captions. However, a bidirectional decoder built on the existing injection method receives the image feature vector only at the first step, so the image information vanishes along the backward sequence. This makes it difficult to describe the context in detail. In this paper, we therefore propose a parallel injection method to improve the descriptive performance of image captions. The proposed injection method fuses the image feature vector with every word embedding to preserve the context. We also build the decoder on a Bidirectional Gated Recurrent Unit (Bi-GRU) to reduce its computational cost. To validate the proposed model, experiments were conducted on established image caption benchmark datasets, and performance was compared with recent models using BLEU and METEOR scores. The proposed model improved the BLEU score by up to 20.2 points and the METEOR score by up to 3.65 points over the existing caption models.
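The following is a minimal PyTorch sketch (not the authors' released code) of the parallel injection idea described above: the encoder's image feature vector is concatenated with the word embedding at every decoding step, so both the forward and the backward pass of the Bi-GRU retain the image context. All layer names and dimensions (embed_dim, img_dim, hidden_dim) are illustrative assumptions.

import torch
import torch.nn as nn

class ParallelInjectionDecoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, img_dim=512, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Bi-GRU consumes [word embedding ; image feature] at every time step.
        self.bigru = nn.GRU(embed_dim + img_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, captions, img_feat):
        # captions: (B, T) token ids; img_feat: (B, img_dim) CNN feature vector
        emb = self.embed(captions)                               # (B, T, E)
        img = img_feat.unsqueeze(1).expand(-1, emb.size(1), -1)  # (B, T, I)
        fused = torch.cat([emb, img], dim=-1)   # parallel injection at each step
        states, _ = self.bigru(fused)                            # (B, T, 2H)
        return self.out(states)                 # per-step vocabulary logits

By contrast, the conventional init-injection scheme feeds the image vector only once, at the first step, which is where the backward sequence loses the image information.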
Keywords
Image Caption; Parallel Injection; Bi-GRU; Injection Method