[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.30693/SMJ.2021.10.1.63

A Study on Image Generation from Sentence Embedding Applying Self-Attention

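Yu, Kyungho (Graduate School, Department of Computer Engineering, Chosun University)
No, Juhyeon (Graduate School, Department of Computer Engineering, Chosun University)
Hong, Taekeun (Graduate School, Department of Computer Engineering, Chosun University)
Kim, Hyeong-Ju (Graduate School, Department of Computer Engineering, Chosun University)
Kim, Pankoo (Department of Computer Engineering, Chosun University)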

Publication Information

Smart Media Journal / v.10, no.1, 2021, pp. 63-69

Abstract

When people read and understand a sentence, they do so in part by recalling its main words as images. Text-to-image generation lets computers perform this same associative process. Previous deep learning-based text-to-image models extract text features using a Convolutional Neural Network (CNN)-Long Short-Term Memory (LSTM) encoder and a bidirectional LSTM, and generate an image by feeding these features into a Generative Adversarial Network (GAN). These models rely on basic embeddings for text feature extraction, and they take a long time to train because images are generated through several modules. In this research, we therefore propose a method that extracts sentence-embedding features with the attention mechanism, which has improved performance in the natural language processing field, and generates an image by feeding the extracted features into a GAN. In our experiments, the proposed model achieved a higher Inception Score than the model used in the previous study, and on visual inspection it generated images that express the features of the input sentence well. It also generated images that represent the sentence well even when long sentences were given as input.
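The abstract does not specify the exact encoder architecture, so the following is only a minimal sketch of how a self-attention sentence embedding can be formed. It assumes standard scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V as in the Transformer, applied to a sentence's word vectors, followed by mean pooling into one fixed-size vector that could then serve as the GAN's conditioning input. The mean-pooling step, the random stand-in weights, and the function names (`self_attention`, `sentence_embedding`, `random_matrix`) are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(w_k[0]))  # sqrt(d_k)
    scores = matmul(q, transpose(k))  # (n_tokens x n_tokens) similarity scores
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)  # each token becomes a weighted mix of all tokens


def sentence_embedding(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> List[float]:
    """Assumed pooling: average the attended token vectors into one sentence vector."""
    attended = self_attention(x, w_q, w_k, w_v)
    n = len(attended)
    return [sum(row[j] for row in attended) / n for j in range(len(attended[0]))]


def random_matrix(rows: int, cols: int) -> Matrix:
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    d_model, d_k, n_tokens = 8, 4, 5
    # Hypothetical: random vectors stand in for pretrained word embeddings,
    # and random projections stand in for learned attention weights.
    tokens = random_matrix(n_tokens, d_model)
    w_q, w_k, w_v = (random_matrix(d_model, d_k) for _ in range(3))
    emb = sentence_embedding(tokens, w_q, w_k, w_v)
    print(len(emb))  # 4: one d_k-dimensional vector for the whole sentence
```

Plain lists are used only to keep the sketch dependency-free; a trained model would express the same operations with a tensor library and learn `w_q`, `w_k`, and `w_v` jointly with the rest of the network.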
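The Inception Score used for evaluation is defined as IS = exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a pretrained Inception classifier's class distribution for a generated image x and p(y) is the average of those distributions over all generated images. It rewards images that are each classified confidently while, as a set, spreading across many classes. The sketch below is a minimal illustration that assumes the class probabilities have already been computed (the classifier itself is not shown); the function name `inception_score` is hypothetical, and the averaging over several sample splits found in common implementations is omitted.

```python
import math
from typing import List


def inception_score(probs: List[List[float]]) -> float:
    """IS = exp(mean over x of KL(p(y|x) || p(y))), with p(y) the mean of p(y|x)."""
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y) over all generated samples.
    marginal = [sum(p[c] for p in probs) / n for c in range(num_classes)]
    total_kl = 0.0
    for p in probs:
        # Terms with p(y|x) = 0 contribute nothing (0 * log 0 is taken as 0);
        # when p(y|x) > 0 the marginal is also > 0, so the division is safe.
        total_kl += sum(pc * math.log(pc / marginal[c]) for c, pc in enumerate(p) if pc > 0)
    return math.exp(total_kl / n)


if __name__ == "__main__":
    # Confident predictions on three different classes: score of approximately 3.0.
    print(inception_score([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))
    # Identical, uncertain predictions: score of 1.0, the minimum.
    print(inception_score([[0.5, 0.5], [0.5, 0.5]]))
```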

Keywords

Natural Language Processing; Image Generation; Generative Adversarial Network
