http://dx.doi.org/10.3837/tiis.2020.08.001

PC-SAN: Pretraining-Based Contextual Self-Attention Model for Topic Essay Generation  

Lin, Fuqiang (College of Computer, National University of Defense Technology)
Ma, Xingkong (College of Computer, National University of Defense Technology)
Chen, Yaofeng (College of Computer, National University of Defense Technology)
Zhou, Jiajun (College of Computer, National University of Defense Technology)
Liu, Bo (College of Computer, National University of Defense Technology)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS), vol. 14, no. 8, pp. 3168-3186, 2020
Abstract
Automatic topic essay generation (TEG) is a controllable text generation task that aims to generate informative, diverse, and topic-consistent essays from multiple given topics. To generate high-quality essays, a method must account for both diversity and topic consistency. Another essential issue is the intrinsic link among the topics, which helps the generated essays stay close to the semantics of the provided topics. However, it remains challenging for TEG to bridge the semantic gap between the source topic words and the target output, and a more powerful model is needed to capture the semantics of the given topics. To this end, we propose a pretraining-based contextual self-attention (PC-SAN) model built upon the seq2seq framework. For the encoder, we employ a dynamic weighted sum of layers from BERT to fully exploit the semantics of the topics, which helps bridge the gap and improves the quality of the generated essays. In the decoding phase, we also transform the target-side contextual history information into the query layers to alleviate the lack of context in typical self-attention networks (SANs). Experimental results on large-scale paragraph-level Chinese corpora verify that our model generates diverse, topic-consistent text and achieves substantial improvements over strong baselines. Furthermore, extensive analysis validates the effectiveness of the contextual embeddings from BERT and of the contextual history information in SANs.
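The two mechanisms named in the abstract can be illustrated concretely. Below is a minimal PyTorch sketch, not the authors' released implementation: a learned softmax-weighted sum over BERT's hidden layers (encoder side) and a single-head self-attention layer whose queries are gated toward a summary of the decoded history (decoder side). The module names, the mean-pooled history summary, and the gating form are assumptions made for illustration.

```python
# A minimal sketch (assumptions, not the authors' released code) of the two
# mechanisms named in the abstract: a dynamic weighted sum of BERT layers on
# the encoder side, and a self-attention layer whose queries are enriched
# with target-side history on the decoder side.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BertLayerMix(nn.Module):
    """Softmax-normalized, learned weighting over all BERT hidden layers."""
    def __init__(self, num_layers: int):
        super().__init__()
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))  # global scale

    def forward(self, all_hidden_states):
        # all_hidden_states: sequence of (batch, seq_len, hidden) tensors,
        # e.g. the hidden_states returned by BERT with output_hidden_states=True.
        weights = F.softmax(self.layer_logits, dim=0)            # (L,)
        stacked = torch.stack(tuple(all_hidden_states), dim=0)   # (L, B, T, H)
        return self.gamma * (weights.view(-1, 1, 1, 1) * stacked).sum(dim=0)

class ContextAwareSelfAttention(nn.Module):
    """Single-head self-attention whose queries are gated toward a summary
    of the decoded history (a simplified form of context-aware SANs)."""
    def __init__(self, hidden: int):
        super().__init__()
        self.q = nn.Linear(hidden, hidden)
        self.k = nn.Linear(hidden, hidden)
        self.v = nn.Linear(hidden, hidden)
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, x):
        # x: (batch, seq_len, hidden) target-side states.
        # Mean pooling is one simple (assumed) choice of history summary.
        context = x.mean(dim=1, keepdim=True).expand_as(x)
        q = self.q(x)
        lam = torch.sigmoid(self.gate(torch.cat([q, context], dim=-1)))
        q = lam * q + (1.0 - lam) * context  # inject history into queries
        scores = q @ self.k(x).transpose(-2, -1) / (x.size(-1) ** 0.5)
        # A real decoder would causally mask `scores` before the softmax.
        return F.softmax(scores, dim=-1) @ self.v(x)
```

With Hugging Face transformers, for instance, all_hidden_states would be outputs.hidden_states from a BertModel called with output_hidden_states=True; in a full decoder the attention scores would additionally be causally masked so each position attends only to earlier ones.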
Keywords
Natural language generation; Essay generation; Pretraining-based method; Self-attention network; Deep learning;
Citations & Related Records
연도 인용수 순위
  • Reference
1 Wen, T.H., Gasic, M., Mrkši'c, N., Su, P.H., Vandyke, D., Young, S., "Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems," in Proc. of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1711-1721, 2015.
2 Mazare, P.E., Humeau, S., Raison, M., Bordes, A., "Training Millions of Personalized Dialogue Agents," in Proc. of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2775-2779, 2018.
3 Zhang, S., Dinan, E., Urbanek, J., Szlam, A., Kiela, D., Weston, J., "Personalizing Dialogue Agents: I have a dog, do you have pets too?," in Proc. of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 2204-2213, 2018.
4 Meng, F., Lu, Z., Wang, M., Li, H., Jiang, W., Liu, Q., "Encoding Source Language with Convolutional Neural Network for Machine Translation," in Proc. of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 20-30, 2015.
5 Wang, X., Tu, Z., Wang, L., Shi, S., "Exploiting Sentential Context for Neural Machine Translation," in Proc. of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 6197-6203, 2019.
6 Hong, R., Li, L., Cai, J., Tao, D., Wang, M., Tian, Q., "Coherent semantic-visual indexing for large-scale image retrieval in the cloud," IEEE Transactions on Image Processing, 26, 4128-4138, 2017.   DOI
7 Tosa, N., Obara, H., Minoh, M., "Hitch haiku : An interactive supporting system for composing haiku poem," in Proc. of International Conference on Entertainment Computing. Springer, pp. 209-216, 2008.
8 Shekhar, R., Takmaz, E., Fernandez, R., Bernardi, R., "Evaluating the Representational Hub of Language and Vision Models," in Proc. of the 13th International Conference on Computational Semantics - Long Papers, pp.211-222, 2019.
9 Hong, R., Yang, Y., Wang, M., Hua, X.S., "Learning visual semantic relationships for efficient visual retrieval," IEEE Transactions on Big Data, 1, 152-161, 2015.   DOI
10 Zhang, J., Zhang, D., Hao, J., "Local translation prediction with global sentence representation," in Proc. of Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
11 Yan, R., Jiang, H., Lapata, M., Lin, S.D., Lv, X., Li, X., "i, poet : automatic Chinese poetry composition through a generative summarization framework under constrained optimization," in Proc. of the Twenty-Third international joint conference on Artificial Intelligence. AAAI Press, pp. 2197-2203, 2013.
12 Xing, C., Wu, W., Wu, Y., Liu, J., Huang, Y., Zhou, M., Ma, W.Y., "Topic aware neural response generation," in Proc. of Thirty-First AAAI Conference on Artificial Intelligence, 2017.
13 Feng, X., Liu, M., Liu, J., Qin, B., Sun, Y., Liu, T., "Topic-to-essay generation with neural networks," in Proc. of the 27th International Joint Conference on Artificial Intelligence. AAAI Press, pp. 4078-4084, 2018.
14 Devlin, J., Chang, M.W., Lee, K., Toutanova, K., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proc. of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171-4186, 2019.
15 Sutskever, I., Vinyals, O., Le, Q.V., "Sequence to sequence learning with neural networks," Advances in neural information processing systems, pp. 3104-3112, 2014.
16 He, J., Zhou, M., Jiang, L, "Generating chinese classical poems with statistical machine translation models," in Proc. of Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.
17 Yi, X., Sun, M., Li, R., Yang, Z., "Chinese poetry generation with a working memory model," in Proc. of the 27th International Joint Conference on Artificial Intelligence. AAAI Press, pp. 4553-4559, 2018.
18 Yi, X., Sun, M., Li, R., Li, W., "Automatic poetry generation with mutual reinforcement learning," in Proc. of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3143-3153, 2018.
19 Wang, Q., Luo, T., Wang, D., Xing, C., "Chinese song iambics generation with neural attention-based model," in Proc. of the Twenty-Fifth International Joint Conference on Artificial Intelligence. AAAI Press, pp. 2943-2949, 2016.
20 Wang, Z., He, W., Wu, H., Wu, H., Li, W., Wang, H., Chen, E., "Chinese Poetry Generation with Planning based Neural Network," in Proc. of COLING the 26th International Conference on Computational Linguistics: Technical Papers, pp. 1051-1060, 2016.
21 Cho, K., van Merrienboer, B., Gulcehre, C., Bougares, F., Schwenk, H., Bengio, Y., "Learning phrase representations using RNN encoder-decoder for statistical machine translation," in Proc. of Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), pp. 1724-1734, 2014.
22 Wu, Y., Wei, F., Huang, S., Wang, Y., Li, Z., Zhou, M., "Response generation by context-aware prototype editing," in Proc. of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 7281-7288, 2019.
23 Bahdanau, D., Cho, K., Bengio, Y., "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.
24 Zhang, J., Feng, Y., Wang, D., Wang, Y., Abel, A., Zhang, S., Zhang, A., "Flexible and creative chinese poetry generation using neural memory," in Proc. of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1364-1373, 2017.
25 Yang, X., Lin, X., Suo, S., Li, M., "Generating thematic Chinese poetry using conditional variational autoencoders with hybrid decoders," in Proc. of the Twenty-Seventh International Joint Conference on Artificial Intelligence, pp. 4539-4545, 2017.
26 Tu, Z., Lu, Z., Liu, Y., Liu, X., Li, H., "Modeling Coverage for Neural Machine Translation," in Proc. of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 76-85, 2016.
27 Dziri, N., Kamalloo, E.,Mathewson, K.W., Zaiane, O., "Augmenting Neural Response Generation with Context-Aware Topical Attention," in Proc. of the First Workshop on NLP for Conversational AI, pp. 18-31, 2018.
28 Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I., "Attention is all you need," Advances in neural information processing systems, pp. 5998-6008, 2017.
29 Lin, Z., Feng, M., Santos, C.N.d., Yu, M., Xiang, B., Zhou, B., Bengio, Y., "A structured self-attentive sentence embedding," arXiv preprint arXiv:1703.03130, 2017.
30 Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L., "Deep contextualized word representations," in Proc. of NAACL-HLT, pp. 2227-2237, 2018.
31 Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., "Improving language understanding by generative pre-training," Technical report, OpenAI, 2018.
32 Pennington, J., Socher, R., Manning, C. Glove., "Global vectors for word representation," in Proc. of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532-1543, 2014.
33 Raganato, A., Tiedemann, J., "An analysis of encoder representations in transformer-based machine translation," in Proc. of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp. 287-297, 2018.
34 Papineni, K., Roukos, S., Ward, T., Zhu, W.J., "BLEU: a method for automatic evaluation of machine translation," in Proc. of the 40th annual meeting on association for computational linguistics, Association for Computational Linguistics, pp. 311-318, 2002.
35 Yang, B., Li, J., Wong, D.F., Chao, L.S., Wang, X., Tu, Z., "Context-aware self-attention networks," in Proc. of the AAAI Conference on Artificial Intelligence, 2019.
36 Pereyra, G., Tucker, G., Chorowski, J., Kaiser, L., Hinton, G., "Regularizing neural networks by penalizing confident output distributions," arXiv preprint arXiv:1701.06548, 2017.
37 Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R. Dropout, "a simple way to prevent neural networks from overfitting," The journal of machine learning research, 15, 1929-1958, 2014.