http://dx.doi.org/10.3745/JIPS.02.0172

Improving Abstractive Summarization by Training Masked Out-of-Vocabulary Words  

Lee, Tae-Seok (Korea Institute of Science and Technology Information)
Lee, Hyun-Young (Dept. of Computer Science, Kookmin University)
Kang, Seung-Shik (Dept. of Computer Science, Kookmin University)
Publication Information
Journal of Information Processing Systems / v.18, no.3, 2022, pp. 344-358
Abstract
Text summarization is the task of producing a shorter version of a long document while accurately preserving its main content. Abstractive summarization generates novel words and phrases with a language-generation method, using text transformation and pretrained word-embedding information. However, newly coined words and other out-of-vocabulary (OOV) words degrade the performance of automatic summarization because they are not seen during pretraining. In this study, we demonstrated an improvement in summarization quality through contextualized BERT embeddings with out-of-vocabulary masking. In addition, by explicitly providing precise pointing and an optional copy instruction along with the BERT embedding, we achieved higher accuracy than the baseline model: the recall-based word-generation metric ROUGE-1 score was 55.11, and the word-order-based ROUGE-L score was 39.65.
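As a rough illustration of the OOV-masking idea described above, the sketch below replaces words that BERT's WordPiece tokenizer cannot represent with the [MASK] token before computing contextualized embeddings, so the model infers a context-specific vector instead of reusing one shared unknown-word embedding. It assumes the Hugging Face transformers library; the checkpoint, the [UNK]-based OOV test, and the helper name embed_with_oov_masking are illustrative, not the authors' implementation (the paper additionally feeds these embeddings into a selective pointer/copy decoder, omitted here).

```python
# Minimal sketch: mask OOV words before BERT embedding.
# Assumes Hugging Face `transformers`; details are illustrative.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def embed_with_oov_masking(words):
    """Tokenize `words`, replace any word that WordPiece can only map
    to [UNK] with [MASK], and return BERT's contextualized embedding
    for every resulting token."""
    tokens = []
    for word in words:
        pieces = tokenizer.tokenize(word)
        # A word that yields only [UNK] is out-of-vocabulary; masking it
        # lets BERT derive its representation from the surrounding context.
        if not pieces or pieces == [tokenizer.unk_token]:
            tokens.append(tokenizer.mask_token)
        else:
            tokens.extend(pieces)
    ids = tokenizer.convert_tokens_to_ids(
        [tokenizer.cls_token] + tokens + [tokenizer.sep_token])
    with torch.no_grad():
        out = model(torch.tensor([ids]))
    return tokens, out.last_hidden_state[0, 1:-1]  # strip [CLS]/[SEP]

tokens, vectors = embed_with_oov_masking(
    "a newly coined word like ☃ may be masked".split())
print(tokens)         # any word WordPiece maps to [UNK] appears as [MASK]
print(vectors.shape)  # torch.Size([len(tokens), 768])
```

The returned per-token vectors would then serve as encoder inputs for the summarizer, with the copy mechanism deciding whether to generate a vocabulary word or point back to the masked source position.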
Keywords
BERT; Deep Learning; Generative Summarization; Selective OOV Copy Model; Unknown Words;