http://dx.doi.org/10.13088/jiis.2022.28.4.309

A Study of Pre-trained Language Models for Korean Language Generation  

Song, Minchae (Department of Big Data Strategy, Nonghyup)
Shin, Kyung-shik (School of Business, Ewha Womans University)
Publication Information
Journal of Intelligence and Information Systems / v.28, no.4, 2022, pp. 309-328
Abstract
This study empirically analyzed Korean pre-trained language models (PLMs) designed for natural language generation. The performance of two PLMs, BART and GPT, on the task of abstractive text summarization was compared. To investigate how performance depends on the characteristics of the inference data, ten different document types were considered, comprising six types of informational content as well as creative content. BART, which can both understand and generate natural language, was found to perform better than GPT, which can only generate. A more detailed examination of the effect of inference data characteristics showed that the performance of GPT increased with the length of the input text. However, even for the longest documents, where GPT performed best, BART still outperformed GPT, suggesting that the greatest influence on downstream performance is not the size of the training data or the number of model parameters but the structural suitability of the PLM for the downstream task. The performance of the PLMs was also compared by analyzing part-of-speech (POS) shares. BART's performance was inversely related to the proportion of prefixes, adjectives, adverbs, and verbs, but positively related to the proportion of nouns. These results emphasize the importance of taking the inference data's characteristics into account when fine-tuning a PLM for its intended downstream task.
Keywords
Pre-trained Language Model; Transformer; Abstractive text summarization; BART; GPT
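
To make the setup described in the abstract concrete, the sketch below illustrates the two analysis steps in code: generating an abstractive summary of a Korean document with a BART-style encoder-decoder PLM, and computing the POS shares of the input text. This is a minimal illustration, not the authors' implementation; the checkpoint name gogamza/kobart-summarization, the Hugging Face transformers API, and the KoNLPy Okt tagger are assumptions chosen for the example, since the paper's own models and tagger are not specified here.

```python
# Minimal sketch (not the authors' code): summarize a Korean document with a
# BART-style encoder-decoder PLM, then measure the POS composition of the input.
# Assumption: the "gogamza/kobart-summarization" checkpoint and the KoNLPy Okt
# tagger stand in for whichever models and tagger the study actually used.
from collections import Counter

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from konlpy.tag import Okt

MODEL_NAME = "gogamza/kobart-summarization"  # assumed Korean BART checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)


def summarize(document: str, max_summary_tokens: int = 128) -> str:
    """Generate an abstractive summary of a Korean document with beam search."""
    inputs = tokenizer(document, return_tensors="pt",
                       max_length=1024, truncation=True)
    summary_ids = model.generate(inputs["input_ids"],
                                 num_beams=4,
                                 max_length=max_summary_tokens,
                                 early_stopping=True)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


def pos_shares(document: str) -> dict:
    """Return the share of each POS tag (noun, verb, adjective, ...) in a document."""
    tagger = Okt()
    tags = [tag for _, tag in tagger.pos(document)]
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.items()}


if __name__ == "__main__":
    doc = "한국어 사전학습 언어모델은 대규모 말뭉치로 학습된 뒤 요약과 같은 하위 과제에 미세조정된다."
    print(summarize(doc))
    print(pos_shares(doc))
```

Correlating POS shares computed this way with summary quality scores (for example ROUGE) across document types would reproduce the kind of inference-data analysis reported in the abstract.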