Semantic Pre-training Methodology for Improving Text Summarization Quality

  • Min-Gyu Jeon (Graduate School of Business IT, Kookmin University)
  • Nam-Gyu Kim (Graduate School of Business IT, Kookmin University)
  • Received : 2023.04.05
  • Accepted : 2023.05.17
  • Published : 2023.06.30

Abstract

Automatic text summarization, which condenses a document into only the information that is meaningful to the user, has been studied steadily in recent years, and most recent work builds on the Transformer neural network architecture. Among pre-training approaches, the GSG (Gap Sentences Generation) method, which trains a model by masking entire sentences, has received the most attention. However, traditional GSG selects the sentences to be masked based on the degree of token overlap rather than on sentence meaning. To improve summarization quality, this study proposes SbGSG (Semantic-based GSG), a methodology that selects the sentences to be masked in GSG by considering the meaning of each sentence. Experiments on 370,000 news articles and 21,600 summaries and reports confirm that the proposed SbGSG outperforms traditional GSG in terms of ROUGE and BERTScore.

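The abstract contrasts masking sentences chosen by token overlap (traditional GSG) with masking sentences chosen by meaning. The sketch below illustrates one way the semantic selection step could look, assuming sentence embeddings from the sentence-transformers library and a mean-cosine-similarity importance score; the model name, the 30% gap ratio, and the helper functions are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of semantic gap-sentence selection in the spirit of SbGSG.
# Assumptions (not from the paper): sentence-transformers as the encoder,
# a multilingual MiniLM model, and a 30% gap ratio.
import numpy as np
from sentence_transformers import SentenceTransformer

MASK_TOKEN = "[MASK_SENT]"   # placeholder inserted where a gap sentence was removed
GAP_RATIO = 0.3              # fraction of sentences to mask (illustrative)

def select_gap_sentences(sentences, model):
    """Score each sentence by its mean cosine similarity to the other sentences
    and return the indices of the most 'central' ones as gap sentences."""
    emb = model.encode(sentences, normalize_embeddings=True)   # (n, d), unit-norm rows
    sim = emb @ emb.T                                          # cosine similarity matrix
    np.fill_diagonal(sim, 0.0)                                 # ignore self-similarity
    scores = sim.sum(axis=1) / max(len(sentences) - 1, 1)      # mean similarity to the rest
    k = max(1, int(len(sentences) * GAP_RATIO))
    return sorted(np.argsort(-scores)[:k].tolist())

def build_gsg_example(sentences, gap_idx):
    """Build a GSG-style training pair: gap sentences are replaced by a mask
    token in the input, and their concatenation becomes the target."""
    masked = [MASK_TOKEN if i in gap_idx else s for i, s in enumerate(sentences)]
    target = " ".join(sentences[i] for i in gap_idx)
    return " ".join(masked), target

if __name__ == "__main__":
    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    doc = [
        "The city council approved the new transit budget on Tuesday.",
        "The budget allocates most funds to expanding the subway network.",
        "Officials said construction will begin next spring.",
        "In unrelated news, the local team won its third straight game.",
    ]
    gap_idx = select_gap_sentences(doc, model)
    masked_doc, target = build_gsg_example(doc, gap_idx)
    print(masked_doc)
    print(target)
```

Traditional GSG would instead score each candidate sentence by its ROUGE overlap with the remaining sentences; replacing that token-overlap scorer with an embedding-based one is the essence of the semantic variant the abstract describes. A multilingual encoder is assumed here only because the experimental corpus consists of Korean news articles.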

Acknowledgement

This work was supported by the BK21 FOUR (Brain Korea 21 Fourth Phase) program funded by the Ministry of Education and the National Research Foundation of Korea. It was also supported by the 'High-Performance Computing Support' program of the Ministry of Science and ICT and the National IT Industry Promotion Agency (NIPA).
