http://dx.doi.org/10.3745/KTSDE.2022.11.3.125

Deletion-Based Sentence Compression Using Sentence Scoring Reflecting Linguistic Information  

Lee, Jun-Beom (Department of Computer Engineering, Kyung Hee University)
Kim, So-Eon (Department of Computer Engineering, Kyung Hee University)
Park, Seong-Bae (Department of Computer Engineering, Kyung Hee University)
Publication Information
KIPS Transactions on Software and Data Engineering, Vol.11, No.3, 2022, pp.125-132
Abstract
Sentence compression is a natural language processing task that generates a concise sentence preserving the important meaning of the original sentence. For grammatically appropriate compression, early studies relied on human-defined linguistic rules. Later, as sequence-to-sequence models showed strong performance on various natural language processing tasks such as machine translation, several studies applied them to sentence compression. However, rule-based approaches require every rule to be defined by hand, and sequence-to-sequence approaches require a large amount of parallel data for model training. To address these challenges, Deleter, a sentence compression model that leverages the pre-trained language model BERT, was proposed. Because Deleter compresses sentences using a perplexity-based score computed with BERT, it requires neither linguistic rules nor a parallel dataset. However, because Deleter considers only perplexity, it does not reflect the linguistic information of the words in a sentence when compressing it. Moreover, since the corpora used to pre-train BERT differ greatly from compressed sentences, this can lead to incorrect compression. To address these problems, this paper proposes a method that quantifies the importance of linguistic information and reflects it in perplexity-based sentence scoring. In addition, by fine-tuning BERT on a corpus of news articles, which frequently contain proper nouns and omit unnecessary modifiers, BERT is made to measure a perplexity appropriate for sentence compression. Evaluations on English and Korean datasets confirm that the proposed method improves the compression performance of sentence-scoring-based models.
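The core of Deleter-style scoring described above is straightforward to illustrate. The sketch below, assuming the Hugging Face transformers and PyTorch libraries and a bert-base-uncased checkpoint, scores each deletion candidate with a length-normalized pseudo-log-likelihood under BERT (the negative of a log-perplexity) and greedily removes the token whose deletion keeps the remaining sentence most fluent. It is a minimal illustration of perplexity-based deletion scoring, not the authors' released code; the paper's contribution of weighting this score by the quantified importance of linguistic information is omitted here.

```python
# Minimal sketch: greedy deletion-based compression scored by BERT
# pseudo-perplexity. Illustrative only; not the paper's implementation.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(tokens):
    """Sum of log P(token_i | rest), masking each position in turn."""
    ids = tokenizer.convert_tokens_to_ids(tokens)
    input_ids = torch.tensor([tokenizer.build_inputs_with_special_tokens(ids)])
    total = 0.0
    for i in range(1, input_ids.size(1) - 1):  # skip [CLS] and [SEP]
        masked = input_ids.clone()
        masked[0, i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        total += log_probs[input_ids[0, i]].item()
    return total

def compress(sentence, ratio=0.7):
    """Greedily delete tokens until the target length ratio is reached."""
    tokens = tokenizer.tokenize(sentence)
    target_len = max(1, int(len(tokens) * ratio))
    while len(tokens) > target_len:
        best_tokens, best_score = None, float("-inf")
        for i in range(len(tokens)):
            candidate = tokens[:i] + tokens[i + 1:]
            # Length-normalized PLL, i.e. a negative log-perplexity:
            # higher means the remaining sentence is more fluent.
            score = pseudo_log_likelihood(candidate) / len(candidate)
            if score > best_score:
                best_tokens, best_score = candidate, score
        tokens = best_tokens
    return tokenizer.convert_tokens_to_string(tokens)

print(compress("The quick brown fox quickly jumped over the extremely lazy dog"))
```

Because every candidate deletion requires one masked forward pass per remaining token, this greedy search is quadratic-to-cubic in sentence length; in practice batching the masked positions makes it tractable.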
Keywords
Sentence Compression; Linguistic Information; Language Model; Perplexity