http://dx.doi.org/10.5391/IJFIS.2016.16.4.293

Document Summarization via Convex-Concave Programming  

Kim, Minyoung (Department of Electronics & IT Media Engineering, Seoul National University of Science & Technology)
Publication Information
International Journal of Fuzzy Logic and Intelligent Systems / v.16, no.4, 2016, pp. 293-298
Abstract
Document summarization is an important task in various areas, where the goal is to select a few of the most descriptive sentences from a given document as a succinct summary. Even without training data consisting of human-labeled summaries, several existing approaches in the literature yield reasonable performance. In this paper, within the same unsupervised learning setup, we propose a more principled learning framework for the document summarization task. Specifically, we formulate an optimization problem that expresses two requirements: faithful preservation of the document contents and a constraint on the summary length. We circumvent the difficult integer program arising from binary sentence selection via continuous relaxation combined with a low-entropy penalty. We also propose an efficient convex-concave optimization algorithm that is guaranteed to improve the original objective at every iteration. On several document datasets, we demonstrate that the proposed learning algorithm significantly outperforms the existing approaches.
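The abstract does not give the exact objective, so the following Python/NumPy code is only a minimal sketch of the general recipe it describes, under our own assumptions rather than the paper's formulation. We assume a squared-error coverage loss (the mean term vector of the selected sentences should match the document's mean), relax the binary selection vector to weights `z` in `[eps, 1 - eps]` that sum to the summary length `k`, and add a binary-entropy penalty, which is concave and pushes `z` toward 0/1. The objective is then convex plus concave, so the concave-convex procedure (CCCP) linearizes the entropy at the current iterate and minimizes the resulting convex surrogate, here by projected gradient descent. All names (`summarize_cccp`, `project_capped_simplex`, `binary_entropy`) and parameters (`lam`, `eps`, `outer`, `inner`) are hypothetical.

```python
import numpy as np


def project_capped_simplex(v, k, lo, hi, iters=60):
    """Euclidean projection of v onto {z : lo <= z_i <= hi, sum(z) = k}.

    The projection has the form clip(v - tau, lo, hi) and its sum is
    non-increasing in tau, so tau is found by bisection.
    """
    a = v.min() - hi  # at this tau every entry clips to hi (sum >= k)
    b = v.max() - lo  # at this tau every entry clips to lo (sum <= k)
    for _ in range(iters):
        tau = 0.5 * (a + b)
        if np.clip(v - tau, lo, hi).sum() > k:
            a = tau
        else:
            b = tau
    return np.clip(v - 0.5 * (a + b), lo, hi)


def binary_entropy(z):
    """Sum of per-sentence binary entropies; concave in z."""
    return -np.sum(z * np.log(z) + (1 - z) * np.log(1 - z))


def summarize_cccp(S, k, lam=0.05, eps=1e-6, outer=30, inner=50):
    """Hypothetical CCCP sketch for relaxed extractive summarization.

    S: (n_sentences, n_terms) term-weight matrix; k: summary length.
    Minimizes J(z) = f(z) + lam * H(z) over eps <= z_i <= 1 - eps, sum(z) = k,
    where f (convex) is an ASSUMED coverage loss and H is the concave
    binary entropy that pushes z toward 0/1.
    """
    n = S.shape[0]
    m = S.mean(axis=0)  # average term vector of the whole document

    def f(z):
        r = S.T @ z / k - m  # mean of selected sentences minus document mean
        return r @ r

    def grad_f(z):
        return (2.0 / k) * (S @ (S.T @ z / k - m))

    def objective(z):
        return f(z) + lam * binary_entropy(z)

    step = k ** 2 / (2.0 * np.linalg.norm(S, 2) ** 2)  # 1 / Lipschitz const of grad_f
    z = np.full(n, k / n)  # feasible, maximally uncertain starting point
    history = [objective(z)]
    for _ in range(outer):
        # CCCP: linearize the concave part at z_t; dH/dz_i = log((1 - z_i) / z_i).
        g = np.log((1 - z) / z)
        # Convex surrogate f(z) + lam * g @ z, minimized by projected gradient
        # warm-started at z_t, so the surrogate (and hence J) never increases.
        for _ in range(inner):
            z = project_capped_simplex(z - step * (grad_f(z) + lam * g), k, eps, 1 - eps)
        history.append(objective(z))
    selected = np.sort(np.argsort(-z)[:k])  # round: keep the k largest weights
    return selected, z, history
```

For example, with a small term-count matrix `S` of five sentences, `summarize_cccp(S, k=2)` returns the indices of two chosen sentences, the relaxed weights, and the objective trace, which should be non-increasing up to floating-point error. Warm-starting each inner loop at the current iterate is what preserves the monotone guarantee in this sketch: the convex surrogate upper-bounds the objective and equals it at that iterate, so any decrease of the surrogate also decreases the objective.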
Keywords
Document summarization; Natural language processing; Text mining; Optimization