http://dx.doi.org/10.3745/JIPS.2013.9.4.602

Automatic Single Document Text Summarization Using Key Concepts in Documents  

Sarkar, Kamal (Dept. of Computer Science and Engineering, Jadavpur University)
Publication Information
Journal of Information Processing Systems / v.9, no.4, 2013, pp. 602-620
Abstract
Many previous studies on extractive text summarization treat a subset of the words in a document as keywords and rank sentences by their similarity to the list of extracted keywords. The use of key concepts in automatic text summarization, however, has received less attention in the summarization literature. The proposed work uses key concepts identified from a document to create a summary of that document. We view the single-word or multi-word keyphrases of a document as the important concepts that the document elaborates on. Our work is based on the hypothesis that an extract is an elaboration of the important concepts to a permissible extent, which is controlled by the given summary length restriction. In other words, our method of text summarization chooses a subset of sentences from a document that maximizes the number of important concepts covered in the final summary. To allow diverse information in the summary, for each important concept we select one sentence that is the best possible elaboration of that concept. Accordingly, the most important concept contributes to the summary first, then the second most important concept, and so on. To demonstrate the effectiveness of the proposed summarization method, we compared it with several state-of-the-art summarization systems; the results show that the proposed method outperforms the systems to which it was compared.
Keywords
Automatic Text Summarization; Key Concepts; Keyphrase Extraction
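
The selection procedure outlined in the abstract (one sentence per concept, most important concept first, until the length limit is reached) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not say how concepts are ranked or how the "best possible elaboration" is scored, so the ranked keyphrase list is taken as given, and the hypothetical helper `count_phrase` scores a sentence by how often the keyphrase occurs in it. The names `summarize`, `ranked_concepts`, and `max_words` are likewise invented for this sketch.

```python
import re


def count_phrase(phrase, sentence):
    """Count whole-word, case-insensitive occurrences of phrase in sentence."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return len(re.findall(pattern, sentence.lower()))


def summarize(sentences, ranked_concepts, max_words):
    """Pick one sentence per concept, most important concept first.

    sentences: sentence strings in document order.
    ranked_concepts: keyphrases sorted from most to least important
        (assumed to come from a separate keyphrase extractor).
    max_words: summary length limit, in words.
    """
    chosen = []      # indices of the sentences selected so far
    used_words = 0
    for concept in ranked_concepts:
        best_idx, best_score = None, 0
        for idx, sentence in enumerate(sentences):
            if idx in chosen:
                continue  # each sentence is used at most once
            # Assumed scoring: number of times the keyphrase appears.
            score = count_phrase(concept, sentence)
            if score > best_score:  # strict '>' keeps the earliest sentence on ties
                best_idx, best_score = idx, score
        if best_idx is None:
            continue  # no unused sentence mentions this concept
        length = len(sentences[best_idx].split())
        if used_words + length > max_words:
            break  # assumed policy: stop once the length limit would be exceeded
        chosen.append(best_idx)
        used_words += length
    # Present the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, ["keyphrase extraction", "text summarization"], max_words=100)` returns at most two sentences, one per concept, in document order. Stopping at the first sentence that would exceed the limit is one possible reading of the length restriction; skipping that sentence and trying the next concept would be another.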