http://dx.doi.org/10.9708/jksci.2021.26.08.047

Adjusting Weights of Single-word and Multi-word Terms for Keyphrase Extraction from Article Text  

Kang, In-Su (Dept. of Computer Science, Kyungsung University)
Abstract
Given a document, keyphrase extraction is the task of automatically extracting words or phrases that topically represent its content. Unsupervised keyphrase extraction approaches first extract candidate words or phrases from the input document, then compute a score for each candidate, and finally select keyphrases based on those scores. For the scoring step, this study proposes adjusting each candidate's score according to its type: word-type or phrase-type. To this end, the type-token ratios of word-type and phrase-type candidates, together with the information content of high-frequency word-type and phrase-type candidates, are collected from the input document and used to adjust the candidate scores. In experiments on four keyphrase extraction evaluation datasets constructed from full-text English articles, the proposed method outperformed a baseline method and the comparison methods on three of the datasets.
Keywords
Keyphrase; Keyphrase extraction; Score adjustment; Type-token ratio; Unsupervised keyphrase extraction
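
The two per-document statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not state how these values enter the score adjustment, so the sketch only computes them. The function names, the `top_k` cutoff for "high-frequency" candidates, the use of self-information `-log2(p)` as the information content, and the toy candidate list are all assumptions made for illustration.

```python
import math
from collections import Counter


def split_by_type(candidates):
    """Separate candidate occurrences into word-type and phrase-type lists."""
    words = [c for c in candidates if len(c.split()) == 1]
    phrases = [c for c in candidates if len(c.split()) > 1]
    return words, phrases


def type_token_ratio(candidates):
    """Distinct candidates (types) divided by total occurrences (tokens)."""
    if not candidates:
        return 0.0
    return len(set(candidates)) / len(candidates)


def information_content(candidates, top_k=2):
    """Self-information -log2(p) of the top_k most frequent candidates.

    p is the relative frequency of a candidate among all occurrences
    of the same type (an assumption; the abstract gives no formula).
    """
    counts = Counter(candidates)
    total = sum(counts.values())
    return {c: -math.log2(n / total) for c, n in counts.most_common(top_k)}


# Hypothetical candidate occurrences extracted from one document.
occurrences = [
    "keyphrase", "extraction", "keyphrase", "score", "score",
    "keyphrase extraction", "keyphrase extraction", "type-token ratio",
]

words, phrases = split_by_type(occurrences)
print("word-type TTR:", type_token_ratio(words))
print("phrase-type TTR:", type_token_ratio(phrases))
print("word-type information content:", information_content(words))
print("phrase-type information content:", information_content(phrases))
```

Keeping the statistics separate for word-type and phrase-type candidates mirrors the abstract's distinction between the two candidate types. Any rule for combining them into an adjusted score would have to come from the full paper.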