http://dx.doi.org/10.9708/jksci.2020.25.04.141

Evaluation of English Term Extraction based on Inner/Outer Term Statistics  

Kang, In-Su (Dept. of Computer Science, Kyungsung University)
Abstract
Automatic term extraction recognizes domain-specific terms in a collection of domain-specific text. Previous term extraction methods operate effectively in an unsupervised manner, which involves extracting candidate terms and assigning importance scores to them. For the calculation of term importance scores, this study focuses on the sets of inner and outer terms of a candidate term. The inner terms of a candidate term are the shorter terms that occur within it as components, and its outer terms are the longer terms that contain it as a component. This work presents various functions that compute the term strength of a candidate term from either its inner term set or its outer term set. In addition, a term importance scoring method is devised based on the C-value score and the term strength values obtained from the inner and outer term sets. Experimental evaluations on the GENIA and ACL RD-TEC 2.0 datasets compare and analyze the effectiveness of the proposed term extraction methods for English. The proposed method outperformed the baseline method by up to 1% and 3% on the GENIA and ACL datasets, respectively.
Keywords
Term extraction; Inner term set; Outer term set; Term importance score; Domain term
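
The inner and outer term sets described in the abstract can be illustrated with a small sketch. It assumes that candidate terms are tuples of tokens and that "component" means a contiguous token subsequence. The `c_value` function follows the standard C-value formula of Frantzi et al. (2000). The abstract does not give the paper's own term strength functions or the way they are combined with C-value, so those are not reproduced here. All function names and the toy frequencies are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Term = Tuple[str, ...]  # a candidate term as a tuple of tokens (assumption)


def is_subterm(short: Term, long: Term) -> bool:
    """True if `short` occurs contiguously inside the strictly longer `long`."""
    n, m = len(short), len(long)
    if n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def inner_terms(term: Term, candidates) -> List[Term]:
    """Shorter candidates that are components of `term`."""
    return [c for c in candidates if is_subterm(c, term)]


def outer_terms(term: Term, candidates) -> List[Term]:
    """Longer candidates that contain `term` as a component."""
    return [c for c in candidates if is_subterm(term, c)]


def c_value(term: Term, freq: Dict[Term, int]) -> float:
    """Standard C-value: log2(length) times the frequency of the term,
    reduced by the mean frequency of its outer (nesting) terms, if any."""
    outers = outer_terms(term, freq)
    f = float(freq[term])
    if outers:
        f -= sum(freq[o] for o in outers) / len(outers)
    return math.log2(len(term)) * f


# Toy candidate frequencies, invented for illustration only.
freq: Dict[Term, int] = {
    ("t", "cell"): 10,
    ("human", "t", "cell"): 4,
    ("t", "cell", "line"): 2,
    ("cell",): 15,
}

print(inner_terms(("human", "t", "cell"), freq))
print(outer_terms(("t", "cell"), freq))
print(c_value(("t", "cell"), freq))
```

Note that the logarithmic length factor makes the standard C-value zero for single-word candidates, which is one reason additional evidence from inner and outer term sets can be useful.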