http://dx.doi.org/10.3745/KTSDE.2016.5.9.419

Named Entity Recognition for Patent Documents Based on Conditional Random Fields  

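Lee, Tae Seok (Korea Institute of Science and Technology Information, Information Service Office)
Shin, Su Mi (Korea Institute of Science and Technology Information, Information Service Office)
Kang, Seung Shik (Kookmin University, School of Computer Engineering)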
Publication Information
KIPS Transactions on Software and Data Engineering / v.5, no.9, 2016, pp. 419-424
Abstract
Named entity recognition is needed to improve retrieval accuracy when searching patent documents, or finding similar patents, through their claims and descriptions. In this paper, we propose an automatic named entity recognition method for patents based on conditional random fields, one of the best-performing methods in machine learning research. The system was trained on a tagged corpus of 660,000 words, and 70,000 words were used as a test set for evaluation. In the experiment, accuracy was 93.6% and the Kappa coefficient between manual tagging and the automatic tagging system was 0.67. This exceeds the Kappa coefficient of 0.6 measured for the manually tagged results, which suggests that an automatic named entity tagging system can serve as a practical replacement for manual tagging of patent documents.
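The abstract reports both an accuracy and a Kappa coefficient, and the two figures measure different things. The following is a minimal sketch of how they could be computed for two taggings of the same tokens, assuming the standard two-rater Cohen's kappa (the abstract does not state which variant was used). The function name `accuracy_and_kappa` and the tag labels `O` and `TECH` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def accuracy_and_kappa(manual_tags, auto_tags):
    """Return (observed agreement, Cohen's kappa) for two equal-length tag sequences."""
    if len(manual_tags) != len(auto_tags) or not manual_tags:
        raise ValueError("tag sequences must be non-empty and of equal length")
    n = len(manual_tags)

    # Observed agreement p_o: fraction of tokens on which both taggings agree.
    p_o = sum(m == a for m, a in zip(manual_tags, auto_tags)) / n

    # Chance agreement p_e: for each label, the product of its relative
    # frequency in the two taggings, summed over all labels.
    manual_counts = Counter(manual_tags)
    auto_counts = Counter(auto_tags)
    p_e = sum(manual_counts[label] * auto_counts[label]
              for label in manual_counts) / (n * n)

    if p_e == 1.0:
        # Both taggings use a single identical label; kappa is undefined,
        # so this sketch reports 1.0 by convention.
        return p_o, 1.0
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: most tokens are non-entities ("O").
manual = ["O"] * 8 + ["TECH", "TECH"]
auto = ["O"] * 9 + ["TECH"]
p_o, kappa = accuracy_and_kappa(manual, auto)
print(f"accuracy={p_o:.2f} kappa={kappa:.2f}")
```

On this toy data the agreement is 0.90 while kappa is about 0.62. Kappa discounts the agreement expected by chance, which is large when most tokens carry the same non-entity label, so a high accuracy can coexist with a noticeably lower kappa.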
Keywords
Conditional Random Fields; Named Entity Recognition; Patent Corpus; Kappa Coefficient; 10-Fold Cross Validation;