http://dx.doi.org/10.5626/JCSE.2014.8.3.137

Classifying Articles in Chinese Wikipedia with Fine-Grained Named Entity Types  

Zhou, Jie (Zhengzhou Information Science and Technology Institute)
Li, Bicheng (Zhengzhou Information Science and Technology Institute)
Tang, Yongwang (Zhengzhou Information Science and Technology Institute)
Publication Information
Journal of Computing Science and Engineering, vol. 8, no. 3, 2014, pp. 137-148
Abstract
Named entity classification of Wikipedia articles is a fundamental research area that can be used to automatically build large-scale corpora for named entity recognition or to support other entity processing tasks, such as entity linking, as an auxiliary task. This paper describes a method for classifying named entities in Chinese Wikipedia into fine-grained types. We considered multi-faceted information in Chinese Wikipedia to construct four feature sets, designed a different feature selection method for each feature set, and fused the features in a vector space using different strategies. Experimental results show that the explored feature sets and their combination can effectively improve the performance of named entity classification.
Keywords
Named entity classification; Chinese Wikipedia; Fine-grained; Feature selection; NER corpora