http://dx.doi.org/10.14400/JDC.2015.13.5.187

A Study on Utilization of Wikipedia Contents for Automatic Construction of Linguistic Resources  

Yoo, Cheol-Jung (Dept. of Software Engineering, Chonbuk National University)
Kim, Yong (Dept. of Library & Information Science, Chonbuk National University)
Yun, Bo-Hyun (Dept. of Computer Science Education, Mokwon University)
Publication Information
Journal of Digital Convergence / v.13, no.5, 2015, pp. 187-194
Abstract
Machines need a variety of linguistic knowledge resources to understand the diverse variations found in natural language. This paper devises a method for automatically constructing linguistic resources that reflects the characteristics of online content and supports continuous expansion. In particular, we focus on building a named-entity (NE) dictionary, because NEs are highly applicable in linguistic analysis processes. Based on an investigation of Korean Wikipedia, we propose an efficient method for constructing an NE dictionary that uses syntactic patterns and structural features such as metadata.
Keywords
Linguistic Resource Construction; Wikipedia; Named-Entity Dictionary; Knowledge Construction; Utilization of Online Contents