http://dx.doi.org/10.5391/IJFIS.2012.12.4.277

A Multi-Resolution Approach to Restaurant Named Entity Recognition in Korean Web

Kang, Bo-Yeong (School of Mechanical Engineering, Kyungpook National University)
Kim, Dae-Won (School of Computer Science and Engineering, Chung-Ang University)
Publication Information
International Journal of Fuzzy Logic and Intelligent Systems / v.12, no.4, 2012, pp. 277-284
Abstract
Named entity recognition (NER) techniques can play a crucial role in extracting information from the web. While NER systems with relatively high performance have been developed based on careful manipulation of terms with a statistical model, term mismatches often degrade the performance of such systems because the strings of all the candidate entities are not known a priori. Despite the importance of lexical-level term mismatches for NER systems, most NER approaches developed to date use only the term string itself and simple term-level features, and do not exploit the semantic features of terms, which can handle term variations effectively. As a solution to this problem, we propose matching the semantic concepts of term units in restaurant named entities (NEs), where these units are automatically generated from multiple resolutions of a semantic tree. As a test, we applied our restaurant NER scheme to 49,153 nouns in Korean restaurant web pages. Our scheme achieved an average accuracy of 87.89% on the test data, considerably better than the 78.70% accuracy obtained using the baseline system.
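As a rough illustration of what matching term concepts at multiple resolutions of a semantic tree might look like, the toy Python sketch below walks two terms up a small hand-made hypernym tree and reports the finest concept they share. The tree contents, the function names, and the max_levels parameter are all hypothetical and chosen only for this example; the abstract does not describe the authors' actual resources or algorithm at this level of detail, so this should not be read as their implementation.

```python
# Toy semantic tree as child -> parent links.
# Hypothetical data for illustration only; not taken from the paper.
PARENT = {
    "noodle": "dish",
    "dumpling": "dish",
    "dish": "food",
    "food": "entity",
    "house": "building",
    "building": "place",
    "place": "entity",
}


def concepts_at_resolutions(term, max_levels=3):
    """Return the term followed by up to max_levels ancestors (fine to coarse)."""
    chain = [term]
    while chain[-1] in PARENT and len(chain) <= max_levels:
        chain.append(PARENT[chain[-1]])
    return chain


def semantic_match(term_a, term_b, max_levels=3):
    """Return the finest concept shared by both terms, or None if there is none."""
    seen = set(concepts_at_resolutions(term_b, max_levels))
    for concept in concepts_at_resolutions(term_a, max_levels):
        if concept in seen:
            return concept
    return None


if __name__ == "__main__":
    # Different strings, same concept one level up the tree.
    print(semantic_match("noodle", "dumpling"))             # dish
    # No shared concept within two levels of either term.
    print(semantic_match("noodle", "house", max_levels=2))  # None
```

In this sketch, a plain string comparison of "noodle" and "dumpling" fails, while the comparison one level up the tree succeeds. Limiting max_levels keeps the match from collapsing into an overly generic root concept.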
Keywords
Named entity classification; semantic feature; multi-resolution approach