A Multi-Resolution Approach to Restaurant Named Entity Recognition in Korean Web

  • Kang, Bo-Yeong (School of Mechanical Engineering, Kyungpook National University)
  • Kim, Dae-Won (School of Computer Science and Engineering, Chung-Ang University)
  • Received : 2012.12.07
  • Accepted : 2012.12.20
  • Published : 2012.12.25

Abstract

Named entity recognition (NER) techniques can play a crucial role in extracting information from the web. Although NER systems with relatively high performance have been developed through careful manipulation of terms with statistical models, term mismatches often degrade their performance because the strings of all candidate entities are not known a priori. Despite the importance of lexical-level term mismatches, most NER approaches developed to date use only the term string itself and simple term-level features; they do not exploit the semantic features of terms, which can handle term variations effectively. To address this problem, we propose matching the semantic concepts of term units in restaurant named entities (NEs), where these units are automatically generated from multiple resolutions of a semantic tree. In an experiment, we applied our restaurant NER scheme to 49,153 nouns in Korean restaurant web pages. Our scheme achieved an average accuracy of 87.89% on the test data, considerably better than the 78.70% accuracy obtained with the baseline system.
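
The idea of matching terms by semantic concept at several resolutions of a tree can be illustrated with a minimal sketch. The abstract does not specify the actual tree, the term units, or the matching procedure, so everything below is an assumption for illustration only: the toy tree `PARENT`, its terms, and the helper names `path_from_root`, `concept_at`, and `finest_shared_resolution` are hypothetical and are not taken from the paper.

```python
# Toy semantic tree as child -> parent links.
# Hypothetical data for illustration; not the tree used in the paper.
PARENT = {
    "bulgogi": "meat_dish",
    "galbi": "meat_dish",
    "meat_dish": "food",
    "bibimbap": "rice_dish",
    "rice_dish": "food",
    "food": "entity",
    "garden": "place",
    "place": "entity",
}


def path_from_root(term):
    """Return the concept path from the tree root down to `term`."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def concept_at(term, depth):
    """Concept that represents `term` at a given depth (resolution).

    Larger depths are finer. A term that sits above `depth` in the
    tree is represented by itself.
    """
    path = path_from_root(term)
    return path[min(depth, len(path) - 1)]


def finest_shared_resolution(a, b):
    """Deepest depth at which both terms map to the same concept.

    Returns -1 when the terms share no concept at all.
    """
    depth = -1
    for i, (x, y) in enumerate(zip(path_from_root(a), path_from_root(b))):
        if x != y:
            break
        depth = i
    return depth


if __name__ == "__main__":
    # Different strings, same concept at depth 2.
    print(concept_at("bulgogi", 2), concept_at("galbi", 2))
    print(finest_shared_resolution("bulgogi", "bibimbap"))
```

In this sketch, two terms whose strings differ can still match once both are lifted to a shared concept, and coarser resolutions match more term pairs than finer ones. How resolutions are selected and combined in the proposed scheme is not stated in the abstract.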
