http://dx.doi.org/10.3745/KIPSTB.2003.10B.2.229

HMM-based Korean Named Entity Recognition  

Hwang, Yi-Gyu (Electronics and Telecommunications Research Institute)
Yun, Bo-Hyun (Department of Computer Education, Mokwon University)
Abstract
Named entity recognition (NER) is indispensable to question answering and information extraction systems. This paper presents an HMM-based named entity (NE) recognition method that uses the construction principles of compound words. In Korean, many named entities can be decomposed into more than one word. Moreover, there are contextual relationships among the nouns within an NE, and between an NE and its surrounding words. We classify each word as an NE in itself, a word within an NE, and/or a word adjacent to an NE, and train an HMM on these NE-related word types and on parts of speech. The proposed NER system uses a trigram HMM to handle NEs of variable length. However, the trigram model suffers from a serious data sparseness problem, which we address with multi-level back-off. Experimental results show that our NER system achieves an F-measure of 87.6% on economic articles.
Keywords
Named Entity Recognition; Information Extraction; Trigram; Variable Length Named Entity; HMM
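
The abstract does not spell out the back-off levels, so the following is only a minimal sketch of the general idea of backing off from a trigram estimate to lower-order estimates when data is sparse. The function names (`train_counts`, `backoff_prob`), the `min_count` threshold, the add-one unigram fallback, and the tag names are all assumptions made for illustration and are not taken from the paper. The scores are also not renormalized; a scheme such as Katz back-off would add discounting and back-off weights.

```python
from collections import Counter

def train_counts(tag_sequences):
    """Collect trigram, bigram and unigram counts over tag sequences."""
    counts = {
        "tri": Counter(), "tri_ctx": Counter(),
        "bi": Counter(), "bi_ctx": Counter(),
        "uni": Counter(),
    }
    for tags in tag_sequences:
        # Two start symbols so the first real tag has a full trigram context.
        padded = ["<s>", "<s>"] + list(tags)
        for i in range(2, len(padded)):
            t2, t1, t = padded[i - 2], padded[i - 1], padded[i]
            counts["tri"][(t2, t1, t)] += 1
            counts["tri_ctx"][(t2, t1)] += 1
            counts["bi"][(t1, t)] += 1
            counts["bi_ctx"][t1] += 1
            counts["uni"][t] += 1
    return counts

def backoff_prob(t2, t1, t, counts, min_count=1):
    """Estimate P(t | t2, t1), backing off trigram -> bigram -> unigram.

    min_count is a hypothetical threshold: an n-gram seen fewer times
    than this is treated as too sparse and the next level is tried.
    """
    if counts["tri"][(t2, t1, t)] >= min_count:
        return counts["tri"][(t2, t1, t)] / counts["tri_ctx"][(t2, t1)]
    if counts["bi"][(t1, t)] >= min_count:
        return counts["bi"][(t1, t)] / counts["bi_ctx"][t1]
    # Last level: add-one smoothed unigram, with one extra slot for unseen tags.
    total = sum(counts["uni"].values())
    return (counts["uni"][t] + 1) / (total + len(counts["uni"]) + 1)

# Toy tag sequences; the tag names are hypothetical, not the paper's tag set.
corpus = [["NE_B", "NE_I", "O"], ["O", "NE_B", "O"]]
counts = train_counts(corpus)
print(backoff_prob("<s>", "<s>", "NE_B", counts))  # trigram level
print(backoff_prob("NE_I", "NE_B", "O", counts))   # backs off to bigram
print(backoff_prob("O", "O", "NE_I", counts))      # backs off to unigram
```

In an HMM tagger, an estimate of this kind would stand in for the raw trigram transition probability, so that tag sequences unseen in training still receive a non-zero score.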