Corpus-Based Ambiguity-Driven Learning of Context-Dependent Lexical Rules for Part-of-Speech Tagging

품사태킹을 위한 어휘문맥 의존규칙의 말뭉치기반 중의성주도 학습

  • Published : 1999.01.01

Abstract

Most stochastic taggers cannot resolve morphological ambiguities that can be resolved only by referring to lexical contexts, because they use only contextual probabilities based on tag n-grams and lexical probabilities. Existing lexical rules are effective for resolving such ambiguities because they can refer to lexical contexts. However, they have two limitations. First, because they are deterministic rules, human experts tend to write erroneous ones. Second, because they must be acquired manually, acquiring them is difficult and time-consuming. In this paper, we propose context-dependent lexical rules, which are lexical rules based on the statistics of a tagged corpus, and an ambiguity-driven learning method, which automatically acquires the proposed rules from a tagged corpus. Using the proposed rules, the proposed tagger can partially annotate an unseen corpus with high accuracy, because it is a kind of memorizing tagger that can annotate its training corpus with 100% accuracy. The proposed tagger is therefore useful for improving the accuracy of a stochastic tagger. It can also be used effectively to detect and correct tagging errors in a manually tagged corpus. Moreover, the experimental results show that the proposed method is also effective for English part-of-speech tagging.
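
The abstract does not give the learning procedure itself, so the following is only a minimal sketch of one way the idea could work, not the authors' algorithm. It assumes that a rule maps a word plus a window of neighbouring words to a single tag, and that the window is widened only for tokens that are still ambiguous in the training corpus. The names `context`, `learn_rules`, `tag_partial`, and `max_width`, the padding symbols, and the toy tag set are all hypothetical.

```python
from collections import defaultdict


def context(words, i, width):
    """Return the word at position i with `width` neighbours on each side."""
    padded = ["<s>"] * width + list(words) + ["</s>"] * width
    return tuple(padded[i:i + 2 * width + 1])


def learn_rules(tagged_sentences, max_width=2):
    """Learn rules mapping a lexical context to a tag.

    tagged_sentences: list of (words, tags) pairs of equal length.
    A context becomes a rule only if it has exactly one tag in the corpus;
    tokens whose context is still ambiguous are retried with a wider window.
    """
    rules = {}
    pending = [
        (words, i, tag)
        for words, tags in tagged_sentences
        for i, tag in enumerate(tags)
    ]
    for width in range(max_width + 1):
        tags_seen = defaultdict(set)
        for words, i, tag in pending:
            tags_seen[context(words, i, width)].add(tag)
        still_ambiguous = []
        for words, i, tag in pending:
            key = context(words, i, width)
            if len(tags_seen[key]) == 1:
                rules[key] = tag  # unambiguous in the corpus: memorize it
            else:
                still_ambiguous.append((words, i, tag))
        pending = still_ambiguous
    return rules


def tag_partial(words, rules, max_width=2):
    """Tag only the tokens covered by a rule; leave the others as None."""
    result = []
    for i in range(len(words)):
        found = None
        for width in range(max_width + 1):
            key = context(words, i, width)
            if key in rules:
                found = rules[key]
                break
        result.append(found)
    return result


# Toy corpus (invented for illustration): "can" is ambiguous on its own.
corpus = [
    (["I", "can", "fish"], ["PRON", "AUX", "VERB"]),
    (["the", "can", "fell"], ["DET", "NOUN", "VERB"]),
]
rules = learn_rules(corpus)
train_output = [tag_partial(words, rules) for words, _ in corpus]
unseen_output = tag_partial(["we", "can", "fish"], rules)
```

On this toy corpus every training token ends up covered by a rule, so the training sentences are reproduced exactly, while the unseen sentence is only partially tagged: tokens with no matching rule stay `None` and could be handed to a stochastic tagger. In this sketch, tokens that remain ambiguous at `max_width` receive no rule, so exact reproduction of a training corpus holds only when the window is wide enough and the annotation is consistent.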
