A Hidden Markov Model Imbedding Multiword Units for Part-of-Speech Tagging

  • Jae-Hoon Kim (Division of Automation and Information Engineering, Korea Maritime University)
  • Jungyun Seo (Department of Computer Science, Sogang University)
  • Published: 1997.12.01

Abstract

Morphological analysis of Korean is known to be a very complicated problem. In particular, the degree of part-of-speech (POS) ambiguity is much higher than in English. Many researchers have used hidden Markov models (HMMs) for POS tagging and reported accuracies of around 95%. However, the lack of lexical information in these models makes it difficult to improve their performance further. To alleviate this problem, this paper proposes a method for incorporating multiword units, a type of lexical information, into an HMM for POS tagging. It also proposes a method for extracting multiword units from a POS-tagged corpus. Here, a multiword unit is defined as a unit that consists of more than one word. We found that these multiword units are the major source of POS tagging errors. Our experiments show that the proposed method reduces tagging errors by about 13%.
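The abstract does not say how multiword units are combined with the HMM, so the following is only a minimal sketch under one simple assumption: known multiword units are merged into single tokens (greedy longest match) before a standard bigram HMM Viterbi decoder runs. The lexicon, tag set, probabilities, and the helper names `merge_mwus` and `viterbi` are hypothetical toy values chosen for illustration, and English words stand in for Korean morphemes for readability. This is not the authors' method.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical multiword-unit lexicon (toy English entries, not from the paper).
MWU_LEXICON: Set[Tuple[str, ...]] = {("in", "spite", "of"), ("as", "well", "as")}


def merge_mwus(words: List[str], lexicon: Set[Tuple[str, ...]]) -> List[str]:
    """Greedily replace the longest known multiword unit at each position with one token."""
    max_len = max((len(k) for k in lexicon), default=1)
    out: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest candidate first; single words need no lookup.
        for n in range(min(max_len, len(words) - i), 1, -1):
            if tuple(words[i:i + n]) in lexicon:
                out.append("_".join(words[i:i + n]))
                i += n
                break
        else:  # no multiword unit starts at position i
            out.append(words[i])
            i += 1
    return out


def viterbi(tokens: List[str], tags: List[str],
            start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]],
            floor: float = 1e-6) -> List[str]:
    """Most likely tag sequence under a first-order (bigram) HMM, computed in log space."""
    if not tokens:
        return []

    def lp(p: float) -> float:
        # Unseen events get a small floor probability instead of log(0).
        return math.log(p if p > 0 else floor)

    score = [{t: lp(start.get(t, 0.0)) + lp(emit.get(t, {}).get(tokens[0], 0.0)) for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        score.append({})
        back.append({})
        for t in tags:
            # Best predecessor tag for tag t at position i.
            prev = max(tags, key=lambda s: score[i - 1][s] + lp(trans.get(s, {}).get(t, 0.0)))
            score[i][t] = (score[i - 1][prev] + lp(trans.get(prev, {}).get(t, 0.0))
                           + lp(emit.get(t, {}).get(tokens[i], 0.0)))
            back[i][t] = prev
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: score[-1][t])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model parameters (hypothetical).
TAGS = ["PREP", "NOUN", "PRON", "VERB"]
START = {"PREP": 0.4, "NOUN": 0.3, "PRON": 0.3}
TRANS = {
    "PREP": {"NOUN": 0.8, "PRON": 0.2},
    "NOUN": {"PRON": 0.5, "VERB": 0.5},
    "PRON": {"VERB": 0.9, "NOUN": 0.1},
    "VERB": {"NOUN": 0.5, "PREP": 0.5},
}
EMIT = {
    "PREP": {"in_spite_of": 0.5, "in": 0.5},
    "NOUN": {"rain": 0.6, "spite": 0.4},
    "PRON": {"we": 1.0},
    "VERB": {"walked": 0.7, "rain": 0.3},
}

tokens = merge_mwus("in spite of rain we walked".split(), MWU_LEXICON)
print(tokens)  # ['in_spite_of', 'rain', 'we', 'walked']
print(viterbi(tokens, TAGS, START, TRANS, EMIT))  # ['PREP', 'NOUN', 'PRON', 'VERB']
```

Merging before decoding is the simplest design, but it always treats a matching word sequence as one unit. A less aggressive alternative would keep both the merged and the word-by-word analyses in the search lattice and let the model's probabilities choose between them.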
