
Domain Adaptation Method for LHMM-based English Part-of-Speech Tagger  

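Kwon, Oh-Woog (Language Processing Research Team, Electronics and Telecommunications Research Institute)
Kim, Young-Gil (Language Processing Research Team, Electronics and Telecommunications Research Institute)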
Abstract
Many current language processing systems use a part-of-speech tagger for preprocessing, and most of them require a tagger with the highest possible accuracy. In particular, exploiting domain-specific advantages to improve translation quality has become a hot issue in the machine translation community. This paper presents a method for customizing an HMM- or LHMM-based English tagger from the general domain to a specific domain. The proposed method semi-automatically customizes the output and transition probabilities of the HMM or LHMM using a domain-specific raw corpus. In experiments on customization to the patent domain, our LHMM tagger adapted by the proposed method achieves a word tagging accuracy of 98.87% and a sentence tagging accuracy of 78.5%. Compared with the general tagger, the adapted tagger improves word tagging accuracy by 2.24 percentage points (ERR: 66.4%) and sentence tagging accuracy by 41.0 percentage points (ERR: 65.6%).
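The reported figures can be cross-checked with a short calculation. The sketch below assumes that ERR denotes the relative reduction in error rate and that the stated gains are absolute differences in accuracy. Neither assumption is spelled out in the abstract, and the function and variable names are illustrative only.

```python
def error_reduction_rate(baseline_acc: float, adapted_acc: float) -> float:
    """Relative reduction in error rate, in percent (assumed meaning of ERR)."""
    baseline_err = 100.0 - baseline_acc
    adapted_err = 100.0 - adapted_acc
    return 100.0 * (baseline_err - adapted_err) / baseline_err


# Accuracies of the adapted tagger and gains over the general tagger,
# taken from the abstract. The baseline accuracies are derived here,
# under the assumption that the gains are absolute differences.
word_adapted, word_gain = 98.87, 2.24
sent_adapted, sent_gain = 78.5, 41.0

word_baseline = word_adapted - word_gain   # implied general-tagger word accuracy
sent_baseline = sent_adapted - sent_gain   # implied general-tagger sentence accuracy

word_err = error_reduction_rate(word_baseline, word_adapted)
sent_err = error_reduction_rate(sent_baseline, sent_adapted)

print(f"word:     baseline {word_baseline:.2f}%, ERR {word_err:.1f}%")
print(f"sentence: baseline {sent_baseline:.1f}%, ERR {sent_err:.1f}%")
```

Under these assumptions the implied general-tagger accuracies are about 96.63% (word) and 37.5% (sentence), and the computed error reductions agree with the reported 66.4% and 65.6% to within rounding.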
Keywords
part-of-speech tagger; domain adaptation method; LHMM; HMM