[KSCI] Korea Science Citation Index Service

Domain Adaptation Method for LHMM-based English Part-of-Speech Tagger

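Kwon, Oh-Woog (Electronics and Telecommunications Research Institute (ETRI), Language Processing Research Team)
Kim, Young-Gil (Electronics and Telecommunications Research Institute (ETRI), Language Processing Research Team)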

Publication Information

Journal of KIISE: Computing Practices and Letters, v. 16, no. 10, 2010, pp. 1000-1004

Abstract

A large number of current language processing systems use a part-of-speech tagger for preprocessing, and most of them require a tagger with the highest possible accuracy. In particular, exploiting domain-specific characteristics to improve translation quality has become an active issue in the machine translation community. This paper presents a method for customizing an HMM-based or lexicalized HMM (LHMM)-based English tagger from the general domain to a specific domain. The proposed method semi-automatically customizes the output (emission) and transition probabilities of the HMM or LHMM using a domain-specific raw corpus. In experiments adapting the tagger to the patent domain, our LHMM tagger adapted by the proposed method achieves a word tagging accuracy of 98.87% and a sentence tagging accuracy of 78.5%. Compared with the general-domain tagger, it improves word tagging accuracy by 2.24 percentage points (error reduction rate, ERR: 66.4%) and sentence tagging accuracy by 41.0 percentage points (ERR: 65.6%).
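The abstract does not spell out the adaptation procedure, so the following is only a minimal sketch of the general idea of re-weighting an HMM tagger's output (emission) and transition probabilities for a new domain. It assumes a plain bigram HMM rather than an LHMM, and it swaps in simple linear interpolation, P = λ·P_domain + (1 - λ)·P_general, as the customization step. The probability tables, the weight λ = 0.5, the floor value and all function names are hypothetical; the "domain" tables merely stand in for whatever estimates the semi-automatic procedure would derive from raw domain text.

```python
import math

START = "<s>"   # pseudo-tag for the sentence start
FLOOR = 1e-8    # hypothetical floor for unseen events (not real smoothing)

def interpolate(general, domain, lam):
    """Linearly mix two conditional tables {context: {outcome: prob}}.

    lam is the weight of the domain estimate. If a context appears in only
    one table, that table's row is used unchanged so each row still sums to 1.
    """
    mixed = {}
    for ctx in set(general) | set(domain):
        g, d = general.get(ctx, {}), domain.get(ctx, {})
        if not d or not g:
            mixed[ctx] = dict(g or d)
            continue
        mixed[ctx] = {o: lam * d.get(o, 0.0) + (1 - lam) * g.get(o, 0.0)
                      for o in set(g) | set(d)}
    return mixed

def logp(table, ctx, out):
    """Log probability with a crude floor so unseen events are not -inf."""
    return math.log(table.get(ctx, {}).get(out, 0.0) or FLOOR)

def viterbi(words, tags, trans, emit):
    """Most probable tag sequence under a first-order (bigram) HMM."""
    # best[t] = (log score, path) of the best partial path ending in tag t
    best = {t: (logp(trans, START, t) + logp(emit, t, words[0]), [t])
            for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            # Pick the predecessor tag that maximizes score + transition.
            score, path = max(
                ((s + logp(trans, prev, t), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0])
            new[t] = (score + logp(emit, t, w), path + [t])
        best = new
    return max(best.values(), key=lambda item: item[0])[1]

TAGS = ["DT", "NN", "VBD"]
# Toy "general-domain" model: "said" is only ever a past-tense verb.
TRANS_GEN = {
    START: {"DT": 0.4, "NN": 0.3, "VBD": 0.3},
    "DT": {"NN": 0.9, "DT": 0.05, "VBD": 0.05},
    "NN": {"VBD": 0.5, "NN": 0.3, "DT": 0.2},
    "VBD": {"DT": 0.5, "NN": 0.4, "VBD": 0.1},
}
EMIT_GEN = {
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"device": 0.5, "claim": 0.5},
    "VBD": {"said": 0.9, "was": 0.1},
}
# Toy domain estimates in which "said" acts like a determiner ("said device").
TRANS_DOM = {START: {"DT": 0.8, "NN": 0.2}}
EMIT_DOM = {"DT": {"said": 0.6, "the": 0.4}, "NN": {"device": 0.8, "claim": 0.2}}

sentence = ["said", "device"]
general_tags = viterbi(sentence, TAGS, TRANS_GEN, EMIT_GEN)
adapted_tags = viterbi(sentence, TAGS,
                       interpolate(TRANS_GEN, TRANS_DOM, 0.5),
                       interpolate(EMIT_GEN, EMIT_DOM, 0.5))
print(general_tags, adapted_tags)  # ['VBD', 'NN'] ['DT', 'NN']
```

With the toy general model, "said device" is tagged VBD NN; after mixing in the domain tables it becomes DT NN, mimicking the determiner-like "said" of patent claims. A real system would tune λ on held-out domain data and use proper smoothing instead of a fixed floor.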
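As a quick arithmetic check, assuming the accuracy gains are percentage points and that ERR denotes the relative error reduction (the drop in error rate divided by the baseline error rate), the reported figures can be reproduced as below. The baseline accuracies are inferred by subtraction, not quoted from the paper, and the function name is hypothetical.

```python
def error_reduction_rate(baseline_acc, adapted_acc):
    """Relative error reduction (%) when accuracy rises from baseline to adapted.

    Accuracies are percentages; error = 100 - accuracy.
    """
    baseline_err = 100.0 - baseline_acc
    adapted_err = 100.0 - adapted_acc
    return 100.0 * (baseline_err - adapted_err) / baseline_err

# Baselines are inferred by subtracting the reported gains (assumed to be
# percentage points) from the adapted accuracies; they are not quoted values.
word = error_reduction_rate(98.87 - 2.24, 98.87)     # about 66.5 (reported: 66.4)
sentence = error_reduction_rate(78.5 - 41.0, 78.5)   # about 65.6 (reported: 65.6)
print(f"word ERR ~ {word:.2f}%, sentence ERR ~ {sentence:.2f}%")
```

The sentence-level value matches exactly; the word-level value comes out at about 66.47%, which agrees with the reported 66.4% within the rounding of the published two-decimal accuracies.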

Keywords

part-of-speech tagger; domain adaptation method; LHMM; HMM

