Proceedings of the Korean Society for Language and Information Conference (한국언어정보학회:학술대회논문집)
- 2002.02a
- Pages.217-226
- 2002
Generating a Category Set of Words Using a Hierarchical Part-of-speech System and Tagged Corpus
- Kojima, Takeyuki (Dept. of Computer, Information and Communication Science, Tokyo University of Agric. and Tech.);
- Kotani, Yoshiyuki (Dept. of Computer, Information and Communication Science, Tokyo University of Agric. and Tech.)
- Published : 2002.02.01
Abstract
In this paper, we propose a method for generating a proper categorization of morphemes, given a hierarchical part-of-speech system and a corpus tagged with that system. Our method uses the hierarchical information in the part-of-speech system and statistical information from the corpus to generate a category set. The statistical information is based on the contexts in which categories occur. First, we specify the format of the given information. Then, we describe an algorithm for generating a proper categorization. Finally, we present the results of experiments applying this method. We obtained a moderately proper categorization and found several candidates for improvement.
Keywords