Generating a Category Set of Words Using a Hierarchical Part-of-speech System and Tagged Corpus

  • Kojima, Takeyuki (Dept. of Computer, Information and Communication Science, Tokyo University of Agric. and Tech.)
  • Kotani, Yoshiyuki (Dept. of Computer, Information and Communication Science, Tokyo University of Agric. and Tech.)
  • Published: 2002.02.01

Abstract

In this paper, we propose a method for generating a proper categorization of morphemes, given a hierarchical part-of-speech system and a corpus tagged with that system. Our method uses the hierarchical information in the part-of-speech system and statistical information from the corpus to generate a category set. The statistical information is based on the contexts in which categories occur. First, we specify the format of the given information. Then, we describe an algorithm for generating a proper categorization. Finally, we present the results of experiments applying this method. We obtained a moderately proper categorization and identified several candidates for improvement.

Keywords