Language and Information (한국언어정보학회지:언어와정보)
- Volume 6 Issue 2
- Pages 105-128
- 2002
- 1226-7430 (pISSN)
A Rule-Based Analysis from Raw Korean Text to Morphologically Annotated Corpora
- Lee, Ki-Yong (Korea University)
- Markus Schulze (Universität Erlangen-Nürnberg)
- Published : 2002.12.01
Abstract
Morphologically annotated corpora are the basis for many tasks in computational linguistics. Most current approaches use statistically driven methods of morphological analysis that provide only POS tags. While this is sufficient for some applications, many others need a rule-based full morphological analysis that also yields lemmatization and segmentation. This work thus aims at (1) introducing a rule-based Korean morphological analyzer called Kormoran, based on the principle of linearity that prohibits any combination of left-to-right or right-to-left analysis or backtracking, and then at (2) showing how it can be used as a POS-tagger by adopting an ordinary preprocessing technique and by filtering out irrelevant morpho-syntactic information in the analyzed feature structures. It is shown that, besides providing a basis for subsequent syntactic or semantic processing, full morphological analyzers like Kormoran have greater power to resolve ambiguities than simple POS-taggers. The focus of our present analysis is on Korean text.
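The idea of using a full analysis as a POS-tagger by filtering feature structures can be illustrated with a minimal sketch. This is not Kormoran's implementation: the feature names (`form`, `base`, `pos`, `type`, `case`), the tag labels (`N`, `PART`), and the functions `output_filter` and `as_tagged_string` are all hypothetical, chosen only to show how an output filter might project a rich analysis onto plain form/POS pairs.

```python
from typing import Dict, List, Tuple

# Hypothetical full analysis of one word-form: each segment carries a surface
# form, a base form (lemma), a POS label, and extra morpho-syntactic features.
Analysis = List[Dict[str, str]]


def output_filter(
    analysis: Analysis, keep: Tuple[str, ...] = ("form", "pos")
) -> List[Dict[str, str]]:
    """Project each segment onto the features listed in `keep`."""
    return [{k: seg[k] for k in keep if k in seg} for seg in analysis]


def as_tagged_string(analysis: Analysis) -> str:
    """Render a filtered analysis as form/POS pairs joined by '+'."""
    return "+".join(
        f"{seg['form']}/{seg['pos']}" for seg in output_filter(analysis)
    )


# Illustrative feature structure for the word-form 학교에 (noun + particle).
# All labels here are placeholders, not the tagset used in the paper.
full = [
    {"form": "학교", "base": "학교", "pos": "N", "type": "common"},
    {"form": "에", "base": "에", "pos": "PART", "case": "locative"},
]

print(as_tagged_string(full))  # prints 학교/N+에/PART
```

Because the filter only discards features, the segmentation and base forms stay available to later syntactic or semantic processing; passing `keep=("base",)` instead would give a lemmatized view of the same analysis.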
Keywords
- ambiguity;
- allomorph;
- base form;
- Hangul;
- morphological analysis;
- output filter;
- preprocessor;
- rule-based;
- tagging;
- word-form