Search | Korea Science

Moghadam, Fatemeh Momenipour;Keyvanpour, MohammadReza
- Journal of Information Processing Systems
- /
- v.11 no.3
- /
- pp.450-464
- /
- 2015
In linguistics, stemming is the operation of reducing words to their more general form, which is called the 'stem'. Stemming is an important step in information retrieval systems, natural language processing, and text mining. Information retrieval systems are evaluated by metrics like precision and recall and the fundamental superiority of an information retrieval system over another one is measured by them. Stemmers decrease the indexed file, increase the speed of information retrieval systems, and improve the performance of these systems by boosting precision and recall. There are few Persian stemmers and most of them work based on morphological rules. In this paper we carefully study Persian stemmers, which are classified into three main classes: structural stemmers, lookup table stemmers, and statistical stemmers. We describe the algorithms of each class carefully and present the weaknesses and strengths of each Persian stemmer. We also propose some metrics to compare and evaluate each stemmer by them.
https://doi.org/10.3745/JIPS.04.0020 인용 PDF KSCI

Jo, Se-Hyeong
- The KIPS Transactions:PartB
- /
- v.8B no.6
- /
- pp.675-684
- /
- 2001
This paper describes a method for stemming of Korean language by using unsupervised learning from raw corpus. This technique does not require a lexicon or any language-specific knowledge. Since we use unsupervised learning, the time and effort required for learning is negligible. Unlike heuristic approaches that are theoretically ungrounded, this method is based on widely accepted statistical methods, and therefore can be easily extended. The method is currently applied only to Korean language, but it can easily be adapted to other agglutinative languages, since it is not language-dependent.
PDF