• Title/Summary/Keyword: Stemmer

Comparative Study of Various Persian Stemmers in the Field of Information Retrieval

  • Moghadam, Fatemeh Momenipour;Keyvanpour, MohammadReza
    • Journal of Information Processing Systems, v.11 no.3, pp.450-464, 2015
  • In linguistics, stemming is the operation of reducing words to a more general form, called the 'stem'. Stemming is an important step in information retrieval systems, natural language processing, and text mining. Information retrieval systems are evaluated with metrics such as precision and recall, which also measure how far one system outperforms another. Stemmers reduce the size of the index file, speed up information retrieval systems, and improve their performance by boosting precision and recall. There are few Persian stemmers, and most of them are based on morphological rules. In this paper we study Persian stemmers, which fall into three main classes: structural stemmers, lookup table stemmers, and statistical stemmers. We describe the algorithms of each class and present the strengths and weaknesses of each Persian stemmer. We also propose metrics for comparing and evaluating the stemmers.
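The difference between two of the classes named in the abstract above, structural (rule-based) and lookup table stemmers, can be illustrated with a minimal sketch. The suffix rules, table entries, function names, and the `min_stem_len` parameter are hypothetical English toy data chosen for readability; they are not taken from the paper or from any Persian stemmer it surveys.

```python
# Hypothetical toy data: illustrative assumptions only, not from the paper.
SUFFIX_RULES = ("ing", "ed", "s")                  # structural: strip known suffixes
LOOKUP_TABLE = {"went": "go", "mice": "mouse"}     # lookup table: word -> stem


def structural_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def stem(word: str) -> str:
    """Consult the lookup table first, then fall back to the suffix rules."""
    word = word.lower()
    if word in LOOKUP_TABLE:
        return LOOKUP_TABLE[word]
    return structural_stem(word)


if __name__ == "__main__":
    forms = ["walk", "walks", "walked", "walking"]
    # Four surface forms collapse to one index term, which is how a stemmer
    # shrinks the index.
    print({stem(w) for w in forms})
    print(stem("went"), stem("Mice"))
```

A table handles irregular forms that no suffix rule covers, at the cost of maintaining the table; the rules generalise to unseen words but can over- or under-strip.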

A Korean Language Stemmer based on Unsupervised Learning (자율 학습에 의한 실질 형태소와 형식 형태소의 분리)

  • Jo, Se-Hyeong
    • The KIPS Transactions:PartB, v.8B no.6, pp.675-684, 2001
  • This paper describes a method for stemming Korean using unsupervised learning from a raw corpus. The technique does not require a lexicon or any language-specific knowledge. Because it uses unsupervised learning, the time and effort required for learning are negligible. Unlike heuristic approaches that are theoretically ungrounded, this method is based on widely accepted statistical methods and can therefore be easily extended. The method is currently applied only to Korean, but it can easily be adapted to other agglutinative languages, since it is not language-dependent.

A Korean Language Stemmer based on Unsupervised Learning (자율 학습에 의한 실질 형태소와 형식 형태소의 분리)

  • Cha, Yong-Tae;Cho, Se-Hyeong
    • Proceedings of the Korea Information Processing Society Conference, 2002.11a, pp.577-580, 2002
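  • There are several approaches to the morphological analysis that natural language processing requires, but they basically select the most likely candidate given a dictionary. With this approach it is impossible to analyze an unknown language for which no dictionary exists. Even for a known language, the required dictionary does not exist when the vocabulary changes continuously or when the domain is highly specialized. This paper describes a technique that uses unsupervised learning on a plain, untagged corpus to separate the lexical morphemes and functional morphemes of Korean.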

Molecular Breeding of Genes, Pathways and Genomes by DNA Shuffling

  • Stemmer, Willem P.C.
    • Biotechnology and Bioprocess Engineering:BBE, v.7 no.3, pp.121-129, 2002
  • Existing methods for optimization of sequences by random mutagenesis generate libraries with a small number of mostly deleterious mutations, resulting in libraries that contain a large fraction of non-functional clones and explore only a small part of sequence space. Large numbers of clones need to be screened to find the rare mutants with improvements. Library display formats are useful for screening very large libraries, but they impose screening constraints that limit the value of this approach for most commercial applications. By contrast, in both classical breeding and DNA shuffling, natural diversity is permuted by homologous recombination, generating libraries of very high quality, from which improved clones can be identified with a small number of complex screens. Given that this small number of screens can be performed under the conditions of actual use of the product, commercially relevant improvements can be reliably obtained.

Comparison of User-generated Tags with Subject Descriptors, Author Keywords, and Title Terms of Scholarly Journal Articles: A Case Study of Marine Science

  • Vaidya, Praveenkumar;Harinarayana, N.S.
    • Journal of Information Science Theory and Practice, v.7 no.1, pp.29-38, 2019
  • Information retrieval is a central challenge of the Web 2.0 world. Organising knowledge amid abundant information from many sources is a major hurdle to retrieving information with high precision and recall. The fast-changing landscape of personal information organisation on social networking sites creates opportunities for data scientists and library professionals to combine social data with expert-created data. Folksonomies, or social tags, therefore play a vital role in information organisation and retrieval. Comparing these user-created tags with expert-created index terms, author keywords, and title words sheds light on how the sets differ. Such comparisons reveal new terms that can enhance subject access and show how similar user-generated tags are to the other term sets. The CiteULike tags extracted from 5,150 scholarly journal articles in marine science were compared with the corresponding Aquatic Science and Fisheries Abstracts descriptors, author keywords, and title terms. The Jaccard similarity coefficient was used to compare the social tags with these word sets, and the results showed that user-generated keywords are present in the Aquatic Science and Fisheries Abstracts descriptors, author keywords, and title words. When information retrieval techniques such as stemming and lemmatization were applied, the results were found to improve the keywords' value for subject access.
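The Jaccard similarity coefficient mentioned in the abstract above is the size of the intersection of two sets divided by the size of their union. The sketch below shows, under stated assumptions, how normalising terms before the comparison can raise the overlap. The term lists and the crude plural-stripping `naive_stem` helper are hypothetical illustrations, not the study's data or its actual stemmer.

```python
def jaccard(a, b) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; defined here as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def naive_stem(term: str) -> str:
    """Hypothetical normaliser: lowercase and drop a trailing 's' from longer words."""
    term = term.lower()
    if term.endswith("s") and len(term) > 3:
        return term[:-1]
    return term


if __name__ == "__main__":
    # Invented example terms, not drawn from CiteULike or ASFA.
    user_tags = ["Corals", "reef", "fisheries"]
    descriptors = ["coral", "reefs", "salinity"]

    raw = jaccard(user_tags, descriptors)
    stemmed = jaccard(map(naive_stem, user_tags), map(naive_stem, descriptors))
    print(f"raw: {raw:.2f}, stemmed: {stemmed:.2f}")
```

Without normalisation the two lists share no exact strings; after it, "Corals"/"coral" and "reef"/"reefs" match, so two of the four distinct terms overlap.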