• Title/Summary/Keyword: unknown word guessing

Search Result 2, Processing Time 0.014 seconds

Probabilistic Segmentation and Tagging of Unknown Words (확률 기반 미등록 단어 분리 및 태깅)

  • Kim, Bogyum;Lee, Jae Sung
    • Journal of KIISE
    • /
    • v.43 no.4
    • /
    • pp.430-436
    • /
    • 2016
  • Processing of unknown words such as proper nouns and newly coined words is important for a morphological analyzer to process documents in various domains. In this study, a segmentation and tagging method for unknown Korean words is proposed for the 3-step probabilistic morphological analysis. For guessing unknown word, it uses rich suffixes that are attached to open class words, such as general nouns and proper nouns. We propose a method to learn the suffix patterns from a morpheme tagged corpus, and calculate their probabilities for unknown open word segmentation and tagging in the probabilistic morphological analysis model. Results of the experiment showed that the performance of unknown word processing is greatly improved in the documents containing many unregistered words.

An Efficient Method for Korean Noun Extraction Using Noun Patterns (명사 출현 특성을 이용한 효율적인 한국어 명사 추출 방법)

  • 이도길;이상주;임해창
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.1_2
    • /
    • pp.173-183
    • /
    • 2003
  • Morphological analysis is the most widely used method for extracting nouns from Korean texts. For every Eojeol, in order to extract nouns from it, a morphological analyzer performs frequent dictionary lookup and applies many morphonological rules, therefore it requires many operations. Moreover, a morphological analyzer generates all the possible morphological interpretations (sequences of morphemes) of a given Eojeol, which may by unnecessary from the noun extraction`s point of view. To reduce unnecessary computation of morphological analysis from the noun extraction`s point of view, this paper proposes a method for Korean noun extraction considering noun occurrence characteristics. Noun patterns denote conditions on which nouns are included in an Eojeol or not, which are positive cues or negative cues, respectively. When using the exclusive information as the negative cues, it is possible to reduce the search space of morphological analysis by ignoring Eojeols not including nouns. Post-noun syllable sequences(PNSS) as the positive cues can simply extract nouns by checking the part of the Eojeol preceding the PNSS and can guess unknown nouns. In addition, morphonological information is used instead of many morphonological rules in order to recover the lexical form from its altered surface form. Experimental results show that the proposed method can speed up without losing accuracy compared with other systems based on morphological analysis.