• Title/Summary/Keyword: compound noun segmentation

Search Result 7

Integrated Indexing Method using Compound Noun Segmentation and Noun Phrase Synthesis (복합명사 분할과 명사구 합성을 이용한 통합 색인 기법)

  • Won, Hyung-Suk;Park, Mi-Hwa;Lee, Geun-Bae
    • Journal of KIISE:Software and Applications / v.27 no.1 / pp.84-95 / 2000
  • In this paper, we propose an integrated indexing method that combines compound noun segmentation with noun phrase synthesis. Compound noun segmentation relies on statistical information, while noun phrase synthesis relies on natural language processing techniques. First, index terms are chosen from simple words using the results of morphological analysis and part-of-speech tagging. Second, noun phrases are automatically synthesized from the syntactic analysis results; if syntactic analysis fails, only the morphological analysis and tagging results are used. Third, compound nouns are selected from the tagging results and then segmented and re-synthesized using statistical information. The segmented and synthesized terms are then used together as index terms to supplement the single terms. We demonstrate the effectiveness of the proposed integrated indexing method for Korean compound noun processing using KTSET2.0 and KRIST SET, which are standard test collections for Korean information retrieval.

Design and Implementation of the Compound Noun Segmentation Algorithm Based on Statistical Information

  • Kim, Chang-Geun;Tack, Han-Ho
    • International Journal of Fuzzy Logic and Intelligent Systems / v.4 no.3 / pp.306-310 / 2004
  • This paper proposes a reverse segmentation algorithm that uses affix information and preference pattern information for Korean compound nouns. The structure of Korean compound nouns is mostly derived from Chinese characters and exhibits some preference patterns, which are used as segmentation rules in this paper. To evaluate the accuracy of the proposed algorithm, an experiment was performed with 36,061 compound nouns. The algorithm segmented 99.3% of them correctly and compared favorably with other algorithms in comparative experiments. In particular, most four-syllable and five-syllable compound nouns were segmented successfully.

Korean Base-Noun Extraction and its Application (한국어 기준명사 추출 및 그 응용)

  • Kim, Jae-Hoon
    • The KIPS Transactions:PartB / v.15B no.6 / pp.613-620 / 2008
  • Noun extraction plays an important part in fields such as information retrieval and text summarization. In this paper, we present a Korean base-noun extraction system and apply it to text summarization in order to handle large amounts of text effectively. A base-noun is an atomic noun, not a compound noun, and we use two techniques: filtering and segmenting. Filtering removes non-nominal words from the text before base-nouns are extracted; segmenting separates a particle from a nominal and divides a compound noun into base-nouns. Both the recall and the precision of the proposed system are about 89% on average under experimental conditions on the ETRI corpus. The proposed system has been applied to a Korean text summarization system and has shown satisfactory results.

A Reverse Segmentation Algorithm of Compound Nouns Using Affix Information and Preference Pattern (접사정보 및 선호패턴을 이용한 복합명사의 역방향 분해 알고리즘)

  • Ryu, Bang;Baek, Hyun-Chul;Kim, Sang-Bok
    • Journal of Korea Multimedia Society / v.7 no.3 / pp.418-426 / 2004
  • This paper proposes a reverse segmentation algorithm that uses affix information and preference pattern information for Korean compound nouns. The structure of Korean compound nouns is mostly derived from Chinese characters and exhibits some preference patterns, which are used as segmentation rules in this paper. To evaluate the accuracy of the proposed algorithm, an experiment was performed with 36,061 compound nouns. The algorithm segmented 99.3% of them correctly and compared favorably with other algorithms in comparative experiments. In particular, most four-syllable and five-syllable compound nouns were segmented successfully.

Korean Word Segmentation and Compound-noun Decomposition Using Markov Chain and Syllable N-gram (마코프 체인 및 음절 N-그램을 이용한 한국어 띄어쓰기 및 복합명사 분리)

  • Kwon, Oh-Wook (권오욱)
    • The Journal of the Acoustical Society of Korea / v.21 no.3 / pp.274-284 / 2002
  • Word segmentation errors that occur during text preprocessing often insert incorrect words into the recognition vocabulary and lead to poor language models for Korean large-vocabulary continuous speech recognition. We propose an automatic word segmentation algorithm that uses Markov chains and syllable-based n-gram language models to correct word segmentation errors in text corpora. We assume that a sentence is generated from a Markov chain. Spaces and non-space characters are generated on self-transitions and other transitions of the Markov chain, respectively. The word segmentation of the sentence is then obtained by finding the maximum likelihood path using syllable n-gram scores. In experiments, the algorithm achieved 91.58% word accuracy and 96.69% syllable accuracy for word segmentation of 254-sentence newspaper columns with all spaces removed. The algorithm improved the word accuracy from 91.00% to 96.27% for word segmentation correction at line breaks and yielded a decomposition accuracy of 96.22% for compound-noun decomposition.
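
The maximum likelihood search described in this abstract can be illustrated with a small dynamic program. The sketch below is not the paper's Markov chain formulation: it assumes a syllable trigram model in which the space is an ordinary token, and the function name `insert_spaces`, the `logp` dictionary, and the `unk` back-off score are all hypothetical. A trigram is used because, with a bigram, each gap could be decided independently and no path search would be needed.

```python
def insert_spaces(text, logp, unk=-10.0):
    """Return `text` with spaces inserted along the highest-scoring path.

    `logp` maps (token, token, token) tuples to log probabilities; the
    space " " is treated as an ordinary token. Unseen trigrams get `unk`.
    These modelling choices are assumptions made for illustration.
    """
    if not text:
        return ""

    def score(a, b, c):
        return logp.get((a, b, c), unk)

    space = " "
    # best[state] = (log score, output so far); state = last two tokens emitted
    best = {("<s>", text[0]): (score("<s>", "<s>", text[0]), text[0])}
    for ch in text[1:]:
        nxt = {}
        for (a, b), (s, out) in best.items():
            candidates = [
                # emit the next syllable directly after the previous one
                ((b, ch), s + score(a, b, ch), out + ch),
                # emit a space first, then the next syllable
                ((space, ch),
                 s + score(a, b, space) + score(b, space, ch),
                 out + space + ch),
            ]
            for state, sc, o in candidates:
                if state not in nxt or sc > nxt[state][0]:
                    nxt[state] = (sc, o)
        best = nxt
    return max(best.values(), key=lambda v: v[0])[1]


# Toy model (made-up scores) that prefers a boundary between "b" and "c".
toy = {
    ("<s>", "<s>", "a"): -1.0,
    ("<s>", "a", "b"): -1.0,
    ("a", "b", " "): -1.0,
    ("b", " ", "c"): -1.0,
    (" ", "c", "d"): -1.0,
}
print(insert_spaces("abcd", toy))
```

Only two states survive at each position (the previous token is either a syllable or a space), so the search is linear in the sentence length. In practice the scores would be estimated from a spaced corpus, with proper smoothing in place of the flat `unk` penalty.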

Segmentation of Korean Compound Nouns Using Semantic Category Analysis of Unregistered Nouns (미등록어의 의미 범주 분석을 이용한 복합명사 분해)

  • Kang Yu-Hwan;Seo Young-Hoon
    • Journal of Information Technology Applications and Management / v.11 no.4 / pp.95-102 / 2004
  • This paper proposes a method for segmenting compound nouns that include unregistered nouns into the correct combination of unit nouns, using characteristics of personal names, loanwords, and location names. A Korean personal name generally consists of three syllables, only a relatively small number of syllables are used as family names, and the combinations of second and third syllables are somewhat restricted. Many personal names also appear with clue words in compound nouns. Most loanwords contain one or more syllables that cannot appear in Korean words, or have syllable sequences that differ from those of ordinary Korean words. Location names are generally used with clue words designating districts in compound nouns. Using these characteristics to analyze compound nouns not only makes segmentation more accurate but also helps natural language systems use the semantic categories of those unregistered nouns. Experimental results show that the precision of our method is approximately 98% on average. The precision of personal name recognition and loanword recognition is about 94% and about 92%, respectively.

A Reverse Segmentation Algorithm of Compound Nouns (복합명사의 역방향 분해 알고리즘)

  • Lee, Hyeon-Min;Park, Hyeok-Ro
    • The KIPS Transactions:PartB / v.8B no.4 / pp.357-364 / 2001
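  • In this paper, we propose a new algorithm that segments Korean compound nouns using a unit-noun dictionary and an affix dictionary. Noting that the head of a Korean compound noun appears at the end, the proposed algorithm attempts segmentation from the last syllable toward the first, that is, in the reverse direction. In an experiment on 3,230 compound nouns extracted from ETRI's tagged corpus, it achieved a segmentation accuracy of about 96.6%. For compound nouns containing unregistered words, the segmentation accuracy was 77.5%. Most of the unregistered words in the experimental data were derived words containing affixes, and the proposed algorithm showed higher analysis accuracy on unregistered words with attached affixes.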
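
The right-to-left strategy in this abstract can be sketched as a longest-match search that starts at the last syllable. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function name `reverse_segment`, the toy dictionaries, the backtracking on failure, and the rule that a suffix is merged into the unit noun on its left are all hypothetical choices.

```python
def reverse_segment(compound, unit_nouns, suffixes=frozenset()):
    """Segment `compound` from its last syllable toward its first.

    Returns a list of units, or None when no full segmentation exists.
    `unit_nouns` and `suffixes` stand in for the unit-noun and affix
    dictionaries; their contents here are illustrative assumptions.
    """
    def helper(end):
        if end == 0:
            return []
        # Try the longest piece ending at `end` first, then shorter ones.
        for start in range(end):
            piece = compound[start:end]
            if piece in unit_nouns:
                rest = helper(start)
                if rest is not None:
                    return rest + [piece]
            elif piece in suffixes:
                rest = helper(start)
                # A suffix needs a unit noun to its left to attach to.
                if rest:
                    return rest[:-1] + [rest[-1] + piece]
        return None

    return helper(len(compound))


nouns = {"정보", "검색", "시스템", "과학"}
print(reverse_segment("정보검색시스템", nouns))
print(reverse_segment("과학자", nouns, suffixes={"자"}))
```

The first call splits the compound into 정보, 검색, and 시스템; the second keeps 과학자 as a single unit because the suffix 자 attaches to 과학. The recursion has no memoization, which is acceptable for compound nouns of a handful of syllables but would need caching for long inputs.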
