• Title/Summary/Keyword: Noun extraction

Search Result 52, Processing Time 0.021 seconds

Keyword Extraction in Korean Using Unsupervised Learning Method (비감독 학습 기법에 의한 한국어의 키워드 추출)

  • Shin, Seong-Yoon;Rhee, Yang-Won
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.6
    • /
    • pp.1403-1408
    • /
    • 2010
  • Korean information retrieval uses noun as index terms or keywords of representing the document. and noun and keyword extraction is to find all nouns presented in the document, In this paper, we proposes the method of keyword extraction using pre-built dictionary. This method reduces the execution time by reducing unnecessary operations. And noun, even large documents without affecting significantly the accuracy, can be extracted. This paper proposed noun extraction method using the appearance characteristics of the noun and keyword extraction method using unsupervised learning techniques.

Keyword Extraction Using Unsupervised Learning Method (비감독 학습 기법에 의한 키워드 추출)

  • Shin, Seong-Yoon;Baek, Jeong-Uk;Rhee, Yang-Won
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2010.05a
    • /
    • pp.165-166
    • /
    • 2010
  • Noun extraction is to find all nouns presented in the document, Korean information retrieval uses noun as index terms or keywords of representing the document. In this paper, we proposes the method of keyword extraction using pre-built dictionary. This method reduces the execution time by reducing unnecessary operations. And noun, even large documents without affecting significantly the accuracy, can be extracted. This paper proposed noun extraction method using the appearance characteristics of the noun and keyword extraction method using unsupervised learning techniques.

  • PDF

Noun and Keyword Extraction for Information Processing of Korean (한국어 정보처리를 위한 명사 및 키워드 추출)

  • Shin, Seong-Yoon;Rhee, Yang-Won
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.3
    • /
    • pp.51-56
    • /
    • 2009
  • In a language, noun and keyword extraction is a key element in information processing. When it comes to processing Korean language information, however, there are still a lot of problems with noun and keyword extraction. This paper proposes an effective noun extraction method that considers noun emergence features. The proposed method can be effectively used in areas like information retrieval where large volumes of documents and data need to be processed in a fast manner. In this paper, a category-based keyword construction method is also presented that uses an unsupervised learning technique to ensure high volumes of queries are automatically classified. Our experimental results show that the proposed method outperformed both the supervised learning-based X2 method known to excel in keyword extraction and the DF method, in terms o classification precision.

Korean Base-Noun Extraction and its Application (한국어 기준명사 추출 및 그 응용)

  • Kim, Jae-Hoon
    • The KIPS Transactions:PartB
    • /
    • v.15B no.6
    • /
    • pp.613-620
    • /
    • 2008
  • Noun extraction plays an important part in the fields of information retrieval, text summarization, and so on. In this paper, we present a Korean base-noun extraction system and apply it to text summarization to deal with a huge amount of text effectively. The base-noun is an atomic noun but not a compound noun and we use tow techniques, filtering and segmenting. The filtering technique is used for removing non-nominal words from text before extracting base-nouns and the segmenting technique is employed for separating a particle from a nominal and for dividing a compound noun into base-nouns. We have shown that both of the recall and the precision of the proposed system are about 89% on the average under experimental conditions of ETRI corpus. The proposed system has applied to Korean text summarization system and is shown satisfactory results.

Korean Noun Extractor using Occurrence Patterns of Nouns and Post-noun Morpheme Sequences (한국어 명사 출현 특성과 후절어를 이용한 명사추출기)

  • Park, Yong-Hyun;Hwang, Jae-Won;Ko, Young-Joong
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.12
    • /
    • pp.919-927
    • /
    • 2010
  • Since the performance of mobile devices is recently improved, the requirement of information retrieval is increased in the mobile devices as well as PCs. If a mobile device with small memory uses a tradition language analysis tool to extract nouns from korean texts, it will impose a burden of analysing language. As a result, the need for the language analysis tools adequate to the mobile devices is increasing. Therefore, this paper proposes a new method for noun extraction using post-noun morpheme sequences and noun patterns from a large corpus. The proposed noun extractor has only the dictionary capacity of 146KB and its performance shows 0.86 $F_1$-measure; the capacity of noun dictionary corresponds to only the 4% capacity of the existing noun extractor with a POS tagger. In addition, it easily extract nouns for unknown word because its dependence for noun dictionaries is low.

Effective Thematic Words Extraction from a Book using Compound Noun Phrase Synthesis Method

  • Ahn, Hee-Jeong;Kim, Kee-Won;Kim, Seung-Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.3
    • /
    • pp.107-113
    • /
    • 2017
  • Most of online bookstores are providing a user with the bibliographic book information rather than the concrete information such as thematic words and atmosphere. Especially, thematic words help a user to understand books and cast a wide net. In this paper, we propose an efficient extraction method of thematic words from book text by applying the compound noun and noun phrase synthetic method. The compound nouns represent the characteristics of a book in more detail than single nouns. The proposed method extracts the thematic word from book text by recognizing two types of noun phrases, such as a single noun and a compound noun combined with single nouns. The recognized single nouns, compound nouns, and noun phrases are calculated through TF-IDF weights and extracted as main words. In addition, this paper suggests a method to calculate the frequency of subject, object, and other roles separately, not just the sum of the frequencies of all nouns in the TF-IDF calculation method. Experiments is carried out in the field of economic management, and thematic word extraction verification is conducted through survey and book search. Thus, 9 out of the 10 experimental results used in this study indicate that the thematic word extracted by the proposed method is more effective in understanding the content. Also, it is confirmed that the thematic word extracted by the proposed method has a better book search result.

An Efficient Method for Korean Noun Extraction Using Noun Patterns (명사 출현 특성을 이용한 효율적인 한국어 명사 추출 방법)

  • 이도길;이상주;임해창
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.1_2
    • /
    • pp.173-183
    • /
    • 2003
  • Morphological analysis is the most widely used method for extracting nouns from Korean texts. For every Eojeol, in order to extract nouns from it, a morphological analyzer performs frequent dictionary lookup and applies many morphonological rules, therefore it requires many operations. Moreover, a morphological analyzer generates all the possible morphological interpretations (sequences of morphemes) of a given Eojeol, which may by unnecessary from the noun extraction`s point of view. To reduce unnecessary computation of morphological analysis from the noun extraction`s point of view, this paper proposes a method for Korean noun extraction considering noun occurrence characteristics. Noun patterns denote conditions on which nouns are included in an Eojeol or not, which are positive cues or negative cues, respectively. When using the exclusive information as the negative cues, it is possible to reduce the search space of morphological analysis by ignoring Eojeols not including nouns. Post-noun syllable sequences(PNSS) as the positive cues can simply extract nouns by checking the part of the Eojeol preceding the PNSS and can guess unknown nouns. In addition, morphonological information is used instead of many morphonological rules in order to recover the lexical form from its altered surface form. Experimental results show that the proposed method can speed up without losing accuracy compared with other systems based on morphological analysis.

Study on the Generation Methods of Composition Noun for Efficient Index Term Extraction (효율적인 색인어 추출을 위한 합성명사 생성 방안에 대한 연구)

  • Kim, Mi-Jin;Park, Mi-Seong;Choe, Jae-Hyeok;Lee, Sang-Jo
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.4
    • /
    • pp.1122-1131
    • /
    • 2000
  • The efficiency of thesytem depends upon an accurate extraction capability of index terms in the system of information search or in that of automatic index. Therefore, extraction of accurate index terms is of utmost importance. This report presents the generation methods of composition noun for efficient index term extraction by using words of high frequency appearance, so that the right documents can be found during information search. For the sake of presentation of this method, index terms of composition noun shall be extracted by applying the rule of composition and disintegration to the nouns with high frequency of appearance in the documents, such as those with upper 30%∼40% of frequency ratio. In addition, for he purpose of effecting an inspection of validity in relation to a composition of high frequency nouns such as those with upper 30∼40% of frequency ratio as presented in this report, it proposes an adequate frquency ratio during noun composition. Based upon the proposed application, in this short documents with less than 300 syllables, low frequency omissions were noticed, when composed with nouns in the upper 30% of frequency ratio; whereas the documents with more than 30 syllables, when composed with nouns in he upper 40% of frequency ration, had a considerable reduction of low frequency omissions. Thus, total number of index terms has decreased to 57.7% of these existing and an accurate extraction of index terms with an 85.6% adequacy ratio became possible.

  • PDF

Noun and affix extraction using conjunctive information (결합정보를 이용한 명사 및 접사 추출)

  • 서창덕;박인칠
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.34C no.5
    • /
    • pp.71-81
    • /
    • 1997
  • This paper proposes noun and affix extraction methods using conjunctive information for making an automatic indexing system thorugh morphological analysis and syntactic analysis. The korean language has a peculiar spacing words rule, which is different from other languages, and the conjunctive information, which is extracted from the rule, can reduce the number of multiple parts of speech at a minimum cost. The proposed algorithms also solve the problem that one word is seperated by newline charcter. We show efficiency of the proposed algorithms through the process of morhologica analyzing.

  • PDF

A Method for Clustering Noun Phrases into Coreferents for the Same Person in Novels Translated into Korean (한국어 번역 소설에서 인물명 명사구의 동일인물 공통참조 클러스터링 방법)

  • Park, Taekeun;Kim, Seung-Hoon
    • Journal of Korea Multimedia Society
    • /
    • v.20 no.3
    • /
    • pp.533-542
    • /
    • 2017
  • Novels include various character names, depending on the genre and the spatio-temporal background of the novels and the nationality of characters. Besides, characters and their names in a novel are created by the author's pen and imagination. As a result, any proper noun dictionary cannot include all kinds of character names. In addition, the novels translated into Korean have character names consisting of two or more nouns (such as "Harry Potter"). In this paper, we propose a method to extract noun phrases for character names and to cluster the noun phrases into coreferents for the same character name. In the extraction of noun phrases, we utilize KKMA morpheme analyzer and CPFoAN character identification tool. In clustering the noun phrases into coreferents, we construct a directed graph with the character names extracted by CPFoAN and the extracted noun phrases, and then we create name sets for characters by traversing connected subgraphs in the directed graph. With four novels translated into Korean, we conduct a survey to evaluate the proposed method. The results show that the proposed method will be useful for speaker identification as well as for constructing the social network of characters.