• Title/Summary/Keyword: Disambiguous Word


Korean Document Classification Using Extended Vector Space Model (확장된 벡터 공간 모델을 이용한 한국어 문서 분류 방안)

  • Lee, Samuel Sang-Kon
    • The KIPS Transactions: Part B, v.18B no.2, pp.93-108 (2011)
  • We propose an extended vector space model that uses ambiguous words and disambiguous words to improve Korean document classification. In this paper, we study how to improve the precision of the vector space model and propose a new axis that represents a weight value. Conventional classification methods without this weight value have problems when comparing vectors. After calculating the mutual information between a term and its classification field, we define a word that shares the same axis of the weight value as an ambiguous word. A word that resolves the meaning of an ambiguous word is defined as a disambiguous word. We determine the strength of each disambiguous word among the words that co-occur with an ambiguous word in the same document. Finally, we propose a new classification method that extends the vector dimensions with ambiguous and disambiguous words.
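The abstract gives no formulas, so the following is only a minimal Python sketch of the general idea under stated assumptions: pointwise mutual information (PMI) between a term and a category, estimated from document counts, stands in for the paper's mutual information value; a term is treated as ambiguous when no category reaches a hypothetical PMI `threshold`; and the co-occurring non-ambiguous word with the highest PMI acts as the disambiguating word, adding a hypothetical extra axis named `term|category`. The functions `pmi`, `best_category`, `is_ambiguous`, `extended_vector`, and `cosine` and the toy corpus format are illustrative, not the author's method.

```python
import math

# Toy corpus format (hypothetical): list of (set_of_terms, category_label) pairs.

def pmi(term, category, docs):
    """PMI(t, c) = log2(P(t, c) / (P(t) * P(c))), estimated from document counts."""
    n = len(docs)
    n_t = sum(1 for terms, _ in docs if term in terms)
    n_c = sum(1 for _, c in docs if c == category)
    n_tc = sum(1 for terms, c in docs if term in terms and c == category)
    if n_tc == 0:
        return float("-inf")  # term never co-occurs with this category
    return math.log2((n_tc / n) / ((n_t / n) * (n_c / n)))

def best_category(term, docs):
    """Return (category, PMI) for the category with the highest PMI for `term`."""
    cats = sorted({c for _, c in docs})
    scores = {c: pmi(term, c, docs) for c in cats}
    best = max(cats, key=lambda c: scores[c])
    return best, scores[best]

def is_ambiguous(term, docs, threshold=0.5):
    """Hypothetical rule: ambiguous if no category reaches `threshold` PMI."""
    return best_category(term, docs)[1] < threshold

def extended_vector(doc_terms, docs, threshold=0.5):
    """Binary term vector plus one hypothetical extra axis per ambiguous term.

    The extra axis 'term|category' takes the category favoured by the strongest
    co-occurring non-ambiguous word (the disambiguating word in this sketch),
    and its weight is that word's PMI (its 'strength').
    """
    vec = {t: 1.0 for t in doc_terms}
    for t in doc_terms:
        if not is_ambiguous(t, docs, threshold):
            continue
        helpers = [best_category(w, docs) for w in doc_terms
                   if w != t and not is_ambiguous(w, docs, threshold)]
        if helpers:
            cat, strength = max(helpers, key=lambda cs: cs[1])
            vec[f"{t}|{cat}"] = strength
    return vec

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

On a toy corpus where "bank" appears once with "loan" (finance) and once with "river" (geography), the plain term vectors of those two documents have cosine similarity 0.5, while the extended vectors gain different `bank|finance` and `bank|geo` axes and their similarity drops to 1/3, separating the two senses.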

A Method Of Compound Noun Phrase Indexing for Resolving Syntactic Diversity (구문 다양성 해소를 위한 복합명사구 색인 방법)

  • Cho, Min-Hee; Jeong, Do-Heon
    • The Journal of the Korea Contents Association, v.11 no.3, pp.467-476 (2011)
  • A compound noun phrase (CNP) is an important factor in semantic information processing because the meaning of a CNP is less ambiguous than that of a single word. However, a CNP can be expressed in various forms even when the meaning is the same. This is called syntactic diversity, and it makes it difficult for an information system to recognize that these forms share one sense. To resolve syntactic diversity, we propose an indexing method for compound noun phrases whose main purpose is to assign an identical index term to differently expressed CNPs that have the same meaning. The method proceeds as follows. First, we build rule templates and use them to extract CNPs from a set of domestic research papers. Second, since a CNP generally has a unique meaning, we define synthesis rules for index terms and apply them to the extracted CNPs. For objective performance evaluation, we used the HANTEC 2.0 test collection and compared the results with a baseline model. The experiments confirmed that the proposed indexing method can improve retrieval precision and overall information retrieval performance.
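As a rough illustration of assigning one index term to several surface forms, the sketch below assumes the CNPs have already been extracted and tagged by a morphological analyzer using Sejong-style tags (`NNG` for common nouns, `JKG` for the genitive particle 의). The single normalization rule (keep the nouns in order, drop particles and spacing) and the helpers `index_term` and `build_index` are hypothetical; the paper's actual rule templates and synthesis rules are not given in the abstract.

```python
from collections import defaultdict

def index_term(morphemes):
    """Collapse one tagged CNP into a canonical index term.

    morphemes: list of (surface, tag) pairs from a morphological analyzer (assumed).
    Hypothetical rule: keep noun morphemes (tags starting with 'NN') in order and
    drop everything else, e.g. the genitive particle 의 tagged 'JKG'.
    Spacing disappears because the nouns are joined with no separator.
    """
    nouns = [surface for surface, tag in morphemes if tag.startswith("NN")]
    return "".join(nouns)

def build_index(docs):
    """Map each canonical index term to the ids of documents containing a variant.

    docs: dict of doc_id -> list of tagged CNPs (hypothetical input format).
    """
    index = defaultdict(set)
    for doc_id, cnps in docs.items():
        for cnp in cnps:
            index[index_term(cnp)].add(doc_id)
    return dict(index)
```

Here 정보 검색, 정보의 검색, and 정보검색 all map to the index term 정보검색, so a query using any one form would retrieve all three documents. A real synthesis rule set would also need to handle variations this single rule ignores, such as reordered nouns.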