• Title/Summary/Keyword: 단어백터 (word vector)

Feature Extraction of Web Document using Association Word Mining (연관 단어 마이닝을 사용한 웹문서의 특징 추출)

  • 고수정;최준혁;이정현
    • Journal of KIISE:Databases / v.30 no.4 / pp.351-361 / 2003
  • Previous studies that extract document features through word association suffer from three problems: profiles must be updated periodically, noun phrases must be handled, and index probabilities must be calculated. We propose a more effective feature extraction method based on association word mining. Using the Apriori algorithm, the method represents a document's features not as single words but as association-word vectors. The association words that Apriori extracts from a document depend on the confidence, the support, and the number of words composing each association word. This paper proposes an effective method for determining these three parameters. Because the method does not use a profile, no profile needs to be updated, and it generates noun phrases automatically from the confidence and support in the Apriori algorithm, without calculating index probabilities. We apply the proposed method to document classification with a Naive Bayes classifier and compare it with the information gain and TF·IDF methods. We also compare it with document classification methods that use index association and word association based on a probability model.
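
The abstract above describes mining association words with Apriori under support and confidence thresholds. The following is a minimal sketch of that general idea, not the authors' implementation: the toy transactions, the threshold values, and the function names `frequent_itemsets` and `confidence` are all assumptions chosen for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size):
    """Apriori-style search: return {itemset: support} for word sets
    of up to max_size words whose support is at least min_support."""
    n = len(transactions)
    # Level 1: single words that meet the support threshold.
    vocabulary = {w for t in transactions for w in t}
    current = {}
    for w in vocabulary:
        support = sum(1 for t in transactions if w in t) / n
        if support >= min_support:
            current[frozenset([w])] = support
    result = dict(current)

    size = 2
    while current and size <= max_size:
        prev = list(current)
        # Join step: combine frequent (size-1)-itemsets into candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == size}
        # Prune step: every (size-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                current[c] = support
        result.update(current)
        size += 1
    return result


def confidence(itemsets, antecedent, consequent):
    """confidence(A -> B) = support(A and B) / support(A)."""
    a = frozenset(antecedent)
    return itemsets[a | frozenset(consequent)] / itemsets[a]


# Hypothetical toy data: each transaction is the set of words in one sentence.
transactions = [
    {"data", "mining", "web"},
    {"data", "mining", "rule"},
    {"data", "web"},
    {"mining", "rule"},
]
freq = frequent_itemsets(transactions, min_support=0.5, max_size=3)
```

With these assumed thresholds, `{"data", "mining"}` survives as an association word while no three-word set does, because the prune step rejects any candidate that has an infrequent pair inside it. How the paper itself chooses confidence, support, and the word count is not stated in the abstract.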

Automatic term-network construction for Oral Documents (구술문서에 기초한 자동 용어 네트워크 구축)

  • Park, Soon-Cheol
    • Journal of Korea Society of Industrial Information Systems / v.12 no.4 / pp.25-31 / 2007
  • This paper proposes an automatic term-network construction system. The system uses statistical values of the terms that appear in a document corpus. The corpus consists of 186 oral history documents collected from the Saemangeum area of Chollapuk-do, Korea. The term relationships in the term network are determined by the cosine similarities of the term vectors. About 1,700 terms are extracted from the documents. The system can display term relationships from the term network nearly in real time. This approach to term-network construction is expected to serve as one method for building ontology systems and supporting semantic retrieval systems in the near future.
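
The abstract above links terms by the cosine similarity of their term vectors. A minimal sketch of that step follows, assuming each term is represented by its frequency in each document; the sample terms, the vectors, the `threshold` value, and the function names are hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector has no direction, so treat its similarity as 0.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_term_network(term_vectors, threshold):
    """Return edges (term1, term2, similarity) for every pair of terms
    whose cosine similarity is at least threshold."""
    terms = sorted(term_vectors)
    edges = []
    for i, t1 in enumerate(terms):
        for t2 in terms[i + 1:]:
            sim = cosine(term_vectors[t1], term_vectors[t2])
            if sim >= threshold:
                edges.append((t1, t2, sim))
    return edges


# Hypothetical term-by-document frequency vectors (three documents).
term_vectors = {
    "sea": [2, 0, 1],
    "fishing": [4, 0, 2],
    "rice": [0, 3, 0],
}
edges = build_term_network(term_vectors, threshold=0.5)
```

Here "sea" and "fishing" occur in the same documents in the same proportions, so they are connected, while "rice" shares no documents with either and stays isolated. The pairwise loop is quadratic in the number of terms, which is manageable for a vocabulary of about 1,700 terms.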

A Study on Similarity Calculation Method Between Research Infrastructure (국가연구시설장비의 유사도 판단기법에 관한 연구)

  • Kim, Yong Joo;Kim, Young Chan
    • KIPS Transactions on Software and Data Engineering / v.7 no.12 / pp.469-476 / 2018
  • Jointly utilizing research infrastructure and building it efficiently are essential to the science and technology research and development process. Although various classification methods have been introduced to make efficient use of registered information, directly usable functions such as a search for similar research infrastructure have not yet been implemented because of limitations in the collected information. In this study, we analyze existing similarity search techniques, present a methodology for calculating the similarity of research infrastructure, and analyze the learning results. The study suggests that the technique can be used to extract meaningful keywords from the information and to analyze the similarity between research infrastructure items.

Lexical Ambiguity Resolution System of Korean Language using Dependency Grammar and Collative Semantics (의존 문법과 대조 의미론을 이용한 한국어의 어휘적 중의성 해결 시스템)

  • 윤근수;권혁철
    • Korean Journal of Cognitive Science / v.3 no.1 / pp.1-24 / 1991
  • This paper presents a lexical ambiguity resolution system for the Korean language. The system uses dependency grammar and Collative Semantics. Dependency grammar is used to analyze Korean syntactic dependencies; a robust way to analyze a sentence is to establish links between individual words. Collative Semantics investigates the interplay between lexical ambiguity and semantic relations, and consists of sense-frames, semantic vectors, collation, and screening. Our system was implemented in the C programming language. It analyzes sentences, discriminates the kinds of semantic relations between pairs of word senses in those sentences, and resolves lexical ambiguity.