• Title/Summary/Keyword: 동사 군집화 (verb clustering)


Similar Verb Words Extraction based on their Case Frame Structure (격틀 구조에 기반한 유사 동사 추출)

  • Cho, Junghyun;Jung, Hyunki;Kim, Yu-Seop
    • Annual Conference on Human and Language Technology / 2009.10a / pp.219-224 / 2009
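  • Building a Korean PropBank requires clustering similar verbs and compiling the syntactic and semantic properties of the verbs in each cluster. As an initial step toward such clustering, this study estimates the similarity between verbs from the case frame structure of each verb and extracts similar verbs. To obtain case frame information for individual verbs, we used the predicate dictionary of the Sejong Project and the verb case frame dictionary of the KAIST language resources. We also used the properties of the arguments in each case frame to subdivide the frames and produce more detailed case frame information. To measure verb similarity, each verb is represented as a vector whose elements are the degree to which that verb shares subdivided case frames with each other verb. In the experiments, we ran this procedure separately on the two dictionaries and extracted the verbs similar to each verb.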
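
The vector representation described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy case frames, the frame label syntax, the use of Jaccard overlap as the "degree of sharing", and cosine similarity between verb vectors. The abstract does not state which measures the authors used.

```python
from math import sqrt

# Hypothetical toy data: verb -> set of subdivided case frames.
# The label syntax (role[argument type]) is invented for this sketch.
CASE_FRAMES = {
    "eat":   {"SBJ[animate] OBJ[food]", "SBJ[animate] OBJ[medicine]"},
    "drink": {"SBJ[animate] OBJ[liquid]", "SBJ[animate] OBJ[medicine]"},
    "go":    {"SBJ[animate] GOAL[place]", "SBJ[animate] PATH[place]"},
    "come":  {"SBJ[animate] GOAL[place]", "SBJ[animate] PATH[place]"},
}
VERBS = sorted(CASE_FRAMES)  # fixed order for the vector dimensions


def share(a, b):
    """Degree of frame sharing between two verbs (assumed: Jaccard overlap)."""
    fa, fb = CASE_FRAMES[a], CASE_FRAMES[b]
    return len(fa & fb) / len(fa | fb)


def vector(verb):
    """One element per verb: how much `verb` shares frames with that verb."""
    return [share(verb, other) for other in VERBS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def most_similar(verb, top_n=3):
    """Rank the other verbs by cosine similarity of their sharing vectors."""
    target = vector(verb)
    scored = [(other, cosine(target, vector(other)))
              for other in VERBS if other != verb]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    print(most_similar("eat"))
```

In this toy data, "eat" and "drink" share one of three distinct frames, so "drink" ranks first for "eat", while "go" and "come" have identical frame sets and identical vectors.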


Performance Improvement of Word Clustering Using Ontology (온톨로지를 이용한 단어 군집화 성능 개선)

  • Park Eun-Jin;Kim Jae-Hoon;Ock Cheol-Young
    • The KIPS Transactions:PartB / v.13B no.3 s.106 / pp.337-344 / 2006
  • In this paper, we describe the design and implementation of a word clustering system that uses the definition of an entry word in a dictionary, called a dictionary definition. Word clustering generally requires various features, such as words, and the performance of a word clustering system depends on which features are used. A dictionary definition describes the meaning of an entry in detail, but its words tend to be implicit or abstract, and the definition itself is short. Word clustering that uses only features extracted from dictionary definitions therefore produces many small clusters. To obtain larger clusters and improve performance, the features must be transformed into more general words while keeping the original meaning of the dictionary definition as intact as possible. We propose two methods for extending the dictionary definition using an ontology. The first extends the dictionary definition to parent words in the ontology; the second extends it to words at a fixed depth from the root of the ontology. In our experiments, both proposed systems outperformed the system without feature extension, and the second extension method outperformed the first. We also observed that verbs are very useful for extending features in word clustering.
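
The two extension methods named in this abstract can be sketched roughly as follows. The toy ontology, the function names, and the choice to add generalized words alongside the original features (rather than replace them) are assumptions for illustration; the abstract does not give these details.

```python
# Hypothetical toy ontology: child -> parent; the root has parent None.
PARENT = {
    "entity": None,
    "organism": "entity",
    "animal": "organism",
    "plant": "organism",
    "dog": "animal",
    "cat": "animal",
    "rose": "plant",
}


def path_from_root(word):
    """Return the chain of ontology nodes from the root down to `word`."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT.get(word)
    return path[::-1]


def extend_with_parent(features):
    """Method 1 (sketch): add the direct parent of each feature word."""
    extended = set(features)
    for word in features:
        parent = PARENT.get(word)
        if parent is not None:
            extended.add(parent)
    return extended


def extend_to_depth(features, depth):
    """Method 2 (sketch): add each word's ancestor at a fixed depth.

    The root is depth 0. Words outside the ontology, or shallower than
    `depth`, are left unchanged.
    """
    extended = set(features)
    for word in features:
        if word in PARENT:
            path = path_from_root(word)
            if len(path) > depth:
                extended.add(path[depth])
    return extended


if __name__ == "__main__":
    feats = {"dog", "rose"}
    print(sorted(extend_with_parent(feats)))   # adds "animal" and "plant"
    print(sorted(extend_to_depth(feats, 1)))   # adds "organism" for both
```

With parent extension, "dog" and "rose" still share no feature. With fixed-depth extension at depth 1, both gain "organism", which illustrates how generalizing toward the root can merge otherwise small clusters.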

Semantic Clustering of Predicates using Word Definition in Dictionary (사전 뜻풀이를 이용한 용언 의미 군집화)

  • Bae, Young-Jun;Choe, Ho-Seop;Song, Yoo-Hwa;Ock, Cheol-Young
    • Korean Journal of Cognitive Science / v.22 no.3 / pp.271-298 / 2011
  • A lexical semantic system must be built to capture lexical semantic information more clearly. In this paper, we study the semantic clustering of predicates, one of the steps in building such a system. Unlike previous studies, which used subcategorization arguments (subject and object), selectional restrictions, and adverb interaction information, we used sense-tagged dictionary definitions to cluster predicates semantically, and we also attempted hierarchical clustering of predicates using the relationship between generic and specific concepts. Most of the predicates in the dictionary were used for clustering: a total of 106,501 predicates (85,754 verbs and 20,747 adjectives) were used in the test. The clustering produced 2,748 predicate clusters, 130 recursive-definition clusters, and 261 sub-clusters, with a maximum cluster depth of 16. For evaluation, we compared the clustering results with the Sejong semantic classes; the results showed a cohesion of 70.14%.
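
The abstract does not spell out the clustering procedure. One plausible reading, sketched below purely as an assumption, is that each predicate is linked to the more generic predicate heading its definition, that predicates are grouped by the top of that chain, and that loops in the chain correspond to recursive definitions. The toy data and all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: predicate -> head predicate of its definition
# (the more generic concept). None marks a predicate with no generic head.
GENUS = {
    "move": None,
    "walk": "move",
    "stroll": "walk",
    "run": "move",
    "begin": "start",    # "begin" and "start" are defined by each other
    "start": "begin",
    "commence": "begin",
}


def find_top(word):
    """Follow generic-concept links upward.

    Returns (label, is_recursive). For a definition loop, the label is
    the alphabetically smallest word in the loop, so that every member
    of the loop gets the same label.
    """
    seen = [word]
    while True:
        head = GENUS.get(seen[-1])
        if head is None:
            return seen[-1], False
        if head in seen:
            loop = seen[seen.index(head):]
            return min(loop), True
        seen.append(head)


def cluster(words):
    """Group predicates by the top of their generic-concept chain."""
    clusters = defaultdict(list)
    for word in sorted(words):
        clusters[find_top(word)].append(word)
    return dict(clusters)


if __name__ == "__main__":
    for (label, recursive), members in cluster(GENUS).items():
        kind = "recursive" if recursive else "regular"
        print(label, kind, members)
```

Here "stroll" reaches "move" through "walk", so it lands in the "move" cluster, while "begin", "start", and "commence" form a separate recursive-definition cluster.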


e-Learning Course Reviews Analysis based on Big Data Analytics (빅데이터 분석을 이용한 이러닝 수강 후기 분석)

  • Kim, Jang-Young;Park, Eun-Hye
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.2 / pp.423-428 / 2017
  • Educational information of many kinds is growing and spreading rapidly with the use of the Internet and smart devices. As e-Learning use increases, instructors and students (learners) need to set goals that maximize learning outcomes and the efficiency of the education system, based on big data analytics of education history data recorded online. In this paper, the author applies the Word2Vec algorithm (a neural network algorithm) to find similarities among education-related words, and classifies them with a clustering algorithm, in order to recognize and analyze the recorded education history data objectively. When Word2Vec is applied to education-related words, words with related meanings can be found and classified, and they acquire similar vector values through repeated training. In addition, the author shows experimentally that the parts of speech (noun, verb, adjective, and adverb) have the same shortest distance from the centroid under the clustering algorithm.
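
The pipeline in this abstract (word vectors, then similarity, then clustering) can be sketched in plain Python. The 2-D vectors below are hypothetical stand-ins for trained Word2Vec embeddings, and k-means with hand-picked seed words is an assumed choice; the abstract only says "clustering algorithm" and mentions a centroid.

```python
from math import sqrt

# Hypothetical 2-D vectors standing in for trained Word2Vec embeddings.
VECTORS = {
    "lecture": (0.9, 0.1),
    "course": (0.8, 0.2),
    "lesson": (0.85, 0.15),
    "easy": (0.1, 0.9),
    "helpful": (0.2, 0.8),
    "clear": (0.15, 0.85),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def kmeans(vectors, seeds, iterations=10):
    """Plain k-means with squared Euclidean distance.

    `seeds` names the words whose vectors initialize the centroids,
    which keeps the sketch deterministic.
    """
    centroids = [vectors[s] for s in seeds]
    groups = []
    for _ in range(iterations):
        # Assignment step: each word joins its nearest centroid.
        groups = [[] for _ in centroids]
        for word, vec in vectors.items():
            dists = [sum((a - b) ** 2 for a, b in zip(vec, c))
                     for c in centroids]
            groups[dists.index(min(dists))].append(word)
        # Update step: move each centroid to the mean of its group.
        centroids = [
            tuple(sum(vectors[w][i] for w in g) / len(g)
                  for i in range(len(c))) if g else c
            for g, c in zip(groups, centroids)
        ]
    return groups


if __name__ == "__main__":
    print(cosine(VECTORS["lecture"], VECTORS["course"]))
    print(kmeans(VECTORS, seeds=["lecture", "easy"]))
```

With real data, the vectors would come from a trained embedding model rather than a hand-written table, and the number of clusters and the seeding strategy would need to be chosen for the corpus.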