• Title/Summary/Keyword: multi-sense word


Modified multi-sense skip-gram using weighted context and x-means (가중 문맥벡터와 X-means 방법을 이용한 변형 다의어스킵그램)

  • Jeong, Hyunwoo;Lee, Eun Ryung
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.389-399 / 2021
  • In recent years, word embedding has been a popular field of natural language processing research, and skip-gram has become one of its successful methods. It assigns an embedding vector to each word using its contexts, which provides an effective way to analyze text data. However, owing to a limitation of the vector space model, basic word embedding methods assume that every word has only a single meaning. Because multi-sense words, that is, words with more than one meaning, occur in practice, Neelakantan (2014) proposed a multi-sense skip-gram (MSSG) that uses a clustering method to find an embedding vector for each sense of a multi-sense word. In this paper, we propose a modification of the MSSG to improve statistical accuracy. Moreover, we propose a data-adaptive choice of the number of clusters, that is, the number of meanings of a multi-sense word. Numerical evidence is provided by real data-based simulations.

Word Sense Classification Using Support Vector Machines (지지벡터기계를 이용한 단어 의미 분류)

  • Park, Jun Hyeok;Lee, Songwook
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.563-568 / 2016
  • The word sense disambiguation problem is to find the correct sense of an ambiguous word in a sentence, where the word has multiple senses in a dictionary. We regard this as a multi-class classification problem and classify the ambiguous word using Support Vector Machines. Context words of the ambiguous word, extracted from the Sejong sense-tagged corpus, are represented in two kinds of vector space. One vector space is composed of context word vectors with binary weights. The other has vectors in which the context words are mapped by a word embedding model. In experiments, we obtained an accuracy of 87.0% with context word vectors and 86.0% with the word embedding model.

The Structure of Polysemy: A study of multi-sense words based on WordNet

  • Lin, Jen-Yi;Yang, Chang-Hua;Tseng, Shu-Chuan;Huang, Chu-Ren
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.320-329 / 2002
  • This paper discusses issues in polysemy with respect to the verbs in WordNet. The hypernymy/hyponymy structure of the multiple senses was observed while we were building a bilingual network for Chinese and English. There are several types of polysemic patterns, and a co-hypernym may have the same word form as its subordinates. Fellbaum (2000) dubbed as autotroponymy the case in which verbs linked by a manner relation share the same verb form. However, her syntactic criteria do not seem compatible with the hierarchies in WN. Either the criteria or the network should be reconducted. For most verbs in WN 1.7, polysemous relations are unlikely to extend over 3 levels of the IS-A relation. Highly polysemous verbs are more complicated and may be involved in certain semantic structures. Semi-automatic sense grouping may be helpful for multilingual information retrieval.


CNN-based Distant Supervision Relation Extraction Model with Multi-sense Word Embedding (다중-어의 단어 임베딩을 적용한 CNN 기반 원격 지도 학습 관계 추출 모델)

  • Nam, Sangha;Han, Kijong;Kim, Eun-Kyung;Gwon, Seong-Gu;Jeong, Yu-Seong;Choi, Key-Sun
    • Annual Conference on Human and Language Technology / 2017.10a / pp.137-142 / 2017
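  • Distant supervision automatically generates annotated data between a very large corpus and a knowledge base, so the training data needed for machine learning can be built at low cost without human labor; for this reason, many studies apply distant supervision to the relation extraction problem. However, existing studies have the drawback that the word embeddings used as model input do not reflect the homographic nature of words. Because homographs with different meanings share a single embedding value, the relation extraction model is in effect trained without accurately capturing word meaning. In this paper, we propose a model that applies multi-sense word embeddings to a distant supervision-based relation extraction model. A word sense disambiguation module is used to learn the multi-sense word embeddings, and CNN and PCNN, models that efficiently capture the key features within a sentence, are used as the relation extraction models. To evaluate the performance of the proposed relation extraction model with multi-sense word embeddings, we additionally trained word embeddings in two other ways for a comparative evaluation; the results show that relation extraction performance improved when using the word embeddings built with the word sense disambiguation module.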



A Study on the Identification and Classification of Relation Between Biotechnology Terms Using Semantic Parse Tree Kernel (시맨틱 구문 트리 커널을 이용한 생명공학 분야 전문용어간 관계 식별 및 분류 연구)

  • Choi, Sung-Pil;Jeong, Chang-Hoo;Chun, Hong-Woo;Cho, Hyun-Yang
    • Journal of the Korean Society for Library and Information Science / v.45 no.2 / pp.251-275 / 2011
  • In this paper, we propose a novel kernel, called the semantic parse tree kernel, that extends the parse tree kernel previously studied for extracting protein-protein interactions (PPIs), which has shown prominent results. One drawback of the existing parse tree kernel is that it can degrade the overall performance of PPI extraction: the kernel function may produce a lower kernel value for two sentences than their actual similarity warrants, because its simple comparison mechanism handles only the superficial aspects of the constituent words. The new kernel can compute the lexical semantic similarity as well as the syntactic similarity between the parse trees of two target sentences. To calculate the lexical semantic similarity, it incorporates context-based word sense disambiguation that outputs WordNet synsets, which in turn can be transformed into more general ones. In experiments, we introduced two new parameters, tree kernel decay factors and degrees of abstraction of lexical concepts, which can accelerate the optimization of PPI extraction performance in addition to the conventional SVM regularization factor. Through these multi-strategic experiments, we confirmed the pivotal role of the newly applied parameters. Additionally, the experimental results showed that the semantic parse tree kernel is superior to conventional kernels, especially in PPI classification tasks.

High Speed Triple-port Register File for 32-bit RISC/DSP Processors (32비트 RISC/DSP CPU를 위한 고속 3포트 레지스터 파일의 설계)

  • 고재명;유동렬
    • Proceedings of the IEEK Conference / 1998.10a / pp.1165-1168 / 1998
  • This paper describes a 72-word by 32-bit 2-read/1-write multi-port register file, which is suitable for 32-bit RISC/DSP microprocessors. To minimize area and achieve high speed, advanced single-ended sense amplifiers are used. Each part of the circuit is optimized at the transistor level. Functionality and timing are verified using HSPICE simulations. After the circuit was modeled and validated at the transistor level, it was laid out in a 0.6 µm 1-poly 3-metal CMOS technology. The simulation results show a maximum operating frequency of 179 MHz under worst-case conditions. The design contains 27,326 transistors and measures 3.02 mm by 2.20 mm.


A New Hidden Error Function for Layer-By-Layer Training of Multi layer Perceptrons (다층 퍼셉트론의 층별 학습을 위한 중간층 오차 함수)

  • Oh Sang-Hoon
    • Proceedings of the Korea Contents Association Conference / 2005.11a / pp.364-370 / 2005
  • LBL (Layer-By-Layer) algorithms have been proposed to accelerate the training of MLPs (Multilayer Perceptrons). In these LBL algorithms, each layer needs an error function for optimization. In particular, the error function for the hidden layer has a great effect on achieving good performance. In this sense, this paper proposes a new hidden layer error function for improving the performance of the LBL algorithm for MLPs. The hidden layer error function is derived from the mean squared error of the output layer. The effectiveness of the proposed error function was demonstrated on handwritten digit recognition and isolated-word recognition tasks, and very fast learning convergence was obtained.


A Study on a Method for Computing the Kill/Survival Probability of Vulnerable Target (다수 미사일의 공격에 대한 복합취약 표적의 생존확률에 대한 연구)

  • 황흥석
    • Journal of the military operations research society of Korea / v.22 no.2 / pp.200-214 / 1996
  • In this paper, we consider the problem of determining the probability of kill (or survival) of a vulnerable target attacked by one or more missiles. General formulas are obtained for the probability that the target is killed or survives. Several well-known concepts, such as vulnerability, lethality, the multi-component target, and a general combinatorial theorem of probability, are introduced and used. For convenience, the word missile is used in a very general sense, and the target is generally taken to be a point target. This paper is concerned primarily with the probabilistic aspects of the problem; general numerical procedures are also described. Two examples illustrate the use of some of the formulas in this study, as well as a few points which may not have been sufficiently emphasized. An extension of this study to complete a software package will follow.


The Study on the Important Factors of the Amenity in Multi-Family Housing Estates (공동주택 주거환경의 어메니티 중요인자에 관한 연구)

  • 이재준
    • Journal of the Korean Institute of Landscape Architecture / v.26 no.3 / pp.118-133 / 1998
  • Residents living in multi-family housing prefer a healthy and natural outdoor environment for better human and environmental quality. Thus, providing high-quality amenity has become a popular notion in the site planning and housing development fields. However, the scope and definition of amenity have not yet been clearly identified, and this has become an issue in the planning and development fields. The purpose of this study is to examine and evaluate amenity and its implications for site planning; to this end, analysis methods such as interviews and surveys with residents were carried out. The results of this study are summarized below. The amenity of the residential environment means, in a broad sense, the total environmental quality experienced by the residents. An abundant green environment is a very important factor in increasing the amenity of the residential environment, so the expansion of green fields would improve the quality of multi-family housing. The expansion of the green environment and biotopes was the most important factor in strengthening the symbiosis between residents and the outdoor environment. Amenity should conform to a certain standard of environmental quality, and the demand for high-quality amenity in residential developments is expected to increase significantly in the future. Thus, it should be accomplished by preparing practical methods by means of discriminative strategy, products, and planning principles.
