• Title/Summary/Keyword: Word Space Model

Search Result 60, Processing Time 0.022 seconds

Word Sense Disambiguation Using Embedded Word Space

  • Kang, Myung Yun;Kim, Bogyum;Lee, Jae Sung
    • Journal of Computing Science and Engineering
    • /
    • v.11 no.1
    • /
    • pp.32-38
    • /
    • 2017
  • Determining the correct word sense among ambiguous senses is essential for semantic analysis. One of the models for word sense disambiguation is the word space model which is very simple in the structure and effective. However, when the context word vectors in the word space model are merged into sense vectors in a sense inventory, they become typically very large but still suffer from the lexical scarcity. In this paper, we propose a word sense disambiguation method using word embedding that makes the sense inventory vectors compact and efficient due to its additive compositionality. Results of experiments with a Korean sense-tagged corpus show that our method is very effective.

Word Sense Disambiguation using Korean Word Space Model (한국어 단어 공간 모델을 이용한 단어 의미 중의성 해소)

  • Park, Yong-Min;Lee, Jae-Sung
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.6
    • /
    • pp.41-47
    • /
    • 2012
  • Various Korean word sense disambiguation methods have been proposed using small scale of sense-tagged corpra and dictionary definitions to calculate entropy information, conditional probability, mutual information and etc. for each method. This paper proposes a method using Korean Word Space model which builds word vectors from a large scale of sense-tagged corpus and disambiguates word senses with the similarity calculation between the word vectors. Experiment with Sejong morph sense-tagged corpus showed 94% precision for 200 sentences(583 word types), which is much superior to the other known methods.

Word Sense Classification Using Support Vector Machines (지지벡터기계를 이용한 단어 의미 분류)

  • Park, Jun Hyeok;Lee, Songwook
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.11
    • /
    • pp.563-568
    • /
    • 2016
  • The word sense disambiguation problem is to find the correct sense of an ambiguous word having multiple senses in a dictionary in a sentence. We regard this problem as a multi-class classification problem and classify the ambiguous word by using Support Vector Machines. Context words of the ambiguous word, which are extracted from Sejong sense tagged corpus, are represented to two kinds of vector space. One vector space is composed of context words vectors having binary weights. The other vector space has vectors where the context words are mapped by word embedding model. After experiments, we acquired accuracy of 87.0% with context word vectors and 86.0% with word embedding model.

SOC Verification Based on WGL

  • Du, Zhen-Jun;Li, Min
    • Journal of Korea Multimedia Society
    • /
    • v.9 no.12
    • /
    • pp.1607-1616
    • /
    • 2006
  • The growing market of multimedia and digital signal processing requires significant data-path portions of SoCs. However, the common models for verification are not suitable for SoCs. A novel model--WGL (Weighted Generalized List) is proposed, which is based on the general-list decomposition of polynomials, with three different weights and manipulation rules introduced to effect node sharing and the canonicity. Timing parameters and operations on them are also considered. Examples show the word-level WGL is the only model to linearly represent the common word-level functions and the bit-level WGL is especially suitable for arithmetic intensive circuits. The model is proved to be a uniform and efficient model for both bit-level and word-level functions. Then Based on the WGL model, a backward-construction logic-verification approach is presented, which reduces time and space complexity for multipliers to polynomial complexity(time complexity is less than $O(n^{3.6})$ and space complexity is less than $O(n^{1.5})$) without hierarchical partitioning. Finally, a construction methodology of word-level polynomials is also presented in order to implement complex high-level verification, which combines order computation and coefficient solving, and adopts an efficient backward approach. The construction complexity is much less than the existing ones, e.g. the construction time for multipliers grows at the power of less than 1.6 in the size of the input word without increasing the maximal space required. The WGL model and the verification methods based on WGL show their theoretical and applicable significance in SoC design.

  • PDF

Korean Document Classification Using Extended Vector Space Model (확장된 벡터 공간 모델을 이용한 한국어 문서 분류 방안)

  • Lee, Samuel Sang-Kon
    • The KIPS Transactions:PartB
    • /
    • v.18B no.2
    • /
    • pp.93-108
    • /
    • 2011
  • We propose a extended vector space model by using ambiguous words and disambiguous words to improve the result of a Korean document classification method. In this paper we study the precision enhancement of vector space model and we propose a new axis that represents a weight value. Conventional classification methods without the weight value had some problems in vector comparison. We define a word which has same axis of the weight value as ambiguous word after calculating a mutual information value between a term and its classification field. We define a word which is disambiguous with ambiguous meaning as disambiguous word. We decide the strengthness of a disambiguous word among several words which is occurring ambiguous word and a same document. Finally, we proposed a new classification method based on extension of vector dimension with ambiguous and disambiguous words.

Empirical Comparison of Word Similarity Measures Based on Co-Occurrence, Context, and a Vector Space Model

  • Kadowaki, Natsuki;Kishida, Kazuaki
    • Journal of Information Science Theory and Practice
    • /
    • v.8 no.2
    • /
    • pp.6-17
    • /
    • 2020
  • Word similarity is often measured to enhance system performance in the information retrieval field and other related areas. This paper reports on an experimental comparison of values for word similarity measures that were computed based on 50 intentionally selected words from a Reuters corpus. There were three targets, including (1) co-occurrence-based similarity measures (for which a co-occurrence frequency is counted as the number of documents or sentences), (2) context-based distributional similarity measures obtained from a latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), and Word2Vec algorithm, and (3) similarity measures computed from the tf-idf weights of each word according to a vector space model (VSM). Here, a Pearson correlation coefficient for a pair of VSM-based similarity measures and co-occurrence-based similarity measures according to the number of documents was highest. Group-average agglomerative hierarchical clustering was also applied to similarity matrices computed by individual measures. An evaluation of the cluster sets according to an answer set revealed that VSM- and LDA-based similarity measures performed best.

Semantic Extention Search for Documents Using the Word2vec (Word2vec을 활용한 문서의 의미 확장 검색방법)

  • Kim, Woo-ju;Kim, Dong-he;Jang, Hee-won
    • The Journal of the Korea Contents Association
    • /
    • v.16 no.10
    • /
    • pp.687-692
    • /
    • 2016
  • Conventional way to search documents is keyword-based queries using vector space model, like tf-idf. Searching process of documents which is based on keywords can make some problems. it cannot recogize the difference of lexically different but semantically same words. This paper studies a scheme of document search based on document queries. In particular, it uses centrality vectors, instead of tf-idf vectors, to represent query documents, combined with the Word2vec method to capture the semantic similarity in contained words. This scheme improves the performance of document search and provides a way to find documents not only lexically, but semantically close to a query document.

Word Sense Similarity Clustering Based on Vector Space Model and HAL (벡터 공간 모델과 HAL에 기초한 단어 의미 유사성 군집)

  • Kim, Dong-Sung
    • Korean Journal of Cognitive Science
    • /
    • v.23 no.3
    • /
    • pp.295-322
    • /
    • 2012
  • In this paper, we cluster similar word senses applying vector space model and HAL (Hyperspace Analog to Language). HAL measures corelation among words through a certain size of context (Lund and Burgess 1996). The similarity measurement between a word pair is cosine similarity based on the vector space model, which reduces distortion of space between high frequency words and low frequency words (Salton et al. 1975, Widdows 2004). We use PCA (Principal Component Analysis) and SVD (Singular Value Decomposition) to reduce a large amount of dimensions caused by similarity matrix. For sense similarity clustering, we adopt supervised and non-supervised learning methods. For non-supervised method, we use clustering. For supervised method, we use SVM (Support Vector Machine), Naive Bayes Classifier, and Maximum Entropy Method.

  • PDF

Semantic Feature Analysis for Multi-Label Text Classification on Topics of the Al-Quran Verses

  • Gugun Mediamer;Adiwijaya
    • Journal of Information Processing Systems
    • /
    • v.20 no.1
    • /
    • pp.1-12
    • /
    • 2024
  • Nowadays, Islamic content is widely used in research, including Hadith and the Al-Quran. Both are mostly used in the field of natural language processing, especially in text classification research. One of the difficulties in learning the Al-Quran is ambiguity, while the Al-Quran is used as the main source of Islamic law and the life guidance of a Muslim in the world. This research was proposed to relieve people in learning the Al-Quran. We proposed a word embedding feature-based on Tensor Space Model as feature extraction, which is used to reduce the ambiguity. Based on the experiment results and the analysis, we prove that the proposed method yields the best performance with the Hamming loss 0.10317.

Large Vocabulary Continuous Speech Recognition Based on Language Model Network (언어 모델 네트워크에 기반한 대어휘 연속 음성 인식)

  • 안동훈;정민화
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.6
    • /
    • pp.543-551
    • /
    • 2002
  • In this paper, we present an efficient decoding method that performs in real time for 20k word continuous speech recognition task. Basic search method is a one-pass Viterbi decoder on the search space constructed from the novel language model network. With the consistent search space representation derived from various language models by the LM network, we incorporate basic pruning strategies, from which tokens alive constitute a dynamic search space. To facilitate post-processing, it produces a word graph and a N-best list subsequently. The decoder is tested on the database of 20k words and evaluated with respect to accuracy and RTF.