• Title/Summary/Keyword: word semantic information


Assignment Semantic Category of a Word using Word Embedding and Synonyms (워드 임베딩과 유의어를 활용한 단어 의미 범주 할당)

  • Park, Da-Sol;Cha, Jeong-Won
    • Journal of KIISE / v.44 no.9 / pp.946-953 / 2017
  • Semantic Role Decision defines the semantic relationship between a predicate and its arguments in natural language processing (NLP) tasks. Making Semantic Role Decisions requires both semantic role information and semantic category information. The Sejong Electronic Dictionary contains frame information that is used to determine semantic roles. In this paper, we propose a method to extend the Sejong Electronic Dictionary using word embeddings and synonyms. The same experiment is performed using existing word-embedding vectors and retrofitted vectors. For words that do not appear in the Sejong Electronic Dictionary, the word-embedding vectors yield a performance of 32.19% for semantic category assignment and 51.14% for extended semantic category assignment; the retrofitted vectors yield 33.33% and 53.88%, respectively. We also show that assigning semantic categories to new words that have none is helpful for extending the semantic category vocabulary of the Sejong Electronic Dictionary.
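
The abstract does not spell out how a category is chosen for an unseen word. One plausible reading is a nearest-neighbour vote in embedding space, sketched below. The function name `assign_category`, the toy vectors, and the category labels are all assumptions made for illustration; they are not taken from the paper or from the Sejong Electronic Dictionary.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def assign_category(word_vec, labeled_vecs, labeled_cats, k=3):
    """Return the majority category among the k most similar labeled words."""
    sims = [cosine(word_vec, v) for v in labeled_vecs]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    votes = {}
    for i in top:
        votes[labeled_cats[i]] = votes.get(labeled_cats[i], 0) + 1
    return max(votes, key=votes.get)

# Hypothetical 2-D embeddings for words that already have a category.
labeled_vecs = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
labeled_cats = ["animal", "animal", "place", "place"]

# A word without a category, represented only by its (toy) embedding.
new_word_vec = np.array([0.95, 0.15])
predicted = assign_category(new_word_vec, labeled_vecs, labeled_cats, k=3)
print(predicted)
```

Swapping the plain embeddings for retrofitted vectors would change only the inputs, not the voting logic.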

Latent Semantic Analysis Approach for Document Summarization Based on Word Embeddings

  • Al-Sabahi, Kamal;Zuping, Zhang;Kang, Yang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.1 / pp.254-276 / 2019
  • Since the amount of information on the internet is growing rapidly, it is not easy for users to find information relevant to their queries. To tackle this issue, researchers are paying much attention to document summarization. The key to any successful document summarizer is a good document representation. Traditional approaches based on word overlap mostly fail to produce that kind of representation. Word embeddings have shown good performance by allowing words to match on a semantic level. However, naively concatenating word embeddings makes common words dominant, which in turn diminishes the representation quality. In this paper, we employ word embeddings to improve the weighting schemes for calculating the Latent Semantic Analysis input matrix. Two embedding-based weighting schemes are proposed and then combined to calculate the values of this matrix. They are modified versions of the augment weight and the entropy frequency that combine the strengths of traditional weighting schemes and word embeddings. The proposed approach is evaluated on three English datasets: DUC 2002, DUC 2004, and Multilingual 2015 Single-document Summarization. Experimental results on the three datasets show that the proposed model achieves competitive performance compared to the state of the art, leading to the conclusion that it provides a better document representation and, as a result, a better document summary.
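
For orientation, the generic Latent Semantic Analysis step that the abstract builds on can be sketched as follows. The sketch uses raw term counts as the input matrix, not the embedding-based weighting schemes proposed in the paper (those are not described in enough detail here), and the scoring rule is one common LSA heuristic chosen for illustration. The matrix `A` and the function name are assumptions.

```python
import numpy as np

def lsa_sentence_scores(term_sentence, n_topics):
    """Score each sentence (column) by its length in the top latent topics."""
    # Rows are terms, columns are sentences.
    _, s, vt = np.linalg.svd(term_sentence, full_matrices=False)
    k = min(n_topics, len(s))
    weighted = s[:k, None] * vt[:k, :]  # scale each topic by its singular value
    return np.sqrt((weighted ** 2).sum(axis=0))

# Toy term-by-sentence count matrix (4 terms, 3 sentences), purely illustrative.
A = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
])

scores = lsa_sentence_scores(A, n_topics=1)
ranking = np.argsort(-scores)  # sentence indices, most representative first
print(ranking, scores)
```

A summarizer would then pick the top-ranked sentences until a length budget is reached. Replacing `A` with a differently weighted matrix leaves the rest of the pipeline unchanged, which is where a weighting scheme such as the paper's would plug in.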

Word Network Analysis based on Mutual Information for Ontology of Korean Rural Planning (한국농촌계획 온톨로지 구축을 위한 상호정보 기반 단어연결망 분석)

  • Lee, Jemyung
    • Journal of Korean Society of Rural Planning / v.23 no.3 / pp.37-51 / 2017
  • There has been growing interest in ontologies, especially in today's knowledge-based industries, and defining a field-customized semantic word network is essential for building one. In this paper, a word network for an ontology is established from 785 publications of the Korean Society of Rural Planning (KSRP), from 1995 to 2017. Semantic relationships between words in the publications were quantitatively measured with 'normalized pointwise mutual information', based on information theory. Appearance and co-appearance frequencies of nouns and adjectives in phrases are analyzed based on the assumption that a 'noun phrase' represents a single 'concept'. For verification, the KSRP word network was compared with that of WordNet™, a worldwide thesaurus network. The results show that the KSRP word network established in this paper provides semantic relationships between words based on the common concepts of the Korean rural planning research field. Given these results, the established word network is expected to offer the field of Korean rural planning more opportunities to prepare for the fourth industrial revolution.
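
Normalized pointwise mutual information (NPMI) for a word pair is PMI divided by the negative log of the joint probability, which bounds the value between -1 and 1. The sketch below estimates the probabilities from phrase-level co-occurrence, in line with the abstract's noun-phrase assumption. The toy phrases and the function name `npmi_table` are illustrative assumptions, not data or code from the paper.

```python
import math
from collections import Counter
from itertools import combinations

def npmi_table(phrases):
    """Return {(word_a, word_b): NPMI} for every pair that co-occurs in a phrase."""
    n = len(phrases)
    word_count = Counter()
    pair_count = Counter()
    for phrase in phrases:
        words = sorted(set(phrase))  # count each word once per phrase
        word_count.update(words)
        pair_count.update(combinations(words, 2))
    table = {}
    for (a, b), c in pair_count.items():
        p_ab = c / n
        if p_ab == 1.0:
            table[(a, b)] = 1.0  # pair occurs in every phrase; avoid dividing by log(1) = 0
            continue
        pmi = math.log(p_ab / ((word_count[a] / n) * (word_count[b] / n)))
        table[(a, b)] = pmi / -math.log(p_ab)
    return table

# Hypothetical phrases, each treated as one "concept".
phrases = [
    ["rural", "village"],
    ["rural", "village"],
    ["rural", "planning"],
    ["urban", "planning"],
]
table = npmi_table(phrases)
for pair, value in sorted(table.items()):
    print(pair, round(value, 3))
```

Pairs with NPMI above a chosen threshold could then be kept as edges of the word network; the threshold itself would be a design choice not specified in the abstract.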

Semantic Similarity Measures Between Words within a Document using WordNet (워드넷을 이용한 문서내에서 단어 사이의 의미적 유사도 측정)

  • Kang, SeokHoon;Park, JongMin
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.11 / pp.7718-7728 / 2015
  • Semantic similarity between words can be applied in many fields, including computational linguistics, artificial intelligence, and information retrieval. In this paper, we present a weighted method for measuring the semantic similarity between words in a document. The method uses the edge distance and depth of WordNet and calculates the semantic similarity between words on the basis of document information, namely word term frequencies (TF) and word concept frequencies (CF). The weight of each word is calculated from its TF and CF in the document. The method thus combines the edge distance between words, the depth of their subsumer, and the word weight in the document. We compared our scheme with other methods experimentally, and the proposed method outperformed the other similarity measures. Other methods, which are based simply on shortest distance or depth, have difficulty representing or merging this information. This paper considers shortest distance, depth, and the information of words in the document together, and thereby improves performance.
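
The two structural ingredients named in the abstract, edge distance and depth of the common subsumer, can be illustrated on a small hand-made hierarchy. The sketch below computes the standard Wu-Palmer measure as a stand-in; it is not the paper's weighted formula, which additionally uses TF and CF weights that the abstract does not define. The `PARENT` taxonomy is hypothetical and is not real WordNet data.

```python
# Hypothetical toy taxonomy: child -> parent (the root has parent None).
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}

def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))

def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    raise ValueError("nodes do not share a root")

def edge_distance(a, b):
    """Number of edges on the path from a to b through their common subsumer."""
    lcs = lowest_common_subsumer(a, b)
    return (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(edge_distance("dog", "cat"), wu_palmer("dog", "cat"))
print(edge_distance("dog", "tree"), wu_palmer("dog", "tree"))
```

A document-aware variant in the spirit of the abstract would multiply or otherwise combine such a score with per-word weights derived from the document.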

Query-based Document Summarization using Pseudo Relevance Feedback based on Semantic Features and WordNet (의미특징과 워드넷 기반의 의사 연관 피드백을 사용한 질의기반 문서요약)

  • Kim, Chul-Won;Park, Sun
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.7 / pp.1517-1524 / 2011
  • In this paper, a new document summarization method is introduced that uses semantic features and WordNet-based pseudo relevance feedback (PRF) to extract meaningful sentences relevant to a user query. The proposed method can improve the quality of document summaries because the inherent semantics of the documents are well reflected by the semantic features obtained from non-negative matrix factorization (NMF). In addition, it uses PRF based on the semantic features and WordNet to reduce the semantic gap between the user's high-level requirements and the low-level vector representation. The experimental results demonstrate that the proposed method achieves better performance than the other methods.
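
As background for the "semantic features from NMF" mentioned above, the sketch below factorizes a term-by-sentence matrix with the standard multiplicative update rules for the Frobenius objective. It covers only the factorization step, not the query handling or the pseudo relevance feedback, and the matrix `V`, the rank, and the iteration count are assumptions chosen for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize non-negative V (terms x sentences) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n_terms, n_sents = V.shape
    W = rng.random((n_terms, rank)) + eps   # columns: semantic features over terms
    H = rng.random((rank, n_sents)) + eps   # columns: sentence weights per feature
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.linalg.norm(V - W @ H)))
    return W, H, errors

# Toy term-by-sentence count matrix (4 terms, 3 sentences), purely illustrative.
V = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 2.0, 1.0],
])
W, H, errors = nmf(V, rank=2)
print(np.round(H, 2))
```

Each column of `H` describes a sentence in terms of the learned semantic features, so sentences could be compared with a query expressed in the same feature space.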

Hierarchical Structure in Semantic Networks of Japanese Word Associations

  • Miyake, Maki;Joyce, Terry;Jung, Jae-Young;Akama, Hiroyuki
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.321-329 / 2007
  • This paper reports on the application of network analysis approaches to investigate the characteristics of graph representations of Japanese word associations. Two semantic networks are constructed from two separate Japanese word association databases. The basic statistical features of the networks indicate that they have scale-free and small-world properties and that they exhibit hierarchical organization. A graph clustering method is also applied to the networks with the objective of generating hierarchical structures within the semantic networks. The method is shown to be an efficient tool for analyzing large-scale structures within corpora. To make use of the network clustering results, we briefly introduce two web-based applications: the first is a search system that highlights various possible relations between words according to association type, while the second presents the hierarchical architecture of a semantic network. The systems realize dynamic representations of network structures based on the relationships between words and concepts.
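
Small-world networks are usually characterized by a high clustering coefficient together with a short average path length. The sketch below computes both statistics on a tiny undirected graph. The graph is a made-up example, not data from the Japanese word association databases, and treating associations as undirected is an assumption.

```python
from collections import deque

# Hypothetical undirected association graph: word -> set of associated words.
graph = {
    "sea": {"fish", "blue", "wave"},
    "fish": {"sea", "blue"},
    "blue": {"sea", "fish", "sky"},
    "wave": {"sea"},
    "sky": {"blue"},
}

def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in graph[a])
    return 2 * links / (k * (k - 1))

def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total = pairs = 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

avg_clustering = sum(clustering_coefficient(graph, n) for n in graph) / len(graph)
avg_path = average_path_length(graph)
print(avg_clustering, avg_path)
```

On a real association network, these values would typically be compared against a random graph with the same number of nodes and edges.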


Expansion of Topic Modeling with Word2Vec and Case Analysis (Word2Vec를 이용한 토픽모델링의 확장 및 분석사례)

  • Yoon, Sang Hun;Kim, Keun Hyung
    • The Journal of Information Systems / v.30 no.1 / pp.45-64 / 2021
  • Purpose: Traditional topic modeling makes it difficult to distinguish the meaning of topics because the keywords assigned to each topic may also be assigned to other topics. This problem can become severe when the number of online reviews is small. In this paper, an extended topic modeling technique that can be used to analyze a small number of online reviews is proposed. Design/methodology/approach: The extended model proposed in this paper combines the traditional topic modeling technique with the Word2Vec technique. The extended model not only allocates main words to the extracted topics but also generates discriminatory words between topics. In particular, Word2Vec is applied to extract words that are semantically related to each discriminatory word. In the extended model, the main words and the discriminatory words, together with their semantically similar words, are used in the semantic classification and naming of the extracted topics, so that both can be performed more clearly. As a case study, online reviews related to Udo on the Tripadvisor website were analyzed by applying both traditional topic modeling and the proposed extended model, and the two were compared in the process of semantic classification and naming of the extracted topics. Findings: Because the extended model uses additional information on top of the existing topic modeling information, it was confirmed to be more effective than existing topic modeling at semantically separating topics and at assigning topic names.

Improving The Performance of Triple Generation Based on Distant Supervision By Using Semantic Similarity (의미 유사도를 활용한 Distant Supervision 기반의 트리플 생성 성능 향상)

  • Yoon, Hee-Geun;Choi, Su Jeong;Park, Seong-Bae
    • Journal of KIISE / v.43 no.6 / pp.653-661 / 2016
  • Existing pattern-based triple generation systems based on distant supervision can be flawed by the assumption underlying distant supervision. To mitigate the flaws caused by this excessive assumption, previous studies have commonly used statistical information to measure the confidence of patterns. In this study, we propose a more accurate confidence measure based on the semantic similarity between patterns and properties. An unsupervised learning method, word embedding, and WordNet-based similarity measures were adopted for learning the meaning of words and measuring semantic similarity. To resolve the language mismatch between patterns and properties, we adopted CCA for aligning bilingual word embedding models and a translation-based approach for the WordNet-based measure. The results of our experiments indicate that the accuracy of triples filtered by the semantic similarity-based confidence measure was 16% higher than that of the statistics-based approach. These results suggest that the semantic similarity-based confidence measure is more effective than the statistics-based approach for generating high-quality triples.

Semantic-Based Web Information Filtering Using WordNet (어휘사전 워드넷을 활용한 의미기반 웹 정보필터링)

  • Byeon, Yeong-Tae;Hwang, Sang-Gyu;O, Gyeong-Muk
    • The Transactions of the Korea Information Processing Society / v.6 no.11S / pp.3399-3409 / 1999
  • Information filtering for internet search, which constitutes a new information retrieval environment, differs from traditional methods such as bibliographic information filtering and newsgroup and e-mail filtering. Therefore, we cannot expect high performance from traditional information filtering models when they are applied to the new environment. To solve this problem, we examine the characteristics of the new filtering environment and propose a semantic-based filtering model that includes a new filtering method using WordNet. For extracting keywords from documents, this model uses the SDCC (Semantic Distance for Common Category) algorithm instead of the TF/IDF method usually used by traditional methods. This method solves the word sense ambiguity problem, which is one of the causes of reduced internet search efficiency. The semantic-based filtering model can filter web pages selectively according to the user's level, and we show in this paper that the proposed method makes searching for information on the internet more convenient for users than traditional filtering methods.


Word Sense Distinction of Middle Verbs for Korean Verb Wordnet (한국어 동사의 어휘의미망 구축을 위한 중립동사의 의미분할)

  • Lee, Eunr-Young;Yoon, Ae-Sun
    • Language and Information / v.9 no.2 / pp.23-48 / 2005
  • This study discusses the word sense distinction of Korean middle verbs for restructuring KorLexVerb 1.0. Despite the duality of their meaning and syntactic structure, the word senses of middle verbs are not clearly distinguished in current dictionaries. This underspecification very often causes mismatches in which the same Korean word sense is used for two different English verb senses. A close examination of the syntactic and semantic properties of middle verbs shows that word sense distinction and the reconstruction of the hierarchical structure are indispensable. Finally, based on this fine-grained word sense distinction, we propose an alternative way of classifying and describing verb polysemy for KorLexVerb 1.0 as well as for dictionary-like language resources.
