• Title/Summary/Keyword: word semantic information

Network Analysis between Uncertainty Words based on Word2Vec and WordNet (Word2Vec과 WordNet 기반 불확실성 단어 간의 네트워크 분석에 관한 연구)

  • Heo, Go Eun
    • Journal of the Korean Society for Library and Information Science / v.53 no.3 / pp.247-271 / 2019
  • Uncertainty in scientific knowledge refers to a state in which a proposition is, at present, neither true nor false. Existing studies have analyzed propositions in the academic literature and evaluated rule-based and machine-learning-based approaches on corpora. Although they recognized the importance of word construction, few have attempted to expand the vocabulary by analyzing the meaning of uncertainty words. Meanwhile, network-structure analysis using bibliometric and text-mining techniques is widely used to understand intellectual structure and relationships across disciplines. In this study, therefore, semantic relations were analyzed by applying Word2Vec to existing uncertainty words. In addition, WordNet, an English lexical database and thesaurus, was used to perform a network analysis based on the hypernym, hyponym, and synonym relations linked to uncertainty words. The semantic and lexical relationships of uncertainty words were thereby identified structurally. The results indicate that uncertainty words can potentially be expanded automatically.
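The relation-based network described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the triples in `TOY_RELATIONS` are hypothetical stand-ins for real WordNet lookups, and `build_network` and `expansion_candidates` are names introduced here for illustration only.

```python
from collections import defaultdict

# Hypothetical (word, relation, word) triples standing in for WordNet lookups.
TOY_RELATIONS = [
    ("possible", "synonym", "potential"),
    ("possible", "synonym", "likely"),
    ("suggest", "hypernym", "indicate"),
    ("likely", "synonym", "probable"),
]


def build_network(triples):
    """Build an undirected adjacency map from relation triples."""
    graph = defaultdict(set)
    for source, _relation, target in triples:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def expansion_candidates(graph, seeds):
    """Return words directly linked to a seed word but not seeds themselves."""
    seed_set = set(seeds)
    found = set()
    for seed in seed_set:
        found |= graph.get(seed, set())
    return sorted(found - seed_set)


network = build_network(TOY_RELATIONS)
print(expansion_candidates(network, ["possible", "suggest"]))
```

Words one hop away from the seed list are treated as candidates for expanding it; a real pipeline would presumably filter them further, for example by embedding similarity.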

Effect of orthographic, phonological and semantic information on the processes of Korean heteronym (동철이음어 처리 과정에서 형태와 의미 정보의 영향)

  • Kim, Tae Hoon;Cho, Jeung-Ryeul;Lee, Yoonhyoung
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.6 / pp.3819-3828 / 2015
  • The present study addresses important issues in word recognition, namely the roles of form (orthographic and phonological) and semantic information, by investigating the processing of Korean heteronyms. A priming paradigm was used to test whether form and/or semantic information produces a facilitatory effect. In Experiment 1, orthographically related or phonologically related prime stimuli were presented, followed by a lexical decision task on Korean heteronyms. Experiment 2 followed the same procedure, except that the prime stimulus was semantically related. The results showed that orthographic and phonological information did not influence heteronym processing, whereas semantic information facilitated it, suggesting that semantic information plays an important role in the processing of Korean heteronyms.

Semantic-Based K-Means Clustering for Microblogs Exploiting Folksonomy

  • Heu, Jee-Uk
    • Journal of Information Processing Systems / v.14 no.6 / pp.1438-1444 / 2018
  • Recently, with the development of Internet technologies and the spread of smart devices, the use of microblogs such as Facebook, Twitter, and Instagram has been increasing rapidly. Many users check for new information on microblogs because the content on their timelines is continually updated. Clustering algorithms are therefore needed to group microblog content for users who want the newest information. However, microblogs have word limits, so there is not enough information to analyze for content clustering. In this paper, we propose a semantic-based K-means clustering algorithm that measures not only the similarity between data represented in a vector space model, but also the semantic similarity between the data by exploiting the TagCluster for clustering. Experimental results on the RepLab2013 Twitter dataset show the effectiveness of the semantic-based K-means clustering algorithm.
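The idea of combining vector-space similarity with a semantic similarity can be sketched as follows. This is an assumption-laden illustration rather than the paper's algorithm: the abstract does not specify how TagCluster computes similarity, so Jaccard overlap of tag sets is used as a stand-in, and the weight `alpha` is a hypothetical parameter.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two tag sets (stand-in for a semantic measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(post_a, post_b, alpha=0.5):
    """Blend lexical and tag-based similarity; alpha is a hypothetical weight."""
    words_a, tags_a = post_a
    words_b, tags_b = post_b
    lexical = cosine(Counter(words_a), Counter(words_b))
    semantic = jaccard(set(tags_a), set(tags_b))
    return alpha * lexical + (1 - alpha) * semantic


# Each post is a (words, tags) pair; the data below is made up.
p1 = (["new", "phone", "launch"], ["tech", "mobile"])
p2 = (["phone", "review"], ["tech", "mobile"])
p3 = (["football", "match"], ["sport"])
print(combined_similarity(p1, p2) > combined_similarity(p1, p3))
```

A K-means variant could use such a blended score in its assignment step, so that short posts sharing few words but the same tags still land in the same cluster.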

Web Image Retrieval using Prior Tags based on WordNet Semantic Information (워드넷 의미정보로 선별된 우선 태그와 이를 이용한 웹 이미지의 검색)

  • Kweon, Dae-Hyeon;Hong, Jun-Hyeok;Cho, Soo-Sun
    • Journal of Korea Multimedia Society / v.12 no.7 / pp.1032-1042 / 2009
  • This research addresses the early extraction and use of semantic information from tags in tagged Web image retrieval. Users generally attach tags to a Web image, sometimes more than 100, with little thought to their order. In this paper, we propose a method that selects prior tags by importance when tagged images are uploaded and uses them in image retrieval. The idea is that important tags, which describe the image better, are those that share more semantic information with the other tags of the same image. The method calculates relation scores between tags based on WordNet and performs a multilevel search of tagged images using the scores. For evaluation, we compared the proposed method with retrieval methods that simply match tags to a given keyword. The proposed method was superior in precision and recall.
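The tag-selection idea in this abstract (a tag matters more when it shares more semantic information with the image's other tags) can be sketched roughly as below. The relatedness values in `TOY_RELATEDNESS` are hypothetical numbers standing in for WordNet-based relation scores, and `select_prior_tags` is an illustrative name, not the authors' function.

```python
from itertools import combinations

# Hypothetical pairwise relatedness values standing in for WordNet-based scores.
TOY_RELATEDNESS = {
    frozenset({"dog", "puppy"}): 0.9,
    frozenset({"dog", "animal"}): 0.7,
    frozenset({"puppy", "animal"}): 0.6,
}


def relatedness(a, b):
    """Look up the toy relatedness of two tags; unrelated pairs score 0."""
    return TOY_RELATEDNESS.get(frozenset({a, b}), 0.0)


def select_prior_tags(tags, k=2):
    """Rank tags by total relatedness to the image's other tags, keep the top k."""
    scores = {tag: 0.0 for tag in tags}
    for a, b in combinations(tags, 2):
        r = relatedness(a, b)
        scores[a] += r
        scores[b] += r
    ranked = sorted(tags, key=lambda tag: scores[tag], reverse=True)
    return ranked[:k]


print(select_prior_tags(["vacation", "dog", "2009", "puppy", "animal"]))
```

Tags such as "2009" that relate to nothing else on the image score zero and drop out, which matches the intuition that they describe the image poorly.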

Development of a Deep Learning Model for Detecting Fake Reviews Using Author Linguistic Features (작성자 언어적 특성 기반 가짜 리뷰 탐지 딥러닝 모델 개발)

  • Shin, Dong Hoon;Shin, Woo Sik;Kim, Hee Woong
    • The Journal of Information Systems / v.31 no.4 / pp.01-23 / 2022
  • Purpose: This study proposes a deep learning-based fake review detection model that combines authors' linguistic features with the semantic information of reviews. Design/methodology/approach: This study used 358,071 Yelp reviews to develop the fake review detection model. We employed Linguistic Inquiry and Word Count (LIWC) to extract 24 linguistic features of authors. We then used deep learning architectures such as the multilayer perceptron (MLP), long short-term memory (LSTM), and transformer to learn linguistic and semantic features for fake review detection. Findings: Detection models using both linguistic and semantic features outperformed models using a single type of feature. In addition, this study confirmed that the differences in linguistic features between fake reviewers and authentic reviewers are significant. That is, linguistic features complement the semantic information of reviews and further enhance the predictive power of the fake review detection model.

Automatic Construction of Korean Noun Semantic-Marker using WordNet (WordNet을 이용한 한국어 명사 의미지표 자동 구축)

  • 이지선;전현경;김남수;이용석
    • Proceedings of the Korean Information Science Society Conference / 2000.04b / pp.333-335 / 2000
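  • Computers need semantic knowledge to correctly understand sentences written in natural language, and building this knowledge accurately requires manual work. However, manual construction of semantic knowledge is costly and time-consuming, is subject to the builder's subjectivity, and makes rebuilding the semantic-marker dictionary unavoidable whenever the semantic-marker table is modified for an application domain. To solve these problems, this paper proposes a method for automatically constructing a Korean noun semantic-marker dictionary using the English WordNet and a Korean-English dictionary.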

Constructing the Semantic Information Model using A Collective Intelligence Approach

  • Lyu, Ki-Gon;Lee, Jung-Yong;Sun, Dong-Eon;Kwon, Dai-Young;Kim, Hyeon-Cheol
    • KSII Transactions on Internet and Information Systems (TIIS) / v.5 no.10 / pp.1698-1711 / 2011
  • Knowledge is often represented as a set of rules or a semantic network in intelligent systems. Recently, ontologies have been widely used to represent semantic knowledge, because they organize thesaurus and hierarchical information between concepts in a particular domain. However, collecting semantic relationships among concepts is not easy, and ontology construction incurs much time and expense. Collective intelligence can be a good alternative approach to these problems. In this paper, we propose a collective intelligence approach based on Games With A Purpose (GWAP) to collect various semantic resources, such as words and word senses. We detail how to construct the semantic information model, or ontology, from the collected semantic resources by building a system named FunWords, a Korean lexical-based semantic resource collection tool. In experiments, the resources were grouped into common nouns, abstract nouns, adjectives, and neologisms. We then analyzed their characteristics and acquired the semantic relationships noted above. For common nouns, structural semantic relationships such as hypernym and hyponym are highlighted. For abstract nouns, descriptive and characteristic semantic relationships such as synonym and antonym are prominent. For adjectives, semantic relationships such as description, status, and illustration (for example, color and sound) are expressed more. For neologisms, semantic relationships such as description and characteristics are emphasized. Weighting the semantic relationships by these characteristics can help reduce time and cost, because unnecessary or only slightly related factors need not be considered. It can also improve expressive power, such as readability, by concentrating on the weighted characteristics. Our proposal, which collects semantic resources through the GWAP collective intelligence approach (FunWords) and weights their semantic relationships, can be a more effective and expressive alternative for constructing the semantic information model or ontology.

An Approach to Semantic Mapping using Product Ontology for CPC Environment (CPC 환경을 위한 Product 온톨로지 기반 의미 공유 접근법)

  • Kim K.-Y.;Suh H.-W.
    • Korean Journal of Computational Design and Engineering / v.9 no.3 / pp.192-202 / 2004
  • This paper introduces an approach to semantic mapping using a Product ontology for the CPC environment. In a CPC environment, the participants in a product life cycle need to share the same understanding of the semantics of product terms. For example, they should know that although 'COMPONENT' and 'ITEM' are different word expressions, they can have the same meaning. To handle such terms in an information system, it is desirable that the system automatically recognize that the terms have the same semantics. To this end, we describe an ontology design methodology using first-order logic, the Knowledge Interchange Format, and a knowledge engineering process. In our approach, we investigated the domain knowledge of the Bill of Material and then designed a Product ontology for it. Based on the ontology, we describe syntactic translation, semantic translation, and a semantic mapping procedure with an example.

Semantic Similarity-Based Contributable Task Identification for New Participating Developers

  • Kim, Jungil;Choi, Geunho;Lee, Eunjoo
    • Journal of information and communication convergence engineering / v.16 no.4 / pp.228-234 / 2018
  • In software development, the quality of a product often depends on whether its developers can rapidly find and contribute to the proper tasks. Currently, the word data of projects to which newcomers have previously contributed are mainly utilized to find appropriate source files in an ongoing project. However, because of the vocabulary gap between software projects, the accuracy of source file identification based on information retrieval is not guaranteed. In this paper, we propose a novel source file identification method to reduce the vocabulary gap between software projects. The proposed method employs DBPedia Spotlight to identify proper source files based on semantic similarity between source files of software projects. In an experiment based on the Spring Framework project, we evaluate the accuracy of the proposed method in the identification of contributable source files. The experimental results show that the proposed approach can achieve better accuracy than the existing method based on comparison of word vocabularies.

Extraction of ObjectProperty-UsageMethod Relation from Web Documents

  • Pechsiri, Chaveevan;Phainoun, Sumran;Piriyakul, Rapeepun
    • Journal of Information Processing Systems / v.13 no.5 / pp.1103-1125 / 2017
  • This paper aims to extract an ObjectProperty-UsageMethod relation, in particular the HerbalMedicinalProperty-UsageMethod relation of the herb-plant object, as a semantic relation between two related sets, a herbal-medicinal-property concept set and a usage-method concept set from several web documents. This HerbalMedicinalProperty-UsageMethod relation benefits people by providing an alternative treatment/solution knowledge to health problems. The research includes three main problems: how to determine EDU (where EDU is an elementary discourse unit or a simple sentence/clause) with a medicinal-property/usage-method concept; how to determine the usage-method boundary; and how to determine the HerbalMedicinalProperty-UsageMethod relation between the two related sets. We propose using N-Word-Co on the verb phrase with the medicinal-property/usage-method concept to solve the first and second problems where the N-Word-Co size is determined by the learning of maximum entropy, support vector machine, and naïve Bayes. We also apply naïve Bayes to solve the third problem of determining the HerbalMedicinalProperty-UsageMethod relation with N-Word-Co elements as features. The research results can provide high precision in the HerbalMedicinalProperty-UsageMethod relation extraction.