• Title/Abstract/Keywords: Knowledge graph

218 search results, processing time 0.022 s

Efficient Mining of Frequent Subgraph with Connectivity Constraint

  • Moon, Hyun-S.;Lee, Kwang-H.;Lee, Do-Heon
    • Proceedings of the Korean Society for Bioinformatics Conference / Korean Society for Bioinformatics and Systems Biology, BIOINFO 2005 / pp.267-271 / 2005
  • The goal of data mining is to extract new and useful knowledge from large-scale datasets. As the amount of available data grows explosively, it has become vitally important to develop faster data mining algorithms for various types of data. Recently, interest in data mining algorithms that operate on graphs has increased. In particular, mining frequent patterns from structured data such as graphs has attracted many research groups. A graph is a highly adaptable representation scheme that is used in many domains, including chemistry, bioinformatics, and physics. For example, the chemical structure of a given substance can be modelled as an undirected labelled graph in which each node corresponds to an atom and each edge corresponds to a chemical bond between atoms. The Internet can also be modelled as a directed graph in which each node corresponds to a web site and each edge corresponds to a hypertext link between web sites. Notably, in bioinformatics, various kinds of newly discovered data, such as gene regulation networks or protein interaction networks, can be modelled as graphs. There have been a number of attempts to find useful knowledge in such graph-structured data. One of the most powerful analysis tools for graph-structured data is frequent subgraph analysis. Recurring patterns in graph data can provide valuable insights into the data. However, finding recurring subgraphs is computationally very expensive. At the core of the problem are two computationally challenging subproblems: 1) subgraph isomorphism and 2) enumeration of subgraphs. The former covers the subgraph isomorphism problem (does graph A contain graph B?) and the graph isomorphism problem (are two graphs A and B the same or not?). Even these simplified versions of the subgraph mining problem are known to be NP-complete or isomorphism-complete, and no polynomial-time algorithm is known so far. The latter is also a difficult problem: without any constraint, all $2^n$ subgraphs must be generated, where $n$ is the number of vertices of the input graph. To find frequent subgraphs in a larger graph database, it is essential to place appropriate constraints on the subgraphs to be found. Most current approaches focus on the frequency of a subgraph: the higher the frequency of a subgraph, the more attention it should receive. Recently, several algorithms that use level-by-level approaches to find frequent subgraphs have been developed. Some recently emerging applications suggest that other constraints, such as connectivity, could also be useful in mining subgraphs: more strongly connected parts of a graph are more informative. If the set of subgraphs to mine is restricted to more strongly connected parts, the computational complexity can be decreased significantly. In this paper, we present an efficient algorithm for mining frequent subgraphs that are more strongly connected. An experimental study shows that the algorithm scales to larger graphs with more than ten thousand vertices.

  • PDF
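
As a rough illustration of the enumeration problem described in the abstract above, the following sketch counts all non-empty vertex subsets of a toy graph and, among them, the subsets whose induced subgraph is connected. It is not the authors' algorithm: plain connectedness stands in for the paper's connectivity constraint, and the names `is_connected`, `count_candidates`, and the toy graph `path` are assumptions made for this example.

```python
from itertools import combinations


def is_connected(vertices, adj):
    """Return True if the subgraph induced by `vertices` is connected."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            # Only follow edges that stay inside the chosen vertex subset.
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def count_candidates(adj):
    """Count all non-empty vertex subsets and those inducing a connected subgraph."""
    nodes = sorted(adj)
    total = 0
    connected = 0
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            total += 1
            if is_connected(subset, adj):
                connected += 1
    return total, connected


# Hypothetical toy input: the path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
total, connected = count_candidates(path)
print(total, connected)
```

Even on this four-vertex path, requiring connectedness discards a third of the candidate subsets; the exhaustive loop itself is still exponential and is only meant to make the size of the unconstrained search space concrete.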

Component-based AI Application Support System using Knowledge Sharing Graph for EdgeCPS Platform

  • 김영주
    • Journal of the Korea Institute of Information and Communication Engineering / Vol. 26, No. 8 / pp.1103-1110 / 2022
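  • With the rapid growth of AI-related industries, countless edge devices now operate in the real world, and the smart spaces built from these devices generate far more data than the edge devices can readily process. EdgeCPS technology emerged to address this problem. EdgeCPS supports the smooth execution of a variety of application services, including AI application services, through interworking between edge devices and edge servers and through resource augmentation and function augmentation. This paper therefore proposes a componentized AI application support system, based on a knowledge sharing graph, that can be applied to the EdgeCPS platform. The knowledge sharing graph is designed to store effectively the information essential to building AI applications, such as training data, trained models, learning algorithms, and devices. AI applications run as components so that resource augmentation and function augmentation can be changed easily with the support of the EdgeCPS platform. The AI application support system is linked to the knowledge sharing graph so that users can easily build and test applications, and it visualizes an application's execution for the user through the application's pipeline.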

A Mechanism for Combining Quantitative and Qualitative Reasoning: An Application to Stock Price Prediction

  • 김명종
    • Knowledge Management Research / Vol. 10, No. 2 / pp.35-48 / 2009
  • The paper proposes a quantitative causal ordering map (QCOM) that combines qualitative and quantitative methods in a single framework. QCOM is developed in three phases. The first phase collects partially known causal dependencies from experts and converts them into the relations and causal nodes of a model graph. The second phase finds the global causal structure by tracing causality among the relation and causal nodes and represents it as a causal ordering graph with signed coefficients. In the third phase, the causal ordering graph is converted into a QCOM by assigning regression coefficients estimated through path analysis. Experiments with a prediction model of Korean stock prices show the following results. First, the QCOM can support the design of qualitative and quantitative models by finding the global causal structure from partially known causal dependencies. Second, the QCOM can be used as a tool for integrating qualitative and quantitative models, offering higher explanatory capability and quantitative measurability. The QCOM, with static and dynamic analysis, is applied to investigate changes in the factors involved in the model at present as well as at discrete times in the future.

  • PDF

A Knowledge Graph on Japanese "Comfort Women": Interlinking Fragmented Digital Archival Resources

  • 박하람;김학래
    • Journal of Korean Society of Archives and Records Management / Vol. 21, No. 3 / pp.61-78 / 2021
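  • Records on Japanese military "comfort women" are managed separately by private institutions. Some records have been built into digital archives and can be accessed online. However, the composition and representation of metadata in these digital archives differ from institution to institution. Moreover, because there is no adequate scheme for defining relationships between records, the existing records remain unconnected and fragmented. This study proposes a knowledge model for interlinking digital records on Japanese military "comfort women" and builds a knowledge graph by integrating the records of the distributed digital archives. It analyzes the metadata of the digital archives to derive common elements, and applies standard vocabularies to express semantically the various entities in the digital records and the relationships between them. In particular, the collected data are cleaned so that scattered records can be linked and searched, and external data are used to enrich the contextual information of the records. The knowledge graph is validated through queries that measure whether distributed records can be found. The validation shows that the knowledge graph can link and search scattered records, provides rich contextual information through enrichment from external data, and enables accurate search that matches the user's intent through semantics-based search.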

Performance Improvement of Context-Sensitive Spelling Error Correction Techniques using Knowledge Graph Embedding of Korean WordNet (alias. KorLex)

  • 이정훈;조상현;권혁철
    • Journal of Korea Multimedia Society / Vol. 25, No. 3 / pp.493-501 / 2022
  • This paper studies context-sensitive spelling error correction. It uses the Korean WordNet (KorLex)[1], which defines the relationships between words as a graph, to improve correction performance[2] by adding vector information of the embedded words to the correction technique. The Korean WordNet was adapted from WordNet[3], developed at Princeton University in the United States, and was additionally extended for Korean. To learn a semantic network in graph form, or to use it as learned vector information, it must be transformed into vector form through embedding learning. For the transformation, a limited number of nodes from the network-shaped graph are listed in a line, like a sentence, before being used as training input. One learning technique that uses this strategy is DeepWalk[4]. DeepWalk is used to learn the graph between words in the Korean WordNet. The graph embedding information is concatenated with the word vector information of the trained language model for correction, and the final correction word is determined by the cosine distance between the vectors. To test whether graph embedding information improves the performance of context-sensitive spelling error correction, confused word pairs were constructed and tested from the perspective of Word Sense Disambiguation (WSD). In the experimental results, the average correction performance over all confused word pairs improved by 2.24% compared to the baseline correction performance.
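
The abstract above describes listing graph nodes "like a sentence" and comparing concatenated vectors by cosine distance. The sketch below illustrates only those two steps, under stated assumptions: `random_walks` produces DeepWalk-style truncated random walks (the skip-gram training that DeepWalk then applies to the walks is omitted), and the toy word graph, the vector values, and all function names are hypothetical rather than taken from the paper or from KorLex.

```python
import math
import random


def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Generate truncated random walks; each walk reads like a sentence of nodes."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(adj):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = sorted(adj[walk[-1]])
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


def concat(graph_vec, lm_vec):
    """Concatenate a graph-embedding vector with a language-model word vector."""
    return list(graph_vec) + list(lm_vec)


def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity (assumes non-zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


# Hypothetical toy word graph (undirected, stored as adjacency sets).
graph = {
    "animal": {"dog", "cat"},
    "dog": {"animal", "puppy"},
    "cat": {"animal"},
    "puppy": {"dog"},
}
walks = random_walks(graph, walk_length=5, walks_per_node=2)

# Hypothetical vectors: graph part first, language-model part second.
context = concat([1.0, 0.0], [0.5, 0.5])
candidate_a = concat([0.9, 0.1], [0.4, 0.6])
candidate_b = concat([0.0, 1.0], [0.9, 0.1])
best = min([("a", candidate_a), ("b", candidate_b)],
           key=lambda item: cosine_distance(context, item[1]))
```

In this sketch the candidate with the smallest cosine distance to the context vector is chosen, which mirrors the selection rule stated in the abstract; how the paper builds the context vector is not specified there and is assumed here.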

Worker Symptom-based Chemical Substance Estimation System Design Using Knowledge Base

  • 주용택;이동훈;신은지;유상우;신동일
    • Journal of the Korean Institute of Gas / Vol. 25, No. 3 / pp.9-15 / 2021
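  • This paper studies the construction of a knowledge base of symptoms caused by human contact with chemical substances at industrial sites, and the design of a chemical substance estimation system. Contact-symptom information for 499 chemical substances from the WISER program provided by the U.S. NIH was used. AllegroGraph 7.0.1 was used to build the knowledge base, and the triple values CAS No., Synonyms, Symptom, SMILES, InChI, and Formula were used for each chemical structure entered. Workers' symptoms can be communicated using an AI speaker. After the knowledge base was built, the 39 symptoms recorded for ammonia (CAS No: 7664-41-7) were confirmed to be identical to those in the WISER program. This shows that a knowledge base can be built for the symptom extraction step of the chemical substance estimation system.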

A DoS Detection Method Based on Composition Self-Similarity

  • Jian-Qi, Zhu;Feng, Fu;Kim, Chong-Kwon;Ke-Xin, Yin;Yan-Heng, Liu
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 6, No. 5 / pp.1463-1478 / 2012
  • Based on the theory of local-world networks, this paper presents the composition self-similarity (CSS) of network traffic for the first time, for the study of DoS detection. We propose the concept of a composition distribution graph and design the related operations. The $(R/S)^d$ algorithm is designed to calculate the Hurst parameter. Based on the composition distribution graph and Kullback-Leibler (KL) divergence, we propose the composition self-similarity anomaly detection (CSSD) method for detecting DoS attacks, and we evaluate its effectiveness. Compared to other entropy-based anomaly detection methods, our method is more accurate and more sensitive in detecting DoS attacks.
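
To make the role of KL divergence in the abstract above concrete, here is a minimal sketch that compares the composition of a traffic window against a baseline window. It is not the CSSD method: the paper's composition distribution graph and $(R/S)^d$ algorithm are not reproduced, and the protocol labels, window contents, and function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def distribution(items, support):
    """Relative frequency of each category in `support` within `items`."""
    counts = Counter(items)
    total = len(items)
    return [counts[s] / total for s in support]


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q); q is floored at eps to avoid division by zero."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)


# Hypothetical packet compositions, labelled by protocol.
support = ["tcp", "udp", "icmp"]
baseline = distribution(["tcp"] * 6 + ["udp"] * 3 + ["icmp"] * 1, support)
normal = distribution(["tcp"] * 5 + ["udp"] * 4 + ["icmp"] * 1, support)
flood = distribution(["tcp"] * 1 + ["udp"] * 1 + ["icmp"] * 8, support)

score_normal = kl_divergence(normal, baseline)
score_flood = kl_divergence(flood, baseline)
```

A window whose composition drifts far from the baseline yields a much larger divergence than an ordinary window; any alarm threshold applied to such scores would have to be tuned on real traffic and is not specified in the abstract.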

A Study on a Distributed Data Fabric-based Platform in a Multi-Cloud Environment

  • Moon, Seok-Jae;Kang, Seong-Beom;Park, Byung-Joon
    • International Journal of Advanced Culture Technology / Vol. 9, No. 3 / pp.321-326 / 2021
  • In a multi-cloud environment, physical data movement must be minimized so that distributed source data can interoperate efficiently without building a data warehouse or data lake. There is also a need for a data platform that can easily access data anywhere in a multi-cloud environment. In this paper, we propose a new platform based on a data fabric, centered on a distributed platform suited to cloud environments, that overcomes the limitations of legacy systems. The platform applies a knowledge graph database technique to the physical linkage of source data for interoperability of distributed data. By integrating all data into one scalable platform in a multi-cloud environment, it uses the Holochain technique so that companies can easily access and move data, with security and authority guaranteed, regardless of where the data is stored. The knowledge graph database mitigates heterogeneity conflicts in data interoperability in a decentralized environment, and Holochain speeds up the memory and security processing of traditional blockchains. In this way, access to and sharing of distributed data become more flexible, and metadata matching is handled effectively.

Knowledge Representation Using Decision Trees Constructed Based on Binary Splits

  • Azad, Mohammad
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 10 / pp.4007-4024 / 2020
  • Constructing decision trees from a given decision table is important because the trees serve as a tool for knowledge representation. However, the usual algorithms may split the decision table on each value of an attribute, which is inefficient for numerical attributes. This paper instead splits the decision table into binary groups, like the CART algorithm, which uses binary splits to handle both categorical and numerical attributes. The difference is that this approach uses a split for each attribute, established by a directed acyclic graph in a dynamic programming fashion, whereas CART uses a binary split among all considered attributes in a greedy fashion. The aim of the paper is to study the effect of binary splits in comparison with each-value splits when building decision trees. The effect is studied by comparing the number of nodes and the local and global misclassification rates of the decision trees constructed by three proposed algorithms.
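
The contrast between each-value splits and binary splits in the abstract above can be sketched as follows. This example uses a CART-style greedy choice of one threshold by weighted Gini impurity, not the paper's dynamic programming over a directed acyclic graph; the data and the names `gini` and `best_binary_split` are assumptions for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_binary_split(values, labels):
    """Return (threshold, weighted Gini) of the best split `value <= threshold`."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [lab for val, lab in pairs if val <= threshold]
        right = [lab for val, lab in pairs if val > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical numerical attribute with six distinct values.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]

each_value_branches = len(set(values))   # one branch per distinct value
threshold, impurity = best_binary_split(values, labels)
```

On this toy column an each-value split would create six branches, while a single binary split separates the two classes with two branches, which is the inefficiency for numerical attributes that the abstract points to.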

Development of the Rule-based Smart Tourism Chatbot using Neo4J graph database

  • Kim, Dong-Hyun;Im, Hyeon-Su;Hyeon, Jong-Heon;Jwa, Jeong-Woo
    • International Journal of Internet, Broadcasting and Communication / Vol. 13, No. 2 / pp.179-186 / 2021
  • We have developed a smart tourism app along with Instagram and YouTube content to provide personalized tourism information and travel product information to individual tourists. In this paper, we develop a rule-based smart tourism chatbot using the khaiii (Kakao Hangul Analyzer III) morphological analyzer and the Neo4J graph database. To understand the intention of a user's question, the proposed chatbot system uses a morpheme analyzer, a proper noun dictionary that includes tourist destination names, and a general noun dictionary that contains words frequently used in tourist information searches. The tourism knowledge base, built using the Neo4J graph database, provides adequate answers to tourists' questions. The Neo4J nodes are Area, based on the tourist destination address; Contents, with tourist information as properties; and Service, which includes service attribute data frequently used in searches. A Neo4J query is created from the analyzed intention of a tourist's question, using the properties of the nodes and relationships in the Neo4J database. The answer to the question is produced by searching the tourism knowledge base. We create the tourism knowledge base from more than 1,300 items of Jeju tourism information used in the smart tourism app. We plan to develop a multilingual smart tourism chatbot using named entity recognition (NER), intention classification with conditional random fields (CRF), and transfer learning with pretrained language models.