• Title/Summary/Keyword: Knowledge graph database

Efficient Mining of Frequent Subgraph with Connectivity Constraint

  • Moon, Hyun-S.;Lee, Kwang-H.;Lee, Do-Heon
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.267-271 / 2005
  • The goal of data mining is to extract new and useful knowledge from large-scale datasets. As the amount of available data grows explosively, it has become vitally important to develop faster data mining algorithms for various types of data. Recently, interest in data mining algorithms that operate on graphs has increased; in particular, mining frequent patterns from structured data such as graphs has drawn the attention of many research groups. A graph is a highly adaptable representation scheme used in many domains, including chemistry, bioinformatics, and physics. For example, the chemical structure of a given substance can be modelled by an undirected labelled graph in which each node corresponds to an atom and each edge corresponds to a chemical bond between atoms. The Internet can also be modelled as a directed graph in which each node corresponds to a web site and each edge corresponds to a hypertext link between web sites. Notably, in bioinformatics, various kinds of newly discovered data such as gene regulation networks or protein interaction networks can be modelled as graphs. There have been a number of attempts to extract useful knowledge from such graph-structured data. One of the most powerful analysis tools for graph-structured data is frequent subgraph analysis: recurring patterns in graph data can provide valuable insights into that data. However, finding recurring subgraphs is computationally very expensive. At the core lie two computationally challenging problems: 1) subgraph isomorphism and 2) enumeration of subgraphs. Problems related to the former are the subgraph isomorphism problem (does graph A contain graph B?) and the graph isomorphism problem (are two graphs A and B the same or not?). Even these simplified versions of the subgraph mining problem are known to be NP-complete or isomorphism-complete, and no polynomial-time algorithm is known so far. The latter is also a difficult problem: without any constraint, all $2^n$ subgraphs must be generated, where n is the number of vertices of the input graph. To find frequent subgraphs in a larger graph database, it is essential to place appropriate constraints on the subgraphs to be found. Most current approaches focus on the frequency of a subgraph: the higher the frequency of a subgraph, the more attention it should receive. Recently, several algorithms that use level-by-level approaches to find frequent subgraphs have been developed. Some recently emerging applications suggest that other constraints, such as connectivity, could also be useful in mining subgraphs: more strongly connected parts of a graph are more informative. If the set of subgraphs to mine is restricted to more strongly connected parts, the computational complexity could be decreased significantly. In this paper, we present an efficient algorithm to mine frequent subgraphs that are more strongly connected. An experimental study shows that the algorithm scales to larger graphs with more than ten thousand vertices.

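The enumeration cost described in the abstract above can be illustrated with a small sketch. It is not the authors' algorithm: it uses plain connectedness as a stand-in for their (unspecified) connectivity constraint, and the toy graph `path` and the helper names are hypothetical. It only compares how many vertex subsets exist with how many of them induce a connected subgraph.

```python
from itertools import combinations


def is_connected(vertices, adj):
    """Return True if the subgraph induced by `vertices` is connected."""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:  # depth-first search restricted to `vertices`
        u = stack.pop()
        for v in adj[u]:
            if v in vertices and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices


def count_candidates(adj):
    """Count non-empty vertex subsets, and those inducing a connected subgraph."""
    nodes = sorted(adj)
    total = connected = 0
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            total += 1
            if is_connected(subset, adj):
                connected += 1
    return total, connected


# Hypothetical toy input: the path graph a-b-c-d-e.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
total, connected = count_candidates(path)
print(total, connected)
```

This brute-force filter still visits every subset, so it only shows how much smaller the constrained candidate set is; a practical miner would grow connected subgraphs incrementally instead of filtering all subsets.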

The Design of Operation and Control Solution with Intelligent Inference Capability for IED based Digital Switchgear Panel (IED를 기반으로 하는 디지털 수배전반의 지적추론기반 운전제어 솔루션 설계)

  • Ko, Yun-Seok
    • The Transactions of the Korean Institute of Electrical Engineers A / v.55 no.9 / pp.351-358 / 2006
  • In this paper, DSPOCS (Digital Switchgear-Panel Operation and Control Solution) is designed: an intelligent-inference-based operation and control solution for securing the safety and reliability of electric power supply in IED-based substations. DSPOCS consists of a scheduled monitoring and control task and a real-time alarm inference task, and is interlinked with BRES (Bus Reconfiguration Expert System) when required. The intelligent alarm inference task consists of an alarm knowledge generation part and a real-time pattern matching part. The alarm knowledge generation part automatically generates alarm knowledge from the DB and saves it in the alarm knowledge base. The pattern matching part, in turn, infers the real-time event by comparing the real-time event information supplied by the substation's IEDs with the patterns stored in the alarm knowledge base. In particular, the alarm knowledge base includes knowledge patterns related to the fault alarm, the overload alarm, and the diagnosis alarm. To make the database design independent of the substation structure, each busbar is represented as a connectivity node, which allows a more general graph-theoretic treatment. Finally, DSPOCS is implemented in MS Visual C++ with MFC, and the effectiveness and accuracy of the design are verified by a simulation study on a typical distribution substation.

An Algorithm for Pattern Classification of ECG Signals Using Frame Knowledge Representation Technique (게임 지식 표현 기법을 이용한 심전도 신호의 패턴해석 알고리즘에 관한 연구)

  • 신건수;이병채;정희교;이명호
    • The Transactions of the Korean Institute of Electrical Engineers / v.41 no.4 / pp.433-441 / 1992
  • This paper describes an algorithm that can efficiently analyze the ECG signal using a frame knowledge representation technique. The input to the analysis process is a set of significant points extracted from an original sampled signal (lead II) by the syntactic peak recognition algorithm. The hierarchical property of the ECG signal is represented by a hierarchical AND/OR graph. The semantic information and constraints of the ECG signal are described by frames. As the control mechanism for labeling points, a search mechanism that mixes data-driven and model-driven hypothesis formation is used, together with a scoring function, a hypothesis modification network, and instance inheritance. We used the CSE database to evaluate the performance of the proposed algorithm.

Automatic Construction of SHACL Schemas for RDF Knowledge Graphs Generated by Direct Mappings

  • Choi, Ji-Woong
    • Journal of the Korea Society of Computer and Information / v.25 no.10 / pp.23-34 / 2020
  • In this paper, we propose a method to automatically construct SHACL schemas for RDF knowledge graphs (KGs) generated by Direct Mapping (DM). DM and SHACL are both W3C recommendations. DM consists of rules to transform the data in an RDB into an RDF graph. SHACL is a language to describe and validate the structure of RDF graphs. The proposed method automatically translates the integrity constraints as well as the structure information in an RDB schema into SHACL. Thus, our SHACL schemas can check integrity in place of the RDBMS. This helps assure database consistency even when RDBs are served as virtual RDF KGs. We tested our results on 24 DM test cases published by W3C. The results show that the schemas are effective in describing and validating RDF KGs.
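
The translation idea in the abstract above can be sketched as follows. This is not the paper's method: the function `table_to_shacl`, the shape-naming scheme, the base IRI, and the choice of which constraints to emit (NOT NULL as `sh:minCount 1`, single-valued columns as `sh:maxCount 1`) are assumptions made for illustration. The class and property IRIs follow the Direct Mapping pattern of `<base>Table` and `<base>Table#column`; percent-encoding of names is omitted for brevity.

```python
def table_to_shacl(table, columns, base="http://example.com/base/"):
    """Render a SHACL node shape (Turtle text) for one relational table.

    `columns` is a list of (column_name, xsd_type, not_null) tuples.
    The shape IRI `<base><table>Shape` is a hypothetical naming choice.
    """
    lines = [
        "@prefix sh: <http://www.w3.org/ns/shacl#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<{base}{table}Shape> a sh:NodeShape ;",
        f"    sh:targetClass <{base}{table}> ;",
    ]
    props = []
    for name, xsd_type, not_null in columns:
        parts = [
            f"sh:path <{base}{table}#{name}>",
            f"sh:datatype xsd:{xsd_type}",
            "sh:maxCount 1",  # a column holds at most one value per row
        ]
        if not_null:
            parts.insert(2, "sh:minCount 1")  # NOT NULL: value is required
        props.append("    sh:property [ " + " ; ".join(parts) + " ]")
    lines.append(" ;\n".join(props) + " .")
    return "\n".join(lines)


# Hypothetical table: People(ID INTEGER NOT NULL, fname VARCHAR)
shape = table_to_shacl("People", [("ID", "integer", True), ("fname", "string", False)])
print(shape)
```

Keys, foreign keys, and uniqueness constraints are not covered by this sketch; they would need additional SHACL constructs beyond per-column property shapes.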

Applying A Matrix-Based Inference Algorithm to Electronic Commerce

  • Lee, Kun-Chang;Cho, Hyung-Rae
    • Proceedings of the Korea Database Society Conference / 1999.06a / pp.353-359 / 1999
  • We present a matrix-based inference algorithm suitable for electronic commerce applications. For this purpose, an Extended AND-OR Graph (EAOG) was developed to enable fast inference in electronic commerce situations. The proposed EAOG inference mechanism has the following three characteristics. 1. Real-time inference: The EAOG inference mechanism is suitable for real-time inference because its computational mechanism is based on matrix computation. 2. Matrix operation: All the subjective knowledge is expressed in matrix form, so that inference can proceed through matrix operations, which are computationally efficient. 3. Bi-directional inference: The traditional inference method of expert systems is based on either forward chaining or backward chaining, which are mutually exclusive in terms of logical process and computational efficiency. The proposed EAOG inference mechanism, however, is inherently bi-directional without loss of speed or efficiency. We have demonstrated the validity of our approach with several propositions and an illustrative EC example.

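The general idea of driving inference from a matrix, as outlined in the abstract above, might look like the following minimal sketch. It is not the EAOG mechanism itself, whose details the abstract does not give: the rule encoding, the function `forward_infer`, and the e-commerce facts are hypothetical, and only the forward direction is shown.

```python
def forward_infer(antecedents, conclusions, kinds, facts):
    """Forward inference over AND/OR rules stored as a 0/1 matrix.

    antecedents[i][j] == 1 means fact j is an input of rule i.
    conclusions[i] is the index of the fact that rule i establishes.
    kinds[i] is "AND" (all inputs needed) or "OR" (any input suffices).
    facts is a 0/1 vector of initially known facts.
    """
    facts = list(facts)
    changed = True
    while changed:  # repeat until no rule adds a new fact
        changed = False
        for row, concl, kind in zip(antecedents, conclusions, kinds):
            hits = sum(a * f for a, f in zip(row, facts))  # row-vector product
            needed = sum(row) if kind == "AND" else 1
            if hits >= needed and not facts[concl]:
                facts[concl] = 1
                changed = True
    return facts


# Hypothetical facts: 0 in_cart, 1 paid, 2 in_stock, 3 order_confirmed, 4 ship
antecedents = [
    [1, 1, 0, 0, 0],  # in_cart AND paid -> order_confirmed
    [0, 0, 1, 1, 0],  # in_stock AND order_confirmed -> ship
]
conclusions = [3, 4]
kinds = ["AND", "AND"]

shipped = forward_infer(antecedents, conclusions, kinds, [1, 1, 1, 0, 0])
unpaid = forward_infer(antecedents, conclusions, kinds, [1, 0, 1, 0, 0])
print(shipped, unpaid)
```

Each rule check is a single row-by-vector product, which is the kind of operation that makes a matrix encoding attractive for fast inference.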

A Dynamic exploration of Constructivism Research based on Citespace Software in the Field of Education (교육학 분야에서 CiteSpace에 기초한 구성주의 연구 동향 탐색)

  • Jiang, Yuxin;Song, Sun-Hee
    • The Journal of the Korea Contents Association / v.22 no.5 / pp.576-584 / 2022
  • As an important branch of cognitive psychology, "constructivism" has been called a "revolution" in contemporary educational psychology and has had a profound influence on pedagogy and psychology. Drawing on the "WOS" database, this study selects the "WOS Core database" and the "KCI database", uses the CiteSpace visualization software as its analysis tool, and performs a knowledge map analysis of the research literature on "constructivism" in the field of education over the past 35 years. The analysis covers annual publication trends, network connections by country (region), author, and institution or university, and keywords. Its purpose is to grasp the subject areas, research hotspots, and future trends of constructivism research, and to provide a theoretical reference for that research. The study draws three conclusions. 1. Studies on constructivism have continued from the 1980s to the present, and the field is now in a period of steady development. 2. The countries most concerned with constructivism are the United States, Canada, Britain, Australia, and the Netherlands, and the main research institutions and authors are located in these countries. 3. Currently, the keywords of constructivism research concentrate in the "instructional strategies" cluster, and the development of science and technology is affecting individual learning. In the future, instructional strategies will become the focus of structural constructivism research. With the development of instructional technology, research on new teaching models will be needed.

Development of Tourism Information Named Entity Recognition Datasets for the Fine-tune KoBERT-CRF Model

  • Jwa, Myeong-Cheol;Jwa, Jeong-Woo
    • International Journal of Internet, Broadcasting and Communication / v.14 no.2 / pp.55-62 / 2022
  • A smart tourism chatbot is needed as a user interface to efficiently provide tourists with smart tourism services such as recommended travel products, tourist information, personal travel itineraries, and tour guide services. We have developed a smart tourism app and a smart tourism information system that provide smart tourism services to tourists. We have also developed a smart tourism chatbot service consisting of the khaiii morpheme analyzer, rule-based intention classification, and a tourism information knowledge base built on the Neo4j graph database. In this paper, we develop the Korean and English smart tourism Named Entity (NE) datasets required to build an NER model with pre-trained language models (PLMs) for the smart tourism chatbot system. We create the tourism information NER datasets by collecting source data through the smart tourism app, the visitJeju web site of the Jeju Tourism Organization (JTO), and web search, and by preprocessing it with Korean and English tourism information Named Entity dictionaries. We train the KoBERT-CRF NER model on the developed Korean and English tourism information NER datasets. The weight-averaged precision, recall, and F1 scores are 0.94, 0.92, and 0.94 on the Korean and English tourism information NER datasets.
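
The abstract above reports weight-averaged scores without defining them. A common reading, assumed here, is the support-weighted average of per-class precision, recall, and F1. The sketch below works through that arithmetic; the function `weighted_prf`, the labels, and the counts are hypothetical and are not taken from the paper's data.

```python
def weighted_prf(per_class):
    """Support-weighted precision, recall, and F1.

    `per_class` maps a label to (true_pos, false_pos, false_neg).
    Support of a class is true_pos + false_neg (its gold entity count).
    """
    total = sum(tp + fn for tp, fp, fn in per_class.values())
    if total == 0:
        return 0.0, 0.0, 0.0
    p_sum = r_sum = f_sum = 0.0
    for tp, fp, fn in per_class.values():
        support = tp + fn
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        p_sum += support * p
        r_sum += support * r
        f_sum += support * f
    return p_sum / total, r_sum / total, f_sum / total


# Hypothetical counts for two entity types.
counts = {"LOC": (8, 2, 2), "PER": (5, 0, 5)}
precision, recall, f1 = weighted_prf(counts)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

Because F1 is averaged per class rather than computed from the averaged precision and recall, a weighted F1 need not lie between the weighted precision and recall.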

Gene annotation by the "interactome" analysis in KEGG

  • Kanehisa, Minoru
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.56-58 / 2000
  • Post-genomics may be defined in different ways depending on how one views the challenges after the genome. A popular view is to follow the concept of the central dogma in molecular biology, namely from genome to transcriptome to proteome. Projects are under way to analyze gene expression profiles at both the mRNA and protein levels and to catalog protein 3D structure families, which will no doubt help the understanding of information in the genome. However complete, such catalogs of genes, RNAs, and proteins only tell us about the building blocks of life. They do not tell us much about the wiring (interaction) of building blocks, which is essential for uncovering systemic functional behaviors of the cell or the organism. Thus, an alternative view of post-genomics is to go up from the molecular level to the cellular level, and to understand what I call the "interactome", a complete picture of molecular interactions in the cell. KEGG (http://www.genome.ad.jp/kegg/) is our attempt to computerize current knowledge on various cellular processes as a collection of "generalized" protein-protein interaction networks, to develop new graph-based algorithms for predicting such networks from the genome information, and to actually reconstruct the interactomes for all the completely sequenced genomes and some partial genomes. During the reconstruction process, it becomes readily apparent that certain pathways and molecular complexes are present or absent in each organism, indicating modular structures of the interactome. In addition, the reconstruction uncovers missing components in an otherwise complete pathway or complex, which may result from misannotation of the genome or misrepresentation of the KEGG pathway. When combined with additional experimental data on protein-protein interactions, such as those from yeast two-hybrid systems, the reconstruction may uncover unknown partners for a particular pathway or complex. Thus, the reconstruction is tightly coupled with the annotation of individual genes, which is maintained in the GENES database in KEGG. We are also trying to expand our literature survey to include the most up-to-date information about gene functions in the GENES database.


An Efficient Expert Discrimination Scheme Based on Academic Documents (학술 문헌 기반 효율적인 전문가 판별 기법)

  • Choi, Do-Jin;Oh, Young-Ho;Pyun, Do-Woong;Bang, Min-Ju;Jeon, Jong-Woo;Lee, Hyeon-Byeong;Park, Deukbae;Lim, Jong-Tae;Bok, Kyoung-Soo;Yoo, Hyo-Keun;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.21 no.12 / pp.1-12 / 2021
  • An objective expert discrimination scheme is needed to find researchers who have insight into and knowledge of a particular field of research. There are two types of expert discrimination schemes: citation-graph-based methods and formula-based methods. In this paper, we propose an efficient expert discrimination scheme that considers various characteristics not covered by the existing formula-based methods. To discriminate the expertise of researchers, we present six expertise indices: quality, productivity, contributiveness, recentness, accuracy, and durability. We also consider the number of social citations to reflect the characteristics of academic search sites. Finally, we conduct various experiments to demonstrate the validity and feasibility of the proposed scheme.

A News Video Mining based on Multi-modal Approach and Text Mining (멀티모달 방법론과 텍스트 마이닝 기반의 뉴스 비디오 마이닝)

  • Lee, Han-Sung;Im, Young-Hee;Yu, Jae-Hak;Oh, Seung-Geun;Park, Dai-Hee
    • Journal of KIISE:Databases / v.37 no.3 / pp.127-136 / 2010
  • With the rapid growth of information and computer communication technologies, the number of digital documents, including multimedia data, has recently exploded. In particular, news video databases and news video mining have become the subject of extensive research aimed at developing effective and efficient tools for the manipulation and analysis of news videos, because of their information richness. However, much of this research focuses on browsing, retrieval, and summarization of news videos. To date, work on discovering and analyzing the rich latent semantic knowledge in news videos is still at a relatively early stage. In this paper, we propose a news video mining system based on a multi-modal approach and text mining, which uses the visual-textual information of news video clips and their scripts. The proposed system automatically and systematically constructs a taxonomy of news video stories with a hierarchical clustering algorithm, which is one of the text mining methods. It then analyzes the topics of news video stories from multiple angles by means of a time-cluster trend graph, a weighted cluster growth index, and network analysis. To demonstrate the validity of our approach, we analyzed the news videos on "The Second Summit of South and North Korea in 2007".