• Title/Abstract/Keywords: Biological database


Use of Graph Database for the Integration of Heterogeneous Biological Data

  • Yoon, Byoung-Ha;Kim, Seon-Kyu;Kim, Seon-Young
    • Genomics & Informatics
    • /
    • Vol. 15, No. 1
    • /
    • pp.19-27
    • /
    • 2017
  • Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
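The contrast the abstract draws between multiple-join statements and native relationship traversal can be sketched on a toy scale. The example below is only an illustration under stated assumptions: it uses SQLite and a plain Python adjacency map as stand-ins for MySQL and Neo4j, the node and relationship names are invented placeholders, and it says nothing about the performance figures reported in the paper.

```python
import sqlite3
from collections import defaultdict

# Invented toy relationships; these identifiers are placeholders,
# not data from the paper.
EDGES = [
    ("drug_A", "TARGETS", "gene_1"),
    ("drug_A", "TARGETS", "gene_2"),
    ("gene_1", "ASSOCIATED_WITH", "disease_X"),
    ("gene_2", "ASSOCIATED_WITH", "disease_Y"),
    ("drug_B", "TARGETS", "gene_2"),
]


def diseases_via_join(drug):
    """Relational style: every extra hop in the path costs one more self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, rel TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?, ?)", EDGES)
    rows = conn.execute(
        """
        SELECT DISTINCT e2.dst
        FROM edge AS e1
        JOIN edge AS e2 ON e2.src = e1.dst
        WHERE e1.src = ?
          AND e1.rel = 'TARGETS'
          AND e2.rel = 'ASSOCIATED_WITH'
        """,
        (drug,),
    ).fetchall()
    conn.close()
    return sorted(row[0] for row in rows)


def diseases_via_traversal(drug):
    """Graph style: follow stored relationships outward from the start node."""
    adjacency = defaultdict(list)
    for src, rel, dst in EDGES:
        adjacency[(src, rel)].append(dst)
    found = set()
    for gene in adjacency.get((drug, "TARGETS"), []):
        found.update(adjacency.get((gene, "ASSOCIATED_WITH"), []))
    return sorted(found)
```

Both functions answer the same two-hop question (drug to gene to disease). In the join version each additional hop adds another self-join to the statement, whereas the traversal version only adds another loop over neighbours.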

Database of National Species List of Korea: the taxonomical systematics platform for managing scientific names of Korean native species

  • Park, Jongsun;An, Jung-Hyun;Kim, Yongsung;Kim, Donghyun;Yang, Byeong-Gug;Kim, Taeho
    • Journal of Species Research
    • /
    • Vol. 9, No. 3
    • /
    • pp.233-246
    • /
    • 2020
  • A scientific name is a changeable term in biology: it may be revised whenever additional research results on a specific taxon accumulate. The Database of the National Species List of Korea (DBNKo) was developed to manage taxonomic information on Korean species and was designed to describe changeable, complex taxonomic structure and information. A Korean Taxonomical Serial Number (KTSN) was assigned to each taxon so that higher-rank taxa can be managed systematically, unlike commonly used systems in which the scientific name serves as the primary key. Common names were also assigned KTSNs, reflecting that a common name is treated as one type of taxon. Additional taxonomic information (e.g., synonyms, original names, and references) was also added to the database. A web interface with an intuitive dashboard presenting the taxonomic hierarchy is provided to experts and managers of the DBNKo. Currently, several biological databases at the National Institute of Biological Resources (NIBR), such as a specimen database, a digital library, a genetic information system, and shared species data, are based on the DBNKo. The DBNKo has started sharing species information with other institutions such as the Nakdonggang National Institute of Biological Resources. It is an ideal centralized species database for managing standardized information on Korean species.
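The idea of keying each taxon on a serial number rather than on its scientific name can be sketched as follows. This is a minimal illustration, not the DBNKo schema: the `Taxon` and `TaxonRegistry` classes, their fields, the serial values, and the taxon names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Taxon:
    serial: int                    # stand-in for a KTSN; values are invented
    name: str                      # current scientific name, free to change
    rank: str
    parent: Optional[int] = None   # serial of the parent taxon, not its name
    synonyms: List[str] = field(default_factory=list)


class TaxonRegistry:
    """Hypothetical registry in which the serial number is the primary key."""

    def __init__(self) -> None:
        self._taxa: Dict[int, Taxon] = {}

    def add(self, taxon: Taxon) -> None:
        self._taxa[taxon.serial] = taxon

    def get(self, serial: int) -> Taxon:
        return self._taxa[serial]

    def rename(self, serial: int, new_name: str) -> None:
        """Change a name in place and keep the old one as a synonym."""
        taxon = self._taxa[serial]
        taxon.synonyms.append(taxon.name)
        taxon.name = new_name

    def lineage(self, serial: int) -> List[str]:
        """Walk parent serials up to the root and return names root-first."""
        names = []
        current: Optional[int] = serial
        while current is not None:
            taxon = self._taxa[current]
            names.append(taxon.name)
            current = taxon.parent
        return list(reversed(names))


# Placeholder taxa, not real names.
registry = TaxonRegistry()
registry.add(Taxon(1, "Exampleaceae", "family"))
registry.add(Taxon(2, "Examplus", "genus", parent=1))
registry.add(Taxon(3, "Examplus primus", "species", parent=2))

# Renaming the genus touches one record; child links are unaffected.
registry.rename(2, "Novus")
```

Because children point at the parent's serial number, renaming a higher-rank taxon does not require rewriting any of the records beneath it, which is the kind of stability a name-keyed design lacks.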

Higher Order Knowledge Processing: Pathway Database and Ontologies

  • Fukuda, Ken Ichiro
    • Genomics & Informatics
    • /
    • Vol. 3, No. 2
    • /
    • pp.47-51
    • /
    • 2005
  • Molecular mechanisms of biological processes are typically represented as 'pathways' that have a graph-analogical network structure. However, due to the diversity of topics that pathways cover, their constituent biological entities are highly diverse and the semantics is embedded implicitly. The kinds of interactions that connect biological entities are likewise diverse. Consequently, how to model or process pathway data is not a trivial issue. In this review article, we give an overview of the challenges in pathway database development by taking the INOH project as an example.

A Study on a Query Execution Model Applying Meta-Rules in a Four-Level Layer Hierarchy Based on Hybrid Databases (A Study of Query Processing Model to applied Meta Rule in 4-Level Layer based on Hybrid Databases)

  • 오염덕
    • Journal of the Korea Society of Computer and Information (한국컴퓨터정보학회논문지)
    • /
    • Vol. 14, No. 6
    • /
    • pp.125-134
    • /
    • 2009
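  • Web-based access to biological data gives many scientists a powerful tool not only for interactively searching the contents of biological databases in different formats, but also for linking from one database to other molecular biology databases. The biological conceptual model in this paper builds a hybrid biological data model on four integration layers for controlling biological data, applying rule attributes that reflect the relationships among biological data sources and representing the entities of interest within those sources. When a particular user's application service request arises, information is acquired from the relevant biological databases and from data sources reached through web services. This paper formalizes a query processing model and execution structure that applies meta-rules to search for information across web data sources on the basis of the integration layers.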

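Analysis of the Status of Traditional Knowledge through Construction of a Traditional Knowledge Database of Korean Native Biological Resources in National Park Areas (Traditional Knowledge analysis based on Native Biological Resources Database Construction of the National Park Area)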

  • 배세은;김보영;김성하;박정환;배은경;장진화;이상훈;박재원;신진섭
    • The Journal of the Korea Contents Association (한국콘텐츠학회논문지)
    • /
    • Vol. 16, No. 9
    • /
    • pp.267-275
    • /
    • 2016
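  • Biological species that humans continually use for clothing, food, shelter, and health, that is, to sustain life, are distributed across diverse places. To protect them and raise their value as resources, many countries around the world are making various efforts such as resource discovery and database construction, and the Convention on Biological Diversity was created as a result. To comply with it, each country is building databases as a minimal effort to protect native species and establish sovereignty over them. As part of these efforts, this study unified the data format of previously collected traditional-knowledge resources from national park areas and built a database of native biological resources by standardizing natural-language terms. On this basis, we analyzed the distribution of traditional knowledge, methods of use, and the kinds of species involved. The results showed that most uses related to food (食), and that many organisms were used in various forms to treat diseases and relieve symptoms.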
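The two steps the abstract describes, standardizing free-text terms and then analyzing how the resources are used, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the term mapping, the category names, and the sample records are invented and do not come from the study's database.

```python
from collections import Counter

# Hypothetical mapping from free-text labels to standard categories.
STANDARD_TERMS = {
    "food": "food",
    "edible": "food",
    "eaten": "food",
    "medicine": "medicinal",
    "medicinal": "medicinal",
    "remedy": "medicinal",
    "clothing": "clothing",
    "shelter": "shelter",
    "building material": "shelter",
}


def normalize_use(raw):
    """Lower-case, collapse whitespace, then map to a standard category."""
    key = " ".join(raw.strip().lower().split())
    return STANDARD_TERMS.get(key, "unclassified")


def count_uses(records):
    """Tally records per standardized use category."""
    return Counter(normalize_use(record["use"]) for record in records)


# Invented sample records with inconsistent spelling and spacing.
RECORDS = [
    {"species": "species_1", "use": " Edible "},
    {"species": "species_2", "use": "food"},
    {"species": "species_3", "use": "Remedy"},
    {"species": "species_4", "use": "building  material"},
    {"species": "species_5", "use": "ornament"},
]

usage_counts = count_uses(RECORDS)
```

Terms that fall outside the mapping are counted as `unclassified` rather than dropped, so gaps in the standard vocabulary stay visible in the tally.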

WebChemDB: An Integrated Chemical Database Retrieval System

  • Hou, Bo-Kyeng;Moon, Eun-Joung;Moon, Sung-Chul;Kim, Hae-Jin
    • Genomics & Informatics
    • /
    • Vol. 7, No. 4
    • /
    • pp.212-216
    • /
    • 2009
  • WebChemDB is an integrated chemical database retrieval system that provides access to over 8 million publicly available chemical structures, including related information on their biological activities and direct links to other public chemical resources, such as PubChem, ChEBI, and DrugBank. The data are publicly available over the web, using two-dimensional (2D) and three-dimensional (3D) structure retrieval systems with various filters and molecular descriptors. The web services API also provides researchers with functionalities to programmatically manipulate, search, and analyze the data.

Construction of a DADI-Based GRM for Biodiversity Information (Contracture for GRM of Biological Resources Information of based DADI)

  • 이계준;박형선;안부영;양진호
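    • Korea Society of Information Technology Applications (한국정보기술응용학회): Conference Proceedings
    • /
    • Korea Society of Information Technology Applications 2002 Fall Joint Conference: New Information Technology Paradigms for a Changing Information Environment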
    • /
    • pp.479-484
    • /
    • 2002
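  • In this paper, first, the biological resource information database is divided broadly into species information and content information built on top of that species information, and both are stored in a database based on XML (eXtensible Markup Language). Second, the information is organized around items defined by taxonomists and items that serve as metadata for building an international GSD (Global Species Database), and a component-based input system is built and provided for efficient databasing of local information. Third, to establish a system for information services and shared use, a GRM (Global Road Map) based on DADI (Data Access and Data Interoperability) is constructed. Through these three stages, we built a database of biological resource information and studied how to establish a smooth service system.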


Construction of a DADI-Based GRM for Biodiversity Information (Contracture for GRM of Biological Resources Information of based DADI)

  • 이계준;박형선;안부영;양진호
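    • Korea Society of Industrial Information Systems (한국산업정보학회): Conference Proceedings
    • /
    • Korea Society of Industrial Information Systems 2002 Fall Joint Conference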
    • /
    • pp.479-484
    • /
    • 2002
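  • In this paper, first, the biological resource information database is divided broadly into species information and content information built on top of that species information, and both are stored in a database based on XML (eXtensible Markup Language). Second, the information is organized around items defined by taxonomists and items that serve as metadata for building an international GSD (Global Species Database), and a component-based input system is built and provided for efficient databasing of local information. Third, to establish a system for information services and shared use, a GRM (Global Road Map) based on DADI (Data Access and Data Interoperability) is constructed. Through these three stages, we built a database of biological resource information and studied how to establish a smooth service system.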


GEDA: New Knowledge Base of Gene Expression in Drug Addiction

  • Suh, Young-Ju;Yang, Moon-Hee;Yoon, Suk-Joon;Park, Jong-Hoon
    • BMB Reports
    • /
    • Vol. 39, No. 4
    • /
    • pp.441-447
    • /
    • 2006
  • Abuse of drugs can elicit compulsive drug-seeking behaviors upon repeated administration and ultimately leads to addiction. We developed a procedure for standardizing microarray gene expression data from rat brain in drug addiction and stored the data in a single integrated database system, focusing on more effective data processing and interpretation. Another characteristic of the database is its systematic flexibility for statistical analysis and for linking with other databases. We adopted an intelligent SQL querying system as the foundation of the database in order to set up an interactive module that automatically reads raw gene expression data in the standardized format. We maximized the usability of the database, helping users study significant gene expression and identify the biological functions of genes through integrated, up-to-date gene information such as GO annotation and metabolic pathways. To collect the latest information on a selected gene, we also set up a local BLAST search engine and a non-redundant sequence database updated daily from the NCBI server. We find that the database is a useful query interface and data-mining tool, specifically for finding genes related to drug addiction. We applied this system to the identification and characterization of the behavior of methamphetamine-induced genes in rat brain.

SOP (Search of Omics Pathway): A Web-based Tool for Visualization of KEGG Pathway Diagrams of Omics Data

  • Kim, Jun-Sub;Yeom, Hye-Jung;Kim, Seung-Jun;Kim, Ji-Hoon;Park, Hye-Won;Oh, Moon-Ju;Hwang, Seung-Yong
    • Molecular & Cellular Toxicology
    • /
    • Vol. 3, No. 3
    • /
    • pp.208-213
    • /
    • 2007
  • With the development and popularization of microarray technology, which enables us to investigate the expression patterns of thousands of genes simultaneously, toxicogenomics researchers can interpret genome-scale interactions among genes exposed to a toxicant or a toxicant-related environment. The primary goal of toxicogenomics is to identify the functional context of groups of genes that are differentially or similarly co-expressed under a specific toxic substance. Analysis of these complex omics data also requires public reference databases containing transcriptome, proteome, and biological pathway information. However, because these databases are heterogeneous and independent, it is hard to analyze a large set of omics annotations and their pathway information one database at a time. Several public database websites provide links to one another, but these return unnecessary information along with the appropriate information. A systematically integrated database suited to experimenters' needs is therefore required. For these reasons, we propose the SOP (Search of Omics Pathway) database system, an integrated biological database that converts the heterogeneous features of public databases into a combined form. In addition, SOP offers user-friendly web interfaces that let users submit gene queries for the biological interpretation of gene lists derived from omics experiments. The SOP web interface outputs an omics annotation table and visualized pathway maps from the KEGG PATHWAY database. We believe that SOP will be a helpful tool for the biological interpretation of genes or proteins identified in omics experiments, for new discoveries from pathway analysis, and for designing new hypotheses for subsequent toxicogenomics experiments.
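The step of turning a submitted gene list into a pathway-oriented annotation table can be sketched as below. This is a hedged illustration only: it does not call KEGG or reproduce SOP's implementation, and the `GENE_TO_PATHWAYS` mapping, the gene and pathway identifiers, and the `annotate` function are hypothetical.

```python
from collections import defaultdict

# Invented local mapping standing in for an integrated annotation store.
GENE_TO_PATHWAYS = {
    "gene_a": ["pathway_1", "pathway_2"],
    "gene_b": ["pathway_1"],
    "gene_c": [],
}


def annotate(genes):
    """Group a gene list by pathway.

    Returns a pair: a dict mapping each pathway to its sorted member genes,
    and a list of submitted genes that have no pathway annotation.
    """
    table = defaultdict(set)
    unmapped = []
    for gene in genes:
        pathways = GENE_TO_PATHWAYS.get(gene, [])
        if not pathways:
            unmapped.append(gene)
        for pathway in pathways:
            table[pathway].add(gene)
    grouped = {pathway: sorted(members) for pathway, members in sorted(table.items())}
    return grouped, unmapped


pathway_table, unmapped_genes = annotate(["gene_a", "gene_b", "gene_c", "gene_z"])
```

Reporting the unmapped genes separately keeps the table restricted to annotated genes while still showing which inputs the integrated sources could not place.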