Integrated Search | Korea Science

Use of Graph Database for the Integration of Heterogeneous Biological Data

Yoon, Byoung-Ha;Kim, Seon-Kyu;Kim, Seon-Young
- Genomics & Informatics, Vol. 15, No. 1, pp. 19-27, 2017
Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
https://doi.org/10.5808/GI.2017.15.1.19
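
To illustrate why multi-hop relationship queries suit a graph model, the sketch below stores a few drug-target, protein-protein, and gene-disease edges in an adjacency map and follows them hop by hop. The node names, relation labels, and the `follow` helper are all invented for illustration; this is not the authors' Neo4j schema, query, or data.

```python
from collections import defaultdict

# Hypothetical toy edges (source, relation, target); not data from the paper.
EDGES = [
    ("drugA", "TARGETS", "geneX"),
    ("geneX", "INTERACTS_WITH", "geneY"),
    ("geneY", "ASSOCIATED_WITH", "diseaseZ"),
    ("drugB", "TARGETS", "geneY"),
]


def build_graph(edges):
    """Index edges by source node so each hop is a direct lookup."""
    graph = defaultdict(list)
    for src, rel, dst in edges:
        graph[src].append((rel, dst))
    return graph


def follow(graph, start, relations):
    """Follow a fixed sequence of relation types from `start`.

    Returns the set of nodes reached after the last hop.
    """
    frontier = {start}
    for rel in relations:
        frontier = {
            dst
            for node in frontier
            for r, dst in graph.get(node, [])
            if r == rel
        }
    return frontier


GRAPH = build_graph(EDGES)
diseases = follow(GRAPH, "drugA", ["TARGETS", "INTERACTS_WITH", "ASSOCIATED_WITH"])
```

Each hop here is a lookup on the current frontier, whereas a relational schema with one table per relation type would typically need one join per hop to answer the same question.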

Web Services Based Biological Data Analysis Tool

Kim, Min Kyung;Choi, Yo Hahn;Yoo, Seong Joon;Park, Hyun Seok
- Genomics & Informatics, Vol. 2, No. 3, pp. 142-146, 2004
Biological data and analysis tools are accumulating in distributed databases and web servers. For this reason, biologists who want to find information on the web must know which resources hold it and how to retrieve it. Integrating data from heterogeneous biological resources will enable biologists to discover new knowledge across domain boundaries, from sequences to expression, structure, and pathways. Biological databases inevitably contain noisy data, so consensus among databases helps confirm the reliability of their contents. We have developed WeSAT, which integrates distributed and heterogeneous biological databases and analysis tools and provides them through Web Services protocols. In WeSAT, biologists can retrieve specific entries from SWISS-PROT/EMBL, PDB, and KEGG, which hold annotated information about sequence, structure, and pathway. Further analysis is carried out by integrated services such as homology search and multiple alignment. WeSAT makes it possible to retrieve up-to-date data and analyses from scattered databases on a single platform through Web Services.

Nonclassical Chemical Kinetics for Description of Chemical Fluctuation in a Dynamically Heterogeneous Biological System

Lim, Yu-Rim;Park, Seong-Jun;Lee, Sang-Youb;Sung, Jae-Young
- Bulletin of the Korean Chemical Society, Vol. 33, No. 3, pp. 963-970, 2012
We review novel chemical kinetics proposed for quantitative description of fluctuations in reaction times and in the number of product molecules in a heterogeneous biological system, and discuss quantitative interpretation of randomness parameter data in enzymatic turnover times of ${\beta}$-galactosidase. We discuss generalization of renewal theory for description of chemical fluctuation in product level in a multistep biopolymer reaction occurring in a dynamically heterogeneous environment. New stochastic simulation results are presented for the chemical fluctuation of a dynamically heterogeneous reaction system, which clearly show the effects of the initial state distribution on the chemical fluctuation. Our stochastic simulation results are found to be in good agreement with predictions of the analytic results obtained from the generalized master equation.
https://doi.org/10.5012/bkcs.2012.33.3.963
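
For readers unfamiliar with the randomness parameter mentioned in this abstract, a commonly used definition (given here as background, not quoted from the paper) is the relative variance of the turnover time $t$:

$$
r = \frac{\langle t^2 \rangle - \langle t \rangle^2}{\langle t \rangle^2}
$$

Under this definition, a single-step process with exponentially distributed turnover times gives $r = 1$, and a sequence of $n$ identical irreversible steps gives $r = 1/n$.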

Multivariate Procedure for Variable Selection and Classification of High Dimensional Heterogeneous Data

Mehmood, Tahir;Rasheed, Zahid
- Communications for Statistical Applications and Methods, Vol. 22, No. 6, pp. 575-587, 2015
Advances in data collection techniques produce high dimensional data sets, in which discrimination is an important and commonly encountered problem that is crucial to resolve when the data are heterogeneous (that is, the classes do not share a common variance-covariance structure). One example is the classification of microbial habitat preferences based on codon/bi-codon usage. Habitat preference is important for studying evolutionary genetic relationships and may help industry produce specific enzymes. Most classification procedures assume homogeneity (a common variance-covariance structure for all classes), which is not guaranteed in most high dimensional data sets. We introduce regularized elimination in partial least squares coupled with QDA (rePLS-QDA) for parsimonious variable selection and classification of high dimensional heterogeneous data sets. It combines the recently introduced regularized elimination for variable selection in partial least squares (rePLS) with quadratic discriminant analysis (QDA), a classification procedure for heterogeneous classes. The proposed and existing methods are compared on a simulated data set; in addition, the proposed procedure is used to classify microbial habitat preferences by their codon/bi-codon usage. Five bacterial habitats (Aquatic, Host Associated, Multiple, Specialized, and Terrestrial) are modeled. The classification accuracy for each habitat is satisfactory, ranging from 89.1% to 100% on test data. Codons and bi-codons whose usage and mutual interactions are influential for the respective habitat preferences are identified. The proposed method also produced results that concur with known biological characteristics and that will help researchers better understand the divergence of species.
https://doi.org/10.5351/CSAM.2015.22.6.575

Comparison of Analysis Results According to Heterogeneous or Homogeneous Model for CT-based Focused Ultrasound Simulation

서현;이은희
- Korean Society of Medical and Biological Engineering: Journal of Biomedical Engineering Research, Vol. 43, No. 6, pp. 369-374, 2022
Purpose: Focused ultrasound is an emerging technology for treating the brain locally in a noninvasive manner. In this study, we investigated the influence of skull properties on the simulated transcranial pressure field. Methods: A 3D computational model of transcranial focused ultrasound was constructed from female and male CT data to solve for intracranial pressure. For the heterogeneous model, the acoustic properties were calculated from CT Hounsfield units based on porosity. The homogeneous model assigned constant acoustic properties to a single-layered skull. Results: The computational model was validated against empirical data. The homogeneous models were then compared with the heterogeneous models, resulting in differences in peak pressure of 10.87% and 7.19% for the female and male models, respectively. For the focal volume, the homogeneous model showed more than 94% overlap with the heterogeneous model. Conclusion: A homogeneous model can be constructed from MR images, which are commonly used for segmentation of the skull. Given the comparable focal volumes of the homogeneous and heterogeneous models, we suggest that the homogeneous model may be used for simulating the transcranial pressure field.
https://doi.org/10.9718/JBER.2022.43.6.369

Requirement Analysis for Bio-Information Integration Systems

Lee, Sean;Lee, Phil-Hyoun;Dokyun Na;Lee, Doheon;Lee, Kwanghyung;Bae, Myung-Nam
- Korean Institute of Intelligent Systems: Conference Proceedings, Korea Fuzzy Logic and Intelligent Systems Society, ISIS 2003, pp. 11-15, 2003
The amount of biological information has been increasing exponentially. To cope with this bio-information explosion, it is necessary to construct a biological information integration system. Such a system could provide useful services for bio-application developers by answering general complex queries that require access to information from heterogeneous biological data sources, and it could easily accommodate new databases. In this paper, we analyze the architectures and mechanisms of existing integration systems along with their advantages and disadvantages. Based on this analysis and on user requirement studies, we propose an integration system framework that embraces the advantages of the existing systems. More specifically, we propose an integration system architecture composed of a mediator and wrappers, which can offer a service interface layer to various other applications as well as to independent biologists, thus playing the role of a database management system for biology applications. In other words, the system can help abstract the heterogeneous information structures and formats away from the application layer. In the system, the wrappers send database-specific queries and report the results to the mediator using XML. The proposed system could facilitate in silico knowledge discovery by allowing numerous discrete biological information databases to be combined.
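
The mediator-and-wrapper arrangement described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the class names `DictWrapper` and `Mediator`, the XML element names, and the in-memory "databases" are all hypothetical, and the abstract does not specify the actual interfaces or XML schema.

```python
import xml.etree.ElementTree as ET


class DictWrapper:
    """Hypothetical wrapper around one data source (here an in-memory dict)."""

    def __init__(self, source_name, records):
        self.source_name = source_name
        self.records = records

    def query(self, gene):
        # Run the source-specific lookup and report the result as XML text.
        root = ET.Element("result", source=self.source_name)
        for value in self.records.get(gene, []):
            ET.SubElement(root, "entry", gene=gene).text = value
        return ET.tostring(root, encoding="unicode")


class Mediator:
    """Hypothetical mediator that fans a query out and merges XML replies."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, gene):
        merged = []
        for wrapper in self.wrappers:
            root = ET.fromstring(wrapper.query(gene))
            for entry in root.findall("entry"):
                merged.append((root.get("source"), entry.text))
        return merged


mediator = Mediator([
    DictWrapper("seq_db", {"TP53": ["sequence record"]}),
    DictWrapper("pathway_db", {"TP53": ["apoptosis"], "BRCA1": ["dna repair"]}),
])
tp53_hits = mediator.query("TP53")
```

The application only talks to the mediator, so adding another source in this sketch means adding another wrapper rather than changing the calling code.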

Implementation of an Information Management System for Nucleotide Sequences based on BSML using Active Trigger Rules

박성희;정광수;류근호
- Journal of the Korean Institute of Information Scientists and Engineers: Databases, Vol. 32, No. 1, pp. 24-42, 2005
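Biological information, including genome sequences, changes continuously and is heterogeneous and diverse. A management system that reflects these characteristics is needed, yet most existing biological databases serve only as repositories for biological data. This paper therefore presents a sequence information management system that performs file format conversion, editing, storage, and retrieval of nucleotide sequence data produced by sequencing experiments at the laboratory level or collected from various public databases. XML-based BSML is used as the common format for converting files between heterogeneous sequence formats. For sequence storage management, sequence versions are defined to store changes in the composition of the same DNA fragment, and methods for detecting and generating change information using active trigger rules are shown. We show that trigger functions allow sequence change information to be stored and managed automatically in the database, and we evaluate the performance.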
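
The idea of recording sequence versions with a database trigger can be sketched as follows. This is a minimal illustration using SQLite through Python's `sqlite3` module; the table names, column names, and trigger body are hypothetical, and the abstract does not say which database system or rule definitions the authors used.

```python
import sqlite3

# Hypothetical schema: one table of current sequences, one of past versions.
SCHEMA = """
CREATE TABLE sequence (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    residues TEXT NOT NULL
);
CREATE TABLE sequence_version (
    seq_id INTEGER NOT NULL,
    version INTEGER NOT NULL,
    old_residues TEXT NOT NULL
);
CREATE TRIGGER record_sequence_change
AFTER UPDATE OF residues ON sequence
WHEN OLD.residues <> NEW.residues
BEGIN
    INSERT INTO sequence_version (seq_id, version, old_residues)
    VALUES (
        OLD.id,
        (SELECT COUNT(*) + 1 FROM sequence_version WHERE seq_id = OLD.id),
        OLD.residues
    );
END;
"""


def demo():
    """Update one sequence three times and return the recorded versions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO sequence (id, name, residues) VALUES (1, 'fragment1', 'ACGT')"
    )
    conn.execute("UPDATE sequence SET residues = 'ACGA' WHERE id = 1")
    # Same residues again: the trigger's WHEN clause skips this update.
    conn.execute("UPDATE sequence SET residues = 'ACGA' WHERE id = 1")
    conn.execute("UPDATE sequence SET residues = 'TCGA' WHERE id = 1")
    rows = conn.execute(
        "SELECT seq_id, version, old_residues FROM sequence_version ORDER BY version"
    ).fetchall()
    conn.close()
    return rows


history = demo()
```

Because the trigger runs inside the database, the application code only issues ordinary updates and the previous residues are archived without an explicit call.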

Biological Data Analysis using DDBJ Web services

Sugawara, Hideaki;Miyazaki, Satorn;Abe, Takashi;Shigemoto, Yasumasa
- Korean Society for Bioinformatics: Conference Proceedings, Korean Society for Bioinformatics and Systems Biology, BIOINFO 2005, pp. 379-382, 2005
We demonstrate workflows for biological data retrieval and analysis using the DDBJ Web Service; specifically, we introduce a workflow for the analysis of proteins or proteomics data sets. The workflow automatically extracts, from all the genes of the human genome in Ensembl (http://www.ensembl.org/), those genes whose protein structure and function are known, based on cross-references among Ensembl, Swiss-Prot (http://www.ebi.ac.uk/swissprot), and PDB (Protein Data Bank; http://www.wwpdb.org/). The workflow discovered 'hidden' linkages among the databases. Distributed and heterogeneous data systems can be integrated into workflows if they are provided according to standards for Web services.

SOP (Search of Omics Pathway): A Web-based Tool for Visualization of KEGG Pathway Diagrams of Omics Data

Kim, Jun-Sub;Yeom, Hye-Jung;Kim, Seung-Jun;Kim, Ji-Hoon;Park, Hye-Won;Oh, Moon-Ju;Hwang, Seung-Yong
- Molecular & Cellular Toxicology, Vol. 3, No. 3, pp. 208-213, 2007
With the development and popularization of microarray technology, which enables us to investigate the expression patterns of thousands of genes simultaneously, toxicogenomics researchers can interpret genome-scale interactions among genes exposed to a toxicant or a toxicant-related environment. The primary goal of toxicogenomics is to identify the functional context of groups of genes that are differentially or similarly coexpressed under a specific toxic substance. In addition, public reference databases with transcriptome, proteome, and biological pathway information are needed for the analysis of these complex omics data. However, because these databases are heterogeneous and independent, it is hard to analyze large sets of omics annotations and their pathway information one by one. Fortunately, several public database web sites provide information linked to other databases. Nevertheless, these links return not only relevant information but also information that users do not need. Therefore, a systematically integrated database that meets the demands of researchers is needed. For these reasons, we propose the SOP (Search of Omics Pathway) database system, an integrated biological database that converts the heterogeneous features of public databases into a combined form. In addition, SOP offers user-friendly web interfaces that enable users to submit gene queries for biological interpretation of gene lists derived from omics experiments. The SOP web interface outputs an omics annotation table and visualized pathway maps from the KEGG PATHWAY database. We believe that SOP will be a helpful tool for biologically interpreting genes or proteins identified in omics experiments, for making new discoveries through pathway analysis, and for designing new hypotheses for subsequent toxicogenomics experiments.

Metabolic Pathways Associated with Kimchi, a Traditional Korean Food, Based on In Silico Modeling of Published Data

Shin, Ga Hee;Kang, Byeong-Chul;Jang, Dai Ja
- Genomics & Informatics, Vol. 14, No. 4, pp. 222-229, 2016
Kimchi is a traditional Korean food prepared by fermenting vegetables, such as Chinese cabbage and radishes, which are seasoned with various ingredients, including red pepper powder, garlic, ginger, green onion, fermented seafood (Jeotgal), and salt. The various unique microorganisms and bioactive components in kimchi show antioxidant activity and have been associated with an enhanced immune response, as well as anti-cancer and anti-diabetic effects. Red pepper inhibits decay due to microorganisms and prevents food from spoiling. The vast amount of biological information generated by academic and industrial research groups is reflected in a rapidly growing body of scientific literature and expanding data resources. However, the genome, biological pathway, and related disease data are insufficient to explain the health benefits of kimchi because of the varied and heterogeneous data types. Therefore, we have constructed an appropriate semantic data model based on an integrated food knowledge database and analyzed the functional and biological processes associated with kimchi in silico. This complex semantic network of several entities and connections was generalized to answer complex questions, and we demonstrated how specific disease pathways are related to kimchi consumption.
https://doi.org/10.5808/GI.2016.14.4.222

45 search results; processing time 0.022 s
