• Title/Summary/Keyword: heterogeneous biological data

Use of Graph Database for the Integration of Heterogeneous Biological Data

  • Yoon, Byoung-Ha;Kim, Seon-Kyu;Kim, Seon-Young
    • Genomics & Informatics / v.15 no.1 / pp.19-27 / 2017
  • Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance of MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j responded very quickly to various queries, MySQL gave delayed or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
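
The contrast between multiple-join statements and native relationship traversal can be illustrated with a small sketch. It is not the authors' benchmark: SQLite stands in for MySQL, a plain adjacency map stands in for Neo4j, and the edge list, relation names, and function names are all hypothetical. The same three-hop question (drug, target protein, interacting protein, disease) is answered once with self-joins and once by walking edges.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy edges (not data from the paper): (source, relation, target)
EDGES = [
    ("drugA", "TARGETS", "P1"),
    ("P1", "INTERACTS", "P2"),
    ("P2", "ASSOCIATED", "diseaseX"),
    ("drugB", "TARGETS", "P3"),
    ("P3", "INTERACTS", "P2"),
]


def diseases_via_sql(drug):
    """Relational style: each hop is another self-join on the edge table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE edge (src TEXT, rel TEXT, dst TEXT)")
    con.executemany("INSERT INTO edge VALUES (?, ?, ?)", EDGES)
    rows = con.execute(
        """
        SELECT DISTINCT e3.dst
        FROM edge e1
        JOIN edge e2 ON e2.src = e1.dst AND e2.rel = 'INTERACTS'
        JOIN edge e3 ON e3.src = e2.dst AND e3.rel = 'ASSOCIATED'
        WHERE e1.src = ? AND e1.rel = 'TARGETS'
        """,
        (drug,),
    ).fetchall()
    con.close()
    return sorted(row[0] for row in rows)


def diseases_via_graph(drug):
    """Graph style: follow outgoing edges hop by hop from the start node."""
    adjacency = defaultdict(list)
    for src, rel, dst in EDGES:
        adjacency[(src, rel)].append(dst)
    frontier = {drug}
    for rel in ("TARGETS", "INTERACTS", "ASSOCIATED"):
        frontier = {dst for node in frontier for dst in adjacency[(node, rel)]}
    return sorted(frontier)
```

Both functions return the same answer on this toy data; the sketch only shows the shape of the two query styles and says nothing about their relative speed at the scale reported in the abstract.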

Web Services Based Biological Data Analysis Tool

  • Kim, Min Kyung;Choi, Yo Hahn;Yoo, Seong Joon;Park, Hyun Seok
    • Genomics & Informatics / v.2 no.3 / pp.142-146 / 2004
  • Biological data and analysis tools accumulate in distributed databases and web servers. As a result, biologists who want to find information on the web must know which resources hold it and how to retrieve it. Integrating data from heterogeneous biological resources enables biologists to discover new knowledge across domain boundaries, from sequence to expression, structure, and pathway. Biological databases inevitably contain noisy data, so consensus among databases helps confirm the reliability of their contents. We have developed WeSAT, which integrates distributed and heterogeneous biological databases and analysis tools and provides them through Web Services protocols. In WeSAT, biologists retrieve specific entries from SWISS-PROT/EMBL, PDB, and KEGG, which hold annotated information about sequence, structure, and pathway. Further analysis is carried out by integrated services such as homology search and multiple alignment. WeSAT makes it possible to retrieve data updated in real time and to run analyses across scattered databases on a single platform through Web Services.

Nonclassical Chemical Kinetics for Description of Chemical Fluctuation in a Dynamically Heterogeneous Biological System

  • Lim, Yu-Rim;Park, Seong-Jun;Lee, Sang-Youb;Sung, Jae-Young
    • Bulletin of the Korean Chemical Society / v.33 no.3 / pp.963-970 / 2012
  • We review novel chemical kinetics proposed for quantitative description of fluctuations in reaction times and in the number of product molecules in a heterogeneous biological system, and discuss quantitative interpretation of randomness parameter data in enzymatic turnover times of β-galactosidase. We discuss generalization of renewal theory for description of chemical fluctuation in product level in a multistep biopolymer reaction occurring in a dynamically heterogeneous environment. New stochastic simulation results are presented for the chemical fluctuation of a dynamically heterogeneous reaction system, which clearly show the effects of the initial state distribution on the chemical fluctuation. Our stochastic simulation results are found to be in good agreement with predictions of the analytic results obtained from the generalized master equation.
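
As a minimal illustration of the randomness parameter mentioned above, the sketch below assumes the commonly used definition r = (⟨t²⟩ − ⟨t⟩²) / ⟨t⟩² for turnover times t. It simulates only the classical baseline of n identical sequential exponential steps, for which r is expected to approach 1/n; it does not reproduce the dynamically heterogeneous model of the review. The function names, rate, and sample size are assumptions chosen for illustration.

```python
import random
from statistics import fmean, pvariance


def randomness_parameter(times):
    """Variance of the turnover times divided by the squared mean."""
    mean = fmean(times)
    return pvariance(times, mu=mean) / mean**2


def simulate_turnover_times(n_steps, rate, n_samples, seed=0):
    """Each turnover time is the sum of n_steps exponential waiting times."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(rate) for _ in range(n_steps))
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    for n_steps in (1, 2, 4):
        times = simulate_turnover_times(n_steps, rate=2.0, n_samples=20000)
        print(n_steps, round(randomness_parameter(times), 3))
```

Deviations of measured values from this baseline are the kind of data that the nonclassical kinetics in the review sets out to interpret.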

Multivariate Procedure for Variable Selection and Classification of High Dimensional Heterogeneous Data

  • Mehmood, Tahir;Rasheed, Zahid
    • Communications for Statistical Applications and Methods / v.22 no.6 / pp.575-587 / 2015
  • Advances in data collection techniques produce high-dimensional data sets, in which discrimination is an important and commonly encountered problem; it is especially crucial to resolve when the high-dimensional data are heterogeneous (the classes do not share a common variance-covariance structure). An example is classifying microbial habitat preferences based on codon/bi-codon usage. Habitat preference is important for studying evolutionary genetic relationships and may help industry produce specific enzymes. Most classification procedures assume homogeneity (a common variance-covariance structure for all classes), which is not guaranteed in most high-dimensional data sets. We introduce regularized elimination in partial least squares coupled with QDA (rePLS-QDA) for parsimonious variable selection and classification of high-dimensional heterogeneous data sets. It builds on the recently introduced regularized elimination for variable selection in partial least squares (rePLS) and on quadratic discriminant analysis (QDA), a classification procedure for heterogeneous classes. The proposed and existing methods are compared on a simulated data set; in addition, the proposed procedure is applied to classify microbial habitat preferences by codon/bi-codon usage. Five bacterial habitats (Aquatic, Host Associated, Multiple, Specialized, and Terrestrial) are modeled. The classification accuracy for each habitat is satisfactory, ranging from 89.1% to 100% on test data. Notable codon/bi-codon usages and their mutual interactions that influence the respective habitat preferences are identified. The proposed method also produced results consistent with known biological characteristics, which will help researchers better understand the divergence of species.

Comparison of Analysis Results According to Heterogeneous or Homogeneous Model for CT-based Focused Ultrasound Simulation (CT 영상 기반 집속 초음파 시뮬레이션 모델의 불균질 물성과 균질 물성에 따른 모델 분석 결과 비교)

  • Hyeon, Seo;Eun-Hee, Lee
    • Journal of Biomedical Engineering Research / v.43 no.6 / pp.369-374 / 2022
  • Purpose: Focused ultrasound is an emerging technology for treating the brain locally in a noninvasive manner. In this study, we investigated the influence of skull properties on the simulated transcranial pressure field. Methods: A 3D computational model of transcranial focused ultrasound was constructed from female and male CT data to solve for intracranial pressure. For the heterogeneous model, the acoustic properties were calculated from CT Hounsfield units based on porosity. The homogeneous model assigned constant acoustic properties to a single-layered skull. Results: The computational model was validated against empirical data. The homogeneous models were then compared with the heterogeneous models, yielding differences in peak pressure of 10.87% and 7.19% for the female and male models, respectively. For the focal volume, the homogeneous model showed more than 94% overlap with the heterogeneous model. Conclusion: A homogeneous model can be constructed from MR images, which are commonly used for segmentation of the skull. We propose that the homogeneous model may be used for simulating the transcranial pressure field, owing to the comparable focal volume between the homogeneous and heterogeneous models.

Requirement Analysis for Bio-Information Integration Systems

  • Lee, Sean;Lee, Phil-Hyoun;Dokyun Na;Lee, Doheon;Lee, Kwanghyung;Bae, Myung-Nam
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.11-15 / 2003
  • The amount of biological data has been increasing exponentially. To cope with this bio-information explosion, it is necessary to construct a biological information integration system. Such a system could provide useful services for bio-application developers by answering general complex queries that require access to heterogeneous bio data sources, and it could easily accommodate new databases. In this paper, we analyze the architectures and mechanisms of existing integration systems along with their advantages and disadvantages. Based on this analysis and on user requirement studies, we propose an integration system framework that embraces the advantages of the existing systems. More specifically, we propose an integration system architecture composed of a mediator and wrappers, which can offer a service interface layer for various other applications as well as for independent biologists, thus playing the role of a database management system for biology applications. In other words, the system can hide the heterogeneous information structures and formats from the application layer. In the system, the wrappers send database-specific queries and report the results to the mediator using XML. The proposed system could facilitate in silico knowledge discovery by allowing numerous discrete biological information databases to be combined.
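
The mediator-and-wrapper arrangement can be sketched as follows. This is an illustrative assumption, not the proposed system: the class names, the XML element names (`result`, `hit`), and the in-memory dictionaries standing in for real databases are all hypothetical. Each wrapper answers a query in its own way and reports back as XML; the mediator parses the XML and merges the answers, so the caller never sees source-specific formats.

```python
import xml.etree.ElementTree as ET


class DictWrapper:
    """Hypothetical wrapper around one data source (here an in-memory dict)."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def query(self, gene):
        # Run the source-specific lookup and report the result as XML text.
        root = ET.Element("result", source=self.name)
        for value in self.records.get(gene, []):
            ET.SubElement(root, "hit", gene=gene).text = value
        return ET.tostring(root, encoding="unicode")


class Mediator:
    """Fans a query out to every wrapper and merges the XML replies."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, gene):
        merged = []
        for wrapper in self.wrappers:
            root = ET.fromstring(wrapper.query(gene))
            for hit in root.iter("hit"):
                merged.append((root.get("source"), hit.text))
        return merged


if __name__ == "__main__":
    # Placeholder records, invented purely for the example.
    sequence_db = DictWrapper("sequence_db", {"geneA": ["sequence record 1"]})
    pathway_db = DictWrapper("pathway_db", {"geneA": ["pathway record 7"]})
    print(Mediator([sequence_db, pathway_db]).query("geneA"))
```

Adding a new database in this sketch means writing one more wrapper that emits the same XML shape, which mirrors the accommodation of new sources described in the abstract.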

Implementation of an Information Management System for Nucleotide Sequences based on BSML using Active Trigger Rules (BSML 기반 능동 트리거 규칙을 이용한 염기서열정보관리시스템의 구현)

  • Park Sung Hee;Jung Kwang Su;Ryu Keun Ho
    • Journal of KIISE:Databases / v.32 no.1 / pp.24-42 / 2005
  • Biological data, including genome sequences, are heterogeneous and varied. Although the need for sequence management systems that reflect biological characteristics has been raised, most current biological databases provide only restricted functions as repositories for biological data. This paper therefore describes a management system for nucleotide sequences at the level of the biological laboratory. It includes format transformation, editing, storage, and retrieval for nucleotide sequences collected from public databases, and it also handles sequences produced by experiments. It uses BSML, which is based on XML, as a common format to extract data fields and convert between heterogeneous sequence formats. To manage sequences and their changes, a version management system for the original DNA is required so as to detect the appearance of newly changed sequences and trigger a database update. Our experimental results show that applying active trigger rules to manage sequence changes can automatically store those changes in the database.
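
The idea of an active rule that records sequence changes automatically can be sketched with an ordinary database trigger. This is only an analogy under stated assumptions: the paper's system is BSML-based, whereas the sketch uses SQLite, and the table, column, and trigger names are hypothetical. Whenever the stored residues of a sequence actually change, the trigger writes the old and new versions to a history table without any action by the caller.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a change-logging trigger."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE sequence (id TEXT PRIMARY KEY, residues TEXT NOT NULL);
        CREATE TABLE sequence_history (
            id TEXT, old_residues TEXT, new_residues TEXT
        );
        -- Active rule: fire only when the residues really differ.
        CREATE TRIGGER log_sequence_change
        AFTER UPDATE OF residues ON sequence
        WHEN OLD.residues <> NEW.residues
        BEGIN
            INSERT INTO sequence_history
            VALUES (OLD.id, OLD.residues, NEW.residues);
        END;
        """
    )
    return con


if __name__ == "__main__":
    con = build_db()
    con.execute("INSERT INTO sequence VALUES ('seq1', 'ACGT')")
    con.execute("UPDATE sequence SET residues = 'ACGA' WHERE id = 'seq1'")
    print(con.execute("SELECT * FROM sequence_history").fetchall())
```

The `WHEN` clause keeps no-op updates out of the history, so the history table holds only genuine version changes.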

Biological Data Analysis using DDBJ Web services

  • Sugawara, Hideaki;Miyazaki, Satorn;Abe, Takashi;Shigemoto, Yasumasa
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.379-382 / 2005
  • We demonstrate workflows for biological data retrieval and analysis using the DDBJ Web Service; specifically, we introduce a workflow for the analysis of proteins or proteomics data sets. The workflow automatically extracts, from all the genes of the human genome in Ensembl (http://www.ensembl.org/), those genes whose protein structure and function are known, based on cross-references among Ensembl, Swiss-Prot (http://www.ebi.ac.uk/swissprot) and PDB (Protein Data Bank; http://www.wwpdb.org/). The workflow discovered ‘hidden’ linkages among databases. Distributed and heterogeneous data systems can be integrated into workflows if they are provided according to standards for Web services.
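
The cross-reference step of such a workflow can be sketched as a chain of lookups. The sketch below does not call the DDBJ Web Service or any real API; the function name and all identifiers in the example are hypothetical placeholders. It assumes two mappings have already been retrieved (gene to protein entry, protein entry to structure entries) and keeps the genes that can be followed through to at least one structure.

```python
def genes_with_known_structure(gene_to_protein, protein_to_structures):
    """Return genes whose protein entry cross-references a structure."""
    linked = {}
    for gene, protein in gene_to_protein.items():
        structures = protein_to_structures.get(protein, [])
        if structures:
            linked[gene] = sorted(structures)
    return linked


if __name__ == "__main__":
    # Placeholder identifiers, invented for the example only.
    gene_to_protein = {"gene1": "protA", "gene2": "protB", "gene3": "protC"}
    protein_to_structures = {"protA": ["struct2", "struct1"], "protC": []}
    print(genes_with_known_structure(gene_to_protein, protein_to_structures))
```

A gene drops out either when its protein has no entry in the second mapping or when that entry lists no structures, which is how indirect linkages between the first and last database become visible.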

SOP (Search of Omics Pathway): A Web-based Tool for Visualization of KEGG Pathway Diagrams of Omics Data

  • Kim, Jun-Sub;Yeom, Hye-Jung;Kim, Seung-Jun;Kim, Ji-Hoon;Park, Hye-Won;Oh, Moon-Ju;Hwang, Seung-Yong
    • Molecular & Cellular Toxicology / v.3 no.3 / pp.208-213 / 2007
  • With the development and popularization of microarray technology, which enables us to investigate the expression patterns of thousands of genes simultaneously, toxicogenomics researchers can interpret genome-scale interactions between genes exposed to a toxicant or a toxicant-related environment. The primary goal of toxicogenomics is to identify the functional context among groups of genes that are differentially or similarly coexpressed under a specific toxic substance. Public reference databases with transcriptome, proteome, and biological pathway information are needed for the analysis of these complex omics data. However, because these databases are heterogeneous and independent, it is hard to analyze large sets of omics annotations and their pathway information one database at a time. Several public database web sites do provide links to other databases, but these return unnecessary information along with the relevant information. A systematically integrated database suited to the needs of experimenters is therefore required. For these reasons, we propose the SOP (Search of Omics Pathway) database system, an integrated biological database that combines the heterogeneous features of public databases. In addition, SOP offers user-friendly web interfaces that let users submit gene queries for the biological interpretation of gene lists derived from omics experiments. SOP outputs an omics annotation table and visualized pathway maps from the KEGG PATHWAY database. We believe that SOP will be a helpful tool for the biological interpretation of genes or proteins identified in omics experiments, for new discoveries through pathway analysis, and for designing new hypotheses for subsequent toxicogenomics experiments.

Metabolic Pathways Associated with Kimchi, a Traditional Korean Food, Based on In Silico Modeling of Published Data

  • Shin, Ga Hee;Kang, Byeong-Chul;Jang, Dai Ja
    • Genomics & Informatics / v.14 no.4 / pp.222-229 / 2016
  • Kimchi is a traditional Korean food prepared by fermenting vegetables, such as Chinese cabbage and radishes, which are seasoned with various ingredients, including red pepper powder, garlic, ginger, green onion, fermented seafood (Jeotgal), and salt. The various unique microorganisms and bioactive components in kimchi show antioxidant activity and have been associated with an enhanced immune response, as well as anti-cancer and anti-diabetic effects. Red pepper inhibits decay due to microorganisms and prevents food from spoiling. The vast amount of biological information generated by academic and industrial research groups is reflected in a rapidly growing body of scientific literature and expanding data resources. However, the genome, biological pathway, and related disease data are insufficient to explain the health benefits of kimchi because of the varied and heterogeneous data types. Therefore, we have constructed an appropriate semantic data model based on an integrated food knowledge database and analyzed the functional and biological processes associated with kimchi in silico. This complex semantic network of several entities and connections was generalized to answer complex questions, and we demonstrated how specific disease pathways are related to kimchi consumption.