• Title/Summary/Keyword: Bioinformatics data

Genomic and Proteomic Databases: Foundations, Current Status and Future Applications

  • Navathe, Shamkant B.;Patil, Upen;Guan, Wei
    • Journal of Computing Science and Engineering / v.1 no.1 / pp.1-30 / 2007
  • In this paper we provide an extensive survey of the databases and other resources related to current research in bioinformatics, and of the issues that confront database researchers in helping biologists. We first give an overview of the concepts and principles that are fundamental to understanding the data captured in these databases. We briefly trace the evolution of biological advances and point out the importance of capturing data about genes, the fundamental building blocks that encode the characteristics of life, and proteins, the essential ingredients for sustaining life. The study of genes and proteins, known as genomics and proteomics respectively, is becoming extremely important. Although there are numerous databases related to various subfields of biology, we focus on genomic and proteomic databases, which are crucial stepping stones for other fields and are expected to play an important role in future applications of biology and medicine. A detailed listing of these databases with information about their sizes, formats and current status is presented. Related databases, such as molecular pathway and interconnection network databases, are mentioned, but their full coverage is beyond the scope of a single paper. We comment on the peculiar nature of biological data, which presents special problems in organizing and accessing these databases. We also discuss the capabilities needed for database development and information management in the bioinformatics arena, with particular attention to ontology development. Two case studies based on our own research are summarized: the development of a new genome database called Mitomap, and the creation of a framework for discovering relationships among genes from the biomedical literature. The paper concludes with an overview of the applications in medicine and healthcare that these databases will drive. A glossary of important terms is provided at the end of the paper.

Application of Bioinformatics for the Functional Genomics Analysis of Prostate Cancer Therapy

  • Mousses, Spyro
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.74-82 / 2000
  • Prostate cancer initially regresses in response to androgen depletion therapy, but most human prostate cancers eventually recur and regrow as androgen-independent tumors. Once these tumors become hormone refractory, they are usually incurable and lead to the death of the patient. Little is known about the molecular details of how prostate cancer cells regress following androgen ablation, or about which genes are involved in androgen-independent growth once resistance to therapy has developed. Such knowledge would reveal putative drug targets useful in rational therapeutic design to prevent therapy resistance and control androgen-independent growth. The application of genome-scale technologies has permitted new insights into the molecular mechanisms associated with these processes. Specifically, we have applied functional genomics, using high-density cDNA microarrays for parallel gene expression analysis of prostate cancer in an experimental xenograft system during androgen withdrawal therapy and following therapy resistance. The large amount of expression data generated posed a formidable bioinformatics challenge. A novel template-based gene clustering algorithm was developed and applied to the data to discover the genes that respond to androgen ablation. The data show restoration of expression of androgen-dependent genes and other signaling genes in the recurrent tumors. Together, the discovered genes appear to be involved in prostate cancer cell growth and therapy resistance in this system. We have also developed and applied tissue microarray (TMA) technology for high-throughput molecular analysis of hundreds to thousands of clinical specimens simultaneously. TMA analysis was used for rapid clinical translation of candidate genes discovered by cDNA microarray analysis, to determine their clinical utility as diagnostic, prognostic, and therapeutic targets. Finally, we have developed a bioinformatic approach that combines pharmacogenomic data on the efficacy and specificity of various drugs to target the discovered candidate genes associated with prostate cancer growth, in an attempt to improve current therapeutics.

Correlations Between the Incidence of National Notifiable Infectious Diseases and Public Open Data, Including Meteorological Factors and Medical Facility Resources

  • Jang, Jin-Hwa;Lee, Ji-Hae;Je, Mi-Kyung;Cho, Myeong-Ji;Bae, Young Mee;Son, Hyeon Seok;Ahn, Insung
    • Journal of Preventive Medicine and Public Health / v.48 no.4 / pp.203-215 / 2015
  • Objectives: This study was performed to investigate the relationship between the incidence of national notifiable infectious diseases (NNIDs) and meteorological factors, air pollution levels, and hospital resources in Korea. Methods: We collected and stored 660 000 pieces of publicly available data associated with infectious diseases from public data portals and the Diseases Web Statistics System of Korea. We analyzed correlations between the monthly incidence of these diseases and monthly average temperatures and monthly average relative humidity, as well as vaccination rates, number of hospitals, and number of hospital beds by district in Seoul. Results: Of the 34 NNIDs, malaria showed the most significant correlation with temperature (r=0.949, p<0.01) and concentration of nitrogen dioxide (r=-0.884, p<0.01). We also found a strong correlation between the incidence of NNIDs and the number of hospital beds in 25 districts in Seoul (r=0.606, p<0.01). In particular, Geumcheon-gu was found to have the lowest incidence rate of NNIDs and the highest number of hospital beds per patient. Conclusions: In this study, we conducted a correlational analysis of public data from Korean government portals that can be used as parameters to forecast the spread of outbreaks.
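The correlation coefficients quoted in this abstract (the r values) are typically Pearson coefficients, although the abstract does not name the estimator. As a minimal sketch under that assumption, the following computes Pearson's r between two monthly series. The function name `pearson_r` and both data lists are hypothetical and made up for illustration; they are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up monthly values for illustration only (NOT data from the study).
monthly_temperature_c = [-2.0, 1.0, 6.0, 13.0, 18.0, 23.0, 26.0, 27.0, 22.0, 15.0, 7.0, 0.0]
monthly_case_counts = [0, 0, 1, 3, 8, 15, 22, 20, 11, 4, 1, 0]

r = pearson_r(monthly_temperature_c, monthly_case_counts)
print(f"r = {r:.3f}")
```

A significance test (the p values in the abstract) would need an additional step that this sketch leaves out.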

Plant Biotechnology and Bioinformatics (식물 생명공학과 생물정보학)

  • Kim, Jung-Eun;Paik, Hyo-Jung;Kim, Young-Cheol;Hur, Cheol-Goo
    • Journal of Plant Biotechnology / v.33 no.3 / pp.209-222 / 2006
  • Whole-genome sequencing has been completed for Arabidopsis and rice, and large amounts of EST data are available for many other plants. In addition, vast quantities of diverse biological data have been generated by various '-omics' technologies such as transcriptomics, proteomics, and metabolomics. Bioinformatics plays an essential role in extracting useful information from these tremendous amounts of biological data. In this review we introduce experimental methods for generating massive data sets; applications to plant science, such as plant disease resistance and molecular breeding; and bioinformatics tools and web sites available for plant biotechnology R&D. We conclude that new experimental methods and bioinformatics analysis techniques have made major contributions to the development of plant biotechnology, and that bioinformatics has become a critical factor in plant biotechnology R&D.

The BIOWAY System: A Data Warehouse for Generalized Representation & Visualization of Bio-Pathways

  • Kim, Min Kyung;Seo, Young Joo;Lee, Sang Ho;Song, Eun Ha;Lee, Ho Il;Ahn, Chang Shin;Choi, Eun Chung;Park, Hyun Seok
    • Genomics & Informatics / v.2 no.4 / pp.191-194 / 2004
  • Exponentially increasing biopathway data in recent years provide a means to elucidate the large-scale modular organization of the cell. Given the existing information on metabolic and regulatory networks, inferring biopathway information through scientific reasoning or through data mining of large-scale array or proteomics data has received great attention. Naturally, there is a need for a user-friendly system that allows the user to combine large and diverse pathway data sets from different resources. We built a data warehouse, BIOWAY, for analyzing and visualizing biological pathways by integrating and customizing resources. We have collected many different types of pathway-related data, including metabolic pathway data from KEGG/LIGAND, signaling pathway data from BIND, and protein information from SWISS-PROT. In addition to providing a general data retrieval mechanism, a successful user interface should provide a convenient visualization mechanism, since biological pathway data are difficult to conceptualize without graphical representations. However, the visual interfaces of previous systems use, at best, static images only for specific categorized pathways, which makes it difficult to cope with more complex pathways. In the BIOWAY system, all pathway data can be displayed as computer-generated graphical networks rather than manually drawn images. Furthermore, it is designed so that all pathway maps can be expanded or shrunk by introducing the concept of a super node. A subtle graphic layout algorithm has been applied to best display the pathway data.
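The abstract does not describe how BIOWAY implements its super nodes. One plausible reading is that "shrinking" a map replaces a group of nodes with a single placeholder node and "expanding" restores the original edges. The sketch below illustrates that reading only. The function `collapse_to_super_node`, the node names, and the toy pathway are all hypothetical.

```python
def collapse_to_super_node(edges, members, super_name):
    """Return a new edge set in which every node in `members` is replaced by
    the single node `super_name`. Edges internal to the group are dropped."""
    members = set(members)
    collapsed = set()
    for src, dst in edges:
        new_src = super_name if src in members else src
        new_dst = super_name if dst in members else dst
        if new_src == new_dst == super_name:
            continue  # both endpoints are inside the group: hide this edge
        collapsed.add((new_src, new_dst))
    return collapsed


# Toy directed pathway (hypothetical): A -> B -> C -> D, with a branch B -> E.
pathway = {("A", "B"), ("B", "C"), ("C", "D"), ("B", "E")}

# "Shrink": fold B and C into one super node. "Expand" is simply going back
# to the original edge set, which is left unmodified.
shrunk = collapse_to_super_node(pathway, {"B", "C"}, "super:BC")
print(sorted(shrunk))
```

Because the original edge set is never mutated, a viewer could switch between the collapsed and expanded views without losing information.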

Parameters Involved in Autophosphorylation in Chronic Myeloid Leukemia: a Systems Biology Approach

  • Kumar, Himansu;Tichkule, Swapnil;Raj, Utkarsh;Gupta, Saurabh;Srivastava, Swati;Varadwaj, Pritish Kumar
    • Asian Pacific Journal of Cancer Prevention / v.16 no.13 / pp.5273-5278 / 2015
  • Background: Chronic myeloid leukemia (CML) is a stem cell disorder characterized by the fusion of two oncogenes, BCR and ABL, and their aberrant expression. Autophosphorylation of the BCR-ABL oncogenes results in proliferation of CML. This study deals with estimating the rate constants involved in each step of the cellular autophosphorylation process, which consequently play important roles in the proliferation of cancerous cells. Materials and Methods: A mathematical model of BCR-ABL autophosphorylation was proposed, using ordinary differential equations to express the rate of change of each responsible system component. The major difficulty in modeling this process is the lack of experimental data needed to estimate unknown model parameters. Initial concentration data for each substrate and product of the BCR-ABL system were collected from the reported literature. All parameters were optimized through time-interval simulation using the fminsearch algorithm. Results: The rate of change versus time was estimated to indicate the role of each state variable that is crucial for the system. The time-wise change in substrate concentration shows the convergence of each parameter in the autophosphorylation process. Conclusions: The role of each constituent parameter and its relative time-dependent variation in the autophosphorylation process could be inferred.
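The general workflow described here (simulate an ODE model, then tune rate constants to match concentration data) can be illustrated with a deliberately tiny stand-in. The sketch below is not the paper's model. It assumes a single first-order step U -> P with one rate constant k, integrates it with forward Euler, and fits k by golden-section search rather than fminsearch (MATLAB's Nelder-Mead simplex routine). The observations are synthetic, and every name and number is hypothetical.

```python
import math


def simulate(k, u0, t_end, dt=0.01):
    """Forward-Euler integration of dU/dt = -k*U, dP/dt = k*U.
    Returns a list of (t, U, P) samples, one per time step."""
    steps = int(round(t_end / dt))
    u, p = u0, 0.0
    samples = [(0.0, u, p)]
    for i in range(1, steps + 1):
        flux = k * u * dt  # amount converted from U to P during this step
        u -= flux
        p += flux
        samples.append((i * dt, u, p))
    return samples


def sse(k, observations, u0, dt=0.01):
    """Sum of squared errors between simulated and observed P values."""
    t_end = max(t for t, _ in observations)
    samples = simulate(k, u0, t_end, dt)
    return sum((samples[int(round(t / dt))][2] - p_obs) ** 2
               for t, p_obs in observations)


def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimize a unimodal one-dimensional function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2


# Synthetic observations from the exact solution P(t) = U0 * (1 - exp(-k*t)).
K_TRUE, U0 = 0.8, 1.0
observations = [(0.5 * i, U0 * (1 - math.exp(-K_TRUE * 0.5 * i)))
                for i in range(1, 11)]

k_fit = golden_section_min(lambda k: sse(k, observations, U0), 0.01, 5.0)
print(f"fitted k = {k_fit:.4f}")
```

A model with several rate constants would need a multidimensional optimizer such as Nelder-Mead; the one-dimensional search is used here only to keep the sketch short and dependency-free.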

HisCoM-PAGE: software for hierarchical structural component models for pathway analysis of gene expression data

  • Mok, Lydia;Park, Taesung
    • Genomics & Informatics / v.17 no.4 / pp.45.1-45.3 / 2019
  • To identify pathways associated with survival phenotypes using gene expression data, we recently proposed the hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE) method. The HisCoM-PAGE software can consider hierarchical structural relationships between genes and pathways and analyze multiple pathways simultaneously. It can be applied to various types of gene expression data, such as microarray data or RNA sequencing data. We expect that the HisCoM-PAGE software will make our method more easily accessible to researchers who want to perform pathway analysis for survival times.

A Comparison Study of Classification Algorithms in Data Mining

  • Lee, Seung-Joo;Jun, Sung-Rae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.8 no.1 / pp.1-5 / 2008
  • In general, the analytical tools of data mining fall into two learning types: supervised and unsupervised learning algorithms. Classification and prediction are the main analysis tools for supervised learning. In this paper, we perform a comparison study of popular classification algorithms in data mining: LDA, QDA, the kernel method, K-nearest neighbor, naive Bayesian, SVM, and CART. We use almost all of the classification data sets in the UCI machine learning repository for our experiments. Based on our results, we are able to select proper algorithms for given classification data sets.
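As a small illustration of how one of the listed classifiers can be scored, the sketch below implements K-nearest neighbor with leave-one-out accuracy. The abstract does not state the evaluation protocol, so leave-one-out is an assumption here, and the toy data set is made up rather than taken from the UCI repository.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Predict the label of `query` by majority vote among its k nearest
    training points (Euclidean distance). `train` holds (features, label)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Fraction of points classified correctly when each point is held out
    in turn and predicted from all the others."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(data)


# Made-up two-class toy data: two well-separated clusters of four points.
toy = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"), ((1.1, 1.3), "a"),
    ((4.0, 4.0), "b"), ((4.2, 3.9), "b"), ((3.8, 4.1), "b"), ((4.1, 4.3), "b"),
]

for k in (1, 3, 7):
    print(k, leave_one_out_accuracy(toy, k))
```

With k = 7 every held-out point is outvoted by the four points of the other class, which shows why the choice of k is itself part of any fair comparison.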

Cancer Genomics Object Model: An Object Model for Cancer Research Using Microarray

  • Park, Yu-Rang;Lee, Hye-Won;Cho, Sung-Bum;Kim, Ju-Han
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.29-34 / 2005
  • The DNA microarray has become a major tool for investigating global gene expression in all aspects of cancer and biomedical research. DNA microarray experiments generate enormous amounts of data, which are meaningful only in the context of a detailed description of the microarrays, biomaterials, and conditions under which they were generated. The MicroArray Gene Expression Data (MGED) society has established a microarray standard for structured management of these large and diverse data. MGED MAGE-OM (MicroArray Gene Expression Object Model) is an object-oriented data model that attempts to define standard objects for gene expression. To assess the relevance of DNA microarray analysis to cancer research, clinical and genomic data must be combined. MAGE-OM, however, does not have an appropriate structure for describing clinical information about cancer. For systematic integration of gene expression and clinical data, we created a new model, the Cancer Genomics Object Model.

A GEOSENSOR FILTER FOR PROCESSING GEOSENSOR QUERIES ON DATA STREAMS

  • Lee, Dong-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2008.10a / pp.119-121 / 2008
  • Pattern matching is increasingly employed in various research areas such as health care services, RFID-based systems, facility management, and surveillance. A geosensor filter correlates a data stream to match specific patterns in distributed environments. In this paper, we present a geosensor query language for representing declarative geosensor queries efficiently. Geosensor operators are proposed for fast query processing over spatial and temporal areas in distributed environments. We also propose a geosensor filter that matches new query predicates against incoming stream predicates. Our filter can reduce the volume of transmitted data and save sensor power consumption. It can be used in stream data mining systems to process, in real time, various data such as location, time, and geosensor information in distributed environments.
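The abstract does not give the query language or the filter's internals. As a rough sketch of the general idea only, the code below filters a stream of sensor readings against a query that combines a spatial predicate (a bounding box) with a temporal predicate (a time window), so that only matching readings are forwarded. The `Reading` and `GeoQuery` classes, their fields, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    x: float          # position coordinate (arbitrary units)
    y: float
    timestamp: float  # seconds since an arbitrary start
    value: float      # measured quantity


@dataclass(frozen=True)
class GeoQuery:
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    t_start: float
    t_end: float

    def matches(self, reading: Reading) -> bool:
        """True when the reading lies inside both the area and the window."""
        in_area = (self.x_min <= reading.x <= self.x_max
                   and self.y_min <= reading.y <= self.y_max)
        in_window = self.t_start <= reading.timestamp <= self.t_end
        return in_area and in_window


def geosensor_filter(stream: Iterable[Reading], query: GeoQuery) -> Iterator[Reading]:
    """Lazily yield only the readings that satisfy the query predicates."""
    for reading in stream:
        if query.matches(reading):
            yield reading


readings = [
    Reading("s1", 1.0, 1.0, 10.0, 21.5),
    Reading("s2", 5.0, 5.0, 11.0, 19.0),  # outside the area
    Reading("s1", 1.5, 0.5, 99.0, 22.0),  # outside the time window
    Reading("s3", 0.2, 1.8, 12.0, 20.1),
]
query = GeoQuery(x_min=0.0, x_max=2.0, y_min=0.0, y_max=2.0, t_start=0.0, t_end=60.0)

matched = [r.sensor_id for r in geosensor_filter(readings, query)]
print(matched)
```

Writing the filter as a generator means readings are tested one at a time as they arrive, which suits an unbounded stream and is where the reduction in transmitted data would come from.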