• Title/Summary/Keyword: Omics data analysis

XPERANTO-TOX: an Integrated Toxicogenomics Knowledgebase

  • Woo Jung-Hoon;Kim Hyeoun-Eui;Kong Gu;Kim Ju-Han
    • Genomics & Informatics / v.4 no.1 / pp.40-44 / 2006
  • Toxicogenomics combines transcriptome, proteome, and metabolome profiling with conventional toxicology to investigate the interaction between biological molecules and toxicants or environmental stress in disease causation. Toxicogenomics faces the problems of comparing and integrating data from different sources. Because of the unusual characteristics of toxicogenomic data, researchers need support from data analysis and annotation to obtain meaningful information. Existing repositories claim to serve as toxicogenomics databases, but they offer only limited capabilities for toxicogenomic research. To support toxicologists facing the flood of toxicogenomic data, we propose a novel toxicogenomics knowledgebase system, XPERANTO-TOX. XPERANTO-TOX is an integrated system for toxicogenomic data management and analysis. It is composed of three distinct but closely connected parts. First, the Data Storage System stores many kinds of '-omics' data and conventional toxicology data. Second, the Data Analysis System consists of analytical modules for integrated toxicogenomics data. Finally, the Data Annotation System gives researchers extensive insight into the data.

Review of statistical methods for survival analysis using genomic data

  • Lee, Seungyeoun;Lim, Heeju
    • Genomics & Informatics / v.17 no.4 / pp.41.1-41.12 / 2019
  • Survival analysis mainly deals with the time to an event, such as death, onset of disease, or bankruptcy. The common characteristic of survival data is that they contain "censored" observations, in which the time to event cannot be completely observed and the recorded time instead represents a lower bound on the time to event. For each subject, only the earlier of the event time and the censoring time is observed. Many traditional statistical methods have been used effectively to analyze survival data with censored observations. However, with the development of high-throughput technologies for producing "omics" data, more advanced statistical methods, such as regularization, are required to construct predictive survival models from high-dimensional genomic data. Furthermore, machine learning approaches have been adapted for survival analysis to fit nonlinear and complex interaction effects between predictors and to achieve more accurate prediction of individual survival probability. Since most clinicians and medical researchers can now easily access statistical programs for analyzing survival data, a review article is helpful for understanding the statistical methods used in survival analysis. We review traditional survival methods and regularization methods with various penalty functions for the analysis of high-dimensional genomic data, and describe machine learning techniques that have been adapted to survival analysis.
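
As a small illustration of how censoring enters a traditional survival method, the following sketch computes a Kaplan-Meier estimate in plain Python. It is not taken from the review: the function name `kaplan_meier` and the toy data are hypothetical, and the code assumes right-censoring only.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : observed time per subject (event time or censoring time)
    events : 1 if the event was observed, 0 if the subject was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every subject leaves the risk set at its observed time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Only observed events reduce the survival estimate.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Censored subjects still shrink the risk set for later times.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: five subjects, two of them censored (event = 0).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(f"t={t}  S(t)={s:.3f}")
```

A censored subject never lowers the estimate directly; it only contributes to the risk set up to its censoring time, which is the "lower bound" idea described in the abstract.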

Plant Biotechnology and Bioinformatics

  • Kim, Jung-Eun;Paik, Hyo-Jung;Kim, Young-Cheol;Hur, Cheol-Goo
    • Journal of Plant Biotechnology / v.33 no.3 / pp.209-222 / 2006
  • Whole-genome sequencing has been completed for Arabidopsis and rice, and large amounts of EST data are available for many other plants. In addition, vast quantities of diverse biological data have been generated by various '-omics' technologies such as transcriptomics, proteomics, and metabolomics. Bioinformatics plays an essential role in extracting useful information from these tremendous amounts of biological data. In this review, we introduce experimental methods for generating massive data; applications to plant science, such as plant disease resistance and molecular breeding; and bioinformatics tools and web sites available for plant biotechnology R&D. We conclude that new experimental methods and bioinformatics analysis techniques have made major contributions to the development of plant biotechnology and that bioinformatics has become a critical factor in plant biotechnology R&D.

QCanvas: An Advanced Tool for Data Clustering and Visualization of Genomics Data

  • Kim, Nayoung;Park, Herin;He, Ningning;Lee, Hyeon Young;Yoon, Sukjoon
    • Genomics & Informatics / v.10 no.4 / pp.263-265 / 2012
  • We developed a user-friendly, interactive program to simultaneously cluster and visualize omics data, such as DNA and protein array profiles. This program provides diverse algorithms for the hierarchical clustering of two-dimensional data. The clustering results can be interactively visualized and optimized on a heatmap. The present tool does not require any prior knowledge of scripting languages to carry out the data clustering and visualization. Furthermore, the heatmaps allow the selective display of data points satisfying user-defined criteria. For example, a clustered heatmap of experimental values can be differentially visualized based on statistical values, such as p-values. Including diverse menu-based display options, QCanvas provides a convenient graphical user interface for pattern analysis and visualization with high-quality graphics.
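
The two ideas in this abstract, hierarchical clustering of profiles and selective display by a statistical criterion, can be sketched in a few lines of plain Python. This is not QCanvas code: the function names, the average-linkage choice, the 0.05 threshold, and the toy profiles are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def average_linkage(rows, n_clusters):
    """Agglomerative clustering (average linkage, Euclidean distance).

    rows: list of equal-length numeric profiles.
    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    def linkage(pair):
        a, b = pair
        return sum(dist(rows[i], rows[j]) for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average pairwise distance.
        a, b = min(combinations(clusters, 2), key=linkage)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


def masked_row(values, pvalues, alpha=0.05):
    """Show a value only where its p-value passes the user-defined cutoff."""
    return [f"{v:5.1f}" if p < alpha else "  .  " for v, p in zip(values, pvalues)]


# Hypothetical expression profiles (rows = genes, columns = samples).
names = ["geneA", "geneB", "geneC", "geneD"]
rows = [[1.0, 1.2, 0.9], [5.0, 5.1, 4.8], [1.1, 1.0, 1.0], [4.9, 5.2, 5.0]]
pvals = [[0.01, 0.20, 0.03], [0.04, 0.01, 0.60], [0.50, 0.02, 0.01], [0.01, 0.01, 0.01]]

groups = average_linkage(rows, n_clusters=2)
for group in groups:
    for i in group:  # print rows in clustered order, as a text "heatmap"
        print(names[i], " ".join(masked_row(rows[i], pvals[i])))
```

Printing the rows in cluster order is the text analogue of reordering a heatmap; a graphical tool would map the unmasked values to colors instead.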

Bioinformatics services for analyzing massive genomic datasets

  • Ko, Gunhwan;Kim, Pan-Gyu;Cho, Youngbum;Jeong, Seongmun;Kim, Jae-Yoon;Kim, Kyoung Hyoun;Lee, Ho-Yeon;Han, Jiyeon;Yu, Namhee;Ham, Seokjin;Jang, Insoon;Kang, Byunghee;Shin, Sunguk;Kim, Lian;Lee, Seung-Won;Nam, Dougu;Kim, Jihyun F.;Kim, Namshin;Kim, Seon-Young;Lee, Sanghyuk;Roh, Tae-Young;Lee, Byungwook
    • Genomics & Informatics / v.18 no.1 / pp.8.1-8.10 / 2020
  • The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and ensuing computational problems. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution to this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here, we present a cloud computing-based system, Bio-Express, that provides user-friendly, cost-effective analysis of massive genomic datasets. Bio-Express is loaded with predefined multi-omics data analysis pipelines, which are divided into genome, transcriptome, epigenome, and metagenome pipelines. Users can employ predefined pipelines or create a new pipeline for analyzing their own omics data. We also developed several web-based services to facilitate downstream analysis of genome data. The Bio-Express web service is freely available at https://www.bioexpress.re.kr/.

Global Transcriptome-Wide Association Studies (TWAS) Reveal a Gene Regulation Network of Eating and Cooking Quality Traits in Rice

  • Weiguo Zhao;Qiang He;Kyu-Won Kim;Feifei Xu;Thant Zin Maung;Aueangporn Somsri;Min-Young Yoon;Sang-Beom Lee;Seung-Hyun Kim;Joohyun Lee;Soon-Wook Kwon;Gang-Seob Lee;Bhagwat Nawade;Sang-Ho Chu;Wondo Lee;Yoo-Hyun Cho;Chang-Yong Lee;Ill-Min Chung;Jong-Seong Jeon;Yong-Jin Park
    • Proceedings of the Korean Society of Crop Science Conference / 2022.10a / pp.207-207 / 2022
  • Eating and cooking quality (ECQ) is one of the most complex quantitative traits in rice. The genetic regulation of transcript expression levels contributing to phenotypic variation in ECQ traits is poorly understood. We integrated whole-genome resequencing, transcriptome, and phenotypic variation data from 84 Japonica accessions to build a regulatory network based on a transcriptome-wide association study (TWAS). All ECQ traits showed large phenotypic variation and significant phenotypic correlations with one another. TWAS identified a total of 285 transcripts significantly associated with six ECQ traits. Genome-wide mapping of ECQ-associated transcripts revealed 66,905 expression quantitative trait loci (eQTLs), including 21,747 local eQTLs and 45,158 trans-eQTLs, regulating the expression of 43 genes. The starch synthesis-related genes (SSRGs) starch synthase IV-1 (SSIV-1), starch branching enzyme 1 (SBE1), granule-bound starch synthase 2 (GBSS2), and ADP-glucose pyrophosphorylase small subunit 2a (OsAGPS2a) were found to have eQTLs regulating the expression of ECQ-associated transcripts. Further, in co-expression analysis, 130 genes produced at least one network with 22 master regulators. In addition, to validate the co-expression analysis, we developed CRISPR/Cas9-edited glbl mutant lines that confirmed the role of alpha-globulin (glbl) in starch synthesis. This study provides novel insights into the genetic regulation of ECQ traits and identifies transcripts associated with these traits that could be used in future rice breeding.

Applying a modified AUC to gene ranking

  • Yu, Wenbao;Chang, Yuan-Chin Ivan;Park, Eunsik
    • Communications for Statistical Applications and Methods / v.25 no.3 / pp.307-319 / 2018
  • High-throughput technologies enable the simultaneous evaluation of thousands of genes that could discriminate between different subclasses of complex diseases. Ranking genes according to differential expression is an important screening step for follow-up analysis, and many statistical measures have been proposed for this purpose. A good ranked list should provide stable ranks (at least for the top-ranked genes), and the top-ranked genes should have high power to differentiate between disease statuses. However, the literature places little emphasis on ranking genes by these two criteria simultaneously. To satisfy both criteria at once, we propose applying a previously reported metric, the modified area under the receiver operating characteristic curve, to gene ranking. The proposed ranking method is found to be promising, leading to a stable ranking list and good prediction performance for the top-ranked genes. The findings are illustrated through studies on both synthesized data and real microarray gene expression data. The proposed method is recommended for ranking genes or other biomarkers in high-dimensional omics studies.
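
For orientation, the sketch below ranks genes by the ordinary empirical AUC, the quantity that the paper's metric modifies. The modified AUC itself is not defined in this abstract and is not implemented here; the function names and the toy expression values are hypothetical.

```python
def auc(cases, controls):
    """Empirical AUC: P(case value > control value), ties counted as 1/2."""
    wins = sum((x > y) + 0.5 * (x == y) for x in cases for y in controls)
    return wins / (len(cases) * len(controls))


def rank_genes(expression, labels):
    """Rank genes by how well each separates the two classes.

    expression: {gene: [value per sample]}
    labels    : 1 = case, 0 = control, in the same sample order
    """
    scores = {}
    for gene, values in expression.items():
        cases = [v for v, y in zip(values, labels) if y == 1]
        controls = [v for v, y in zip(values, labels) if y == 0]
        a = auc(cases, controls)
        # Up- and down-regulated genes are treated as equally informative.
        scores[gene] = max(a, 1.0 - a)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: three genes, three cases followed by three controls.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "g1": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],  # perfectly separated
    "g2": [1.0, 5.0, 3.0, 2.0, 4.0, 6.0],  # weak separation
    "g3": [1.0, 2.0, 3.0, 2.5, 8.0, 9.0],  # mostly lower in cases
}
ranking = rank_genes(expression, labels)
for gene, score in ranking:
    print(f"{gene}: {score:.3f}")
```

Rank stability, the first criterion in the abstract, could be probed by repeating `rank_genes` on resampled data and comparing the top of each list; that step is left out to keep the sketch short.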

Perspectives on Clinical Informatics: Integrating Large-Scale Clinical, Genomic, and Health Information for Clinical Care

  • Choi, In Young;Kim, Tae-Min;Kim, Myung Shin;Mun, Seong K.;Chung, Yeun-Jun
    • Genomics & Informatics / v.11 no.4 / pp.186-190 / 2013
  • Advances in electronic medical records (EMRs) and bioinformatics (BI) represent two significant trends in healthcare. The widespread adoption of EMR systems and the completion of the Human Genome Project have driven the development of technologies for data acquisition, analysis, and visualization in two different domains. The massive amount of data from both the clinical and biological domains is expected to enable personalized, preventive, and predictive healthcare services in the near future. The integrated use of EMR and BI data needs to consider four key informatics areas: data modeling, analytics, standardization, and privacy. Bioclinical data warehouses integrating heterogeneous patient-related clinical or omics data should be considered. The representative standardization effort by the Clinical Bioinformatics Ontology (CBO) aims to provide uniquely identified concepts to include molecular pathology terminologies. Since individual genome data are easily used to predict current and future health status, different safeguards to ensure confidentiality should be considered. In this paper, we focus on the informatics aspects of integrating the EMR community and the BI community by identifying opportunities, challenges, and approaches to provide the best possible care service for our patients and the population.

Applications of Metabolic Modeling to Drive Bioprocess Development for the Production of Value-added Chemicals

  • Mahadevan, Radhakrishnan;Burgard, Anthony P.;Famili, Iman;Dien, Steve Van;Schilling, Christophe H.
    • Biotechnology and Bioprocess Engineering:BBE / v.10 no.5 / pp.408-417 / 2005
  • Increasing numbers of value-added chemicals are being produced using microbial fermentation strategies. Computational modeling and simulation of microbial metabolism is rapidly becoming an enabling technology that is driving a new paradigm to accelerate the bioprocess development cycle. In particular, constraint-based modeling and the development of genome-scale models of industrial microbes are finding increasing utility across many phases of the bioprocess development workflow. Herein, we review and discuss the requirements and trends in the industrial application of this technology as we build toward integrated computational/experimental platforms for bioprocess engineering. Specifically, we cover the following topics: (1) genome-scale models as genetically and biochemically consistent representations of metabolic networks; (2) the ability of these models to predict, assess, and interpret metabolic physiology and flux states of metabolism; (3) the model-guided integrative analysis of high-throughput 'omics' data; (4) the reconciliation and analysis of on- and off-line fermentation data as well as flux tracing data; (5) model-aided strain design strategies and the integration of calculated biotransformation routes; and (6) control and optimization of the fermentation processes. Collectively, constraint-based modeling strategies are impacting the iterative characterization of metabolic flux states throughout the bioprocess development cycle, while also driving metabolic engineering strategies and fermentation optimization.

NEUROD1 Intrinsically Initiates Differentiation of Induced Pluripotent Stem Cells into Neural Progenitor Cells

  • Choi, Won-Young;Hwang, Ji-Hyun;Cho, Ann-Na;Lee, Andrew J.;Jung, Inkyung;Cho, Seung-Woo;Kim, Lark Kyun;Kim, Young-Joon
    • Molecules and Cells / v.43 no.12 / pp.1011-1022 / 2020
  • Cell type specification is a delicate biological event in which every step is under tight regulation. From a molecular point of view, cell fate commitment begins with chromatin alteration, which kickstarts lineage-determining factors to initiate the expression of a series of genes required for cell specification. Several important neuronal differentiation factors have been identified from ectopic over-expression studies. However, little is known about which DNA regions are modified during induced pluripotent stem cell (iPSC) to neuronal progenitor cell (NPC) differentiation, the cis regulatory factors that attach to these accessible regions, or the genes that are initially expressed. In this study, we identified the accessible DNA regions of iPSCs and NPCs via the Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq). We identified which chromatin regions were modified after neuronal differentiation and found that the enhancer regions had more active histone modification changes than the promoters. Through motif enrichment analysis, we found that NEUROD1 controls iPSC differentiation to NPC by binding to the accessible regions of enhancers in cooperation with other factors such as the Hox proteins. Finally, by using Hi-C data, we categorized the genes that directly interacted with the enhancers under the control of NEUROD1 during iPSC to NPC differentiation.