• Title/Abstract/Keywords: Gene expression data


Statistical bioinformatics for gene expression data

  • Lee, Jae-K.
    • Proceedings of the Korean Society for Bioinformatics Conference / 2001.08a / pp.103-127 / 2001
  • Gene expression studies require statistical experimental designs and validation before laboratory confirmation. Various clustering approaches, such as hierarchical clustering, k-means, and SOM, are commonly used for unsupervised learning in gene expression data. Several classification methods, such as gene voting, SVM, or discriminant analysis, are used for supervised learning, where well-defined response classification is possible. Estimating gene-condition interaction effects requires advanced, computationally intensive statistical approaches.


Finding associations between genes by time-series microarray sequential patterns analysis

  • Nam, Ho-Jung;Lee, Do-Heon
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.161-164 / 2005
  • Data mining techniques can be applied to identify patterns of interest in gene expression data. One goal in mining gene expression data is to determine how the expression of a particular gene might affect the expression of other genes. To find relationships between different genes, association rules have been applied to gene expression data sets [1]. A notable limitation of association rule mining is that it can detect associations only within a single profile experiment. It cannot be used to find rules across different condition profiles or across experiments profiled at different time points. However, with the appearance of time-series microarray data, it became possible to analyze the temporal relationships between genes. In this paper, we analyze time-series microarray gene expression data to extract sequential patterns, which are similar to association rules between genes but span different time points in the yeast cell cycle. The sequential patterns found in our work can capture associations between different genes that are expressed or repressed at diverse time points. We applied a sequential pattern mining method to time-series microarray gene expression data and discovered a number of sequential patterns from two groups of genes (test and control); more sequential patterns were discovered from the test group (genes sharing the same GO term) than from the control group (genes with different GO terms). This result supports the potential of sequential patterns to capture biologically meaningful associations between genes.


HisCoM-PAGE: software for hierarchical structural component models for pathway analysis of gene expression data

  • Mok, Lydia;Park, Taesung
    • Genomics & Informatics / v.17 no.4 / pp.45.1-45.3 / 2019
  • To identify pathways associated with survival phenotypes using gene expression data, we recently proposed the hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE) method. The HisCoM-PAGE software can consider hierarchical structural relationships between genes and pathways and analyze multiple pathways simultaneously. It can be applied to various types of gene expression data, such as microarray data or RNA sequencing data. We expect that the HisCoM-PAGE software will make our method more easily accessible to researchers who want to perform pathway analysis for survival times.

Cancer Genomics Object Model: An Object Model for Cancer Research Using Microarray

  • Park, Yu-Rang;Lee, Hye-Won;Cho, Sung-Bum;Kim, Ju-Han
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.29-34 / 2005
  • The DNA microarray has become a major tool for investigating global gene expression in all aspects of cancer and biomedical research. DNA microarray experiments generate enormous amounts of data, which are meaningful only in the context of a detailed description of the microarrays, biomaterials, and conditions under which they were generated. The MicroArray Gene Expression Data (MGED) society has established a microarray standard for the structured management of these diverse and voluminous data. MGED MAGE-OM (MicroArray Gene Expression Object Model) is an object-oriented data model that attempts to define standard objects for gene expression. Assessing the relevance of DNA microarray analysis to cancer research requires combining clinical and genomics data. MAGE-OM, however, does not have an appropriate structure for describing the clinical information of cancer. For systematic integration of gene expression and clinical data, we create a new model, the Cancer Genomics Object Model.


Classification of Gene Data Using Membership Function and Neural Network

  • Yeom, Hae-Young;Kim, Jae-Hyup;Moon, Young-Shik
    • Journal of the Institute of Electronics Engineers of Korea CI / v.42 no.4 s.304 / pp.33-42 / 2005
  • This paper proposes a classification method for gene expression data using a membership function and a neural network. Gene expression is the process that produces the mRNA and proteins that build a living body, and gene expression data are important for finding the functions and correlations of genes. Such data can be obtained from DNA chips quickly and in large quantities. However, thousands of gene expression measurements are not useful until they are well organized. Therefore, a classification method is necessary to find the characteristics of gene data acquired from gene expression. In the proposed method, a set of gene data is extracted according to Fisher's criterion, on the assumption that the selected gene data form a well-classified data sample. However, selection alone does not guarantee a well-classified sample, so we calculate feature values using a membership function to reduce the influence of outliers in the gene data. Feature vectors estimated from the selected feature values are used to train a backpropagation neural network. The experimental results show that the clustering performance of the proposed method is improved compared with existing methods on various gene expression data sets.
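
The gene selection step mentioned in the abstract above can be illustrated with a minimal sketch of Fisher's criterion for two classes. It assumes the common form (difference of class means squared, divided by the sum of class variances); the paper's exact variant, its membership function, and its neural network are not reproduced here. The names `fisher_score` and `top_genes` are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(class_a, class_b):
    """Two-class Fisher criterion: (mean_a - mean_b)^2 / (var_a + var_b).

    Assumed form, using population variances; not taken from the paper.
    """
    gap = (mean(class_a) - mean(class_b)) ** 2
    spread = pvariance(class_a) + pvariance(class_b)
    if spread == 0:
        # Both classes are constant: perfectly separated or identical.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def top_genes(expression, labels, k):
    """Return the k gene names with the highest Fisher score.

    expression: {gene_name: [value per sample]}
    labels:     [0 or 1 per sample], same order as the values
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = fisher_score(a, b)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Illustrative toy data (made up): g1 separates the classes, g2 does not.
toy = {"g1": [1, 2, 3, 7, 8, 9], "g2": [4, 5, 6, 5, 4, 6]}
toy_labels = [0, 0, 0, 1, 1, 1]
selected = top_genes(toy, toy_labels, k=1)
```

A high score means the class means are far apart relative to the within-class spread, which is why such genes are natural candidates for the classifier's input features.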

Cancer-Subtype Classification Based on Gene Expression Data

  • Cho Ji-Hoon;Lee Dongkwon;Lee Min-Young;Lee In-Beum
    • Journal of Institute of Control, Robotics and Systems / v.10 no.12 / pp.1172-1180 / 2004
  • Recently, gene expression data, a product of high-throughput technology, have appeared in earnest, and the related studies (so-called bioinformatics) have come to occupy an important position in biological and medical research. The microarray is a revolutionary technology that enables us to monitor several thousand genes simultaneously and thus to gain insight into phenomena in the human body (e.g., the mechanism of cancer progression) at the molecular level. To obtain useful information from such gene expression measurements, it is essential to analyze the data with appropriate techniques. However, the high dimensionality of the data can cause problems such as the curse of dimensionality and singular matrices in matrix computation, which makes it difficult to apply conventional data analysis methods. Therefore, developing methods that can treat such data effectively has become a challenging issue in computational biology. This research focuses on gene selection and classification for cancer subtype discrimination based on gene expression (microarray) data.

Gene Expression Pattern Analysis via Latent Variable Models Coupled with Topographic Clustering

  • Chang, Jeong-Ho;Chi, Sung Wook;Zhang, Byoung Tak
    • Genomics & Informatics / v.1 no.1 / pp.32-39 / 2003
  • We present a latent variable model-based approach to the analysis of gene expression patterns, coupled with topographic clustering. The aspect model, a latent variable model for dyadic data, is applied to extract latent patterns underlying complex variations of gene expression levels. Topographic clustering is then performed to find coherent groups of genes, based on the extracted latent patterns as well as individual gene expression behaviors. Applied to cell cycle-regulated genes of the yeast Saccharomyces cerevisiae, the proposed method discovered biologically meaningful patterns related to characteristic expression behavior in particular cell cycle phases. In addition, displaying the variation in the composition of these latent patterns on the cluster map made the resulting cluster structure easier to interpret. From this, we argue that latent variable models, coupled with topographic clustering, are a promising tool for exploratory analysis of gene expression data.

COEX-Seq: Convert a Variety of Measurements of Gene Expression in RNA-Seq

  • Kim, Sang Cheol;Yu, Donghyeon;Cho, Seong Beom
    • Genomics & Informatics / v.16 no.4 / pp.36.1-36.3 / 2018
  • Next-generation sequencing (NGS), a high-throughput DNA sequencing technology, is widely used for molecular biological studies. Within NGS, RNA sequencing (RNA-Seq), a short-read massively parallel sequencing method, is a major quantitative tool for transcriptome studies. To utilize RNA-Seq data, various quantification and analysis methods have been developed for specific research goals, including identification of differentially expressed genes and detection of novel transcripts. Because RNA-Seq data are accumulating in public databases, there is a demand for integrative analysis. However, the available RNA-Seq data are stored in different formats, such as read counts, transcripts per million (TPM), and fragments per kilobase of transcript per million mapped reads (FPKM). This hinders integrative analysis of RNA-Seq data. To solve this problem, we have developed COEX-Seq (Convert a Variety of Measurements of Gene Expression in RNA-Seq), a web-based application built with Shiny that easily converts data among the measurement formats of gene expression used in most bioinformatic tools for RNA-Seq. It provides a workflow that includes loading a data set, selecting measurement formats of gene expression, and identifying gene names. COEX-Seq is freely available for academic purposes and can be run on Windows, Mac OS, and Linux operating systems. Source code, sample data sets, and supplementary documentation are available as well.
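
The kind of conversion described in the abstract above can be sketched with the standard textbook definitions of FPKM and TPM. This is not COEX-Seq's own code (the abstract describes a Shiny application); the function names are hypothetical, and the sketch assumes plain gene lengths in base pairs rather than effective lengths.

```python
def counts_to_fpkm(counts, lengths_bp):
    """FPKM_i = count_i * 1e9 / (length_i * total_counts)."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def counts_to_tpm(counts, lengths_bp):
    """TPM_i = (count_i / length_i) / sum_j(count_j / length_j) * 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    rate_sum = sum(rates)
    return [r * 1e6 / rate_sum for r in rates]


def fpkm_to_tpm(fpkm):
    """TPM is FPKM rescaled so that the values sum to one million."""
    fpkm_sum = sum(fpkm)
    return [f * 1e6 / fpkm_sum for f in fpkm]


# Illustrative toy values (made up): three genes in one sample.
counts = [100, 300, 600]
lengths = [1000, 3000, 2000]
tpm_direct = counts_to_tpm(counts, lengths)
tpm_via_fpkm = fpkm_to_tpm(counts_to_fpkm(counts, lengths))
```

Both routes give the same TPM values, which is the property that makes conversion between formats possible without returning to the raw reads. Going from TPM or FPKM back to read counts is not possible without the library size, so that direction needs extra information.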

Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values

  • Kim, Su-Young
    • The Korean Journal of Applied Statistics / v.24 no.2 / pp.315-321 / 2011
  • Gene expression microarray data often include multiple missing values. Most gene expression analyses (including gene clustering analysis), however, require a complete data matrix as input. In ordinary clustering methods, a single missing value forces one to discard all the data for a gene even if the rest of the data for that gene are intact. The quality of the analysis may decrease seriously as the missing rate increases. On the other hand, imputing missing values may introduce artifacts that reduce the reliability of the analysis. To clarify this contradiction in microarray clustering analysis, this paper compared the accuracy of clustering with and without imputation over several microarray data sets with different missing rates. This paper also tested the clustering efficiency of several imputation methods, including our proposed algorithm. The results showed that, for imperfect microarray data, it is worthwhile to check the clustering result in this alternative way, without any imputed data.
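
The contrast drawn in the abstract above, clustering with imputed values versus clustering without imputation, can be illustrated with a small sketch. It assumes one common way to avoid imputation: a Euclidean distance computed only over positions observed in both profiles and rescaled to the full profile length. The abstract does not specify the paper's algorithm, so this is a stand-in, and the function names are hypothetical. Missing values are represented as `None`.

```python
import math


def partial_distance(x, y):
    """Euclidean distance over jointly observed positions, rescaled.

    The squared sum is multiplied by len(x) / (number of usable pairs),
    so profiles with different amounts of missing data stay comparable.
    Returns None when the two profiles share no observed position.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        return None
    squared = sum((a - b) ** 2 for a, b in pairs)
    return math.sqrt(squared * len(x) / len(pairs))


def mean_impute(x):
    """Baseline for comparison: fill gaps with the profile's own mean."""
    observed = [v for v in x if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in x]


# Illustrative toy profiles (made up): one gene has a missing measurement.
gene_a = [1.0, None, 3.0]
gene_b = [1.0, 5.0, 7.0]
d_without_imputation = partial_distance(gene_a, gene_b)
d_with_imputation = math.dist(mean_impute(gene_a), gene_b)
```

Either distance can be fed to a standard clustering routine that accepts a precomputed distance matrix, which allows the two clusterings to be compared on the same data.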

CONSTRUCTING GENE REGULATORY NETWORK USING FREQUENT GENE EXPRESSION PATTERN MINING AND CHAIN RULES

  • Park, Hong-Kyu;Lee, Heon-Gyu;Cho, Kyung-Hwan;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.623-626 / 2006
  • Groups of genes control the functioning of a cell through complex interactions. These interacting gene groups are called Gene Regulatory Networks (GRNs). Two previous data mining approaches, clustering and classification, have been used to analyze gene expression data. While these mining tools are useful for determining membership of genes by homology, they do not identify the regulatory relationships among genes found in the same class of molecular actions. Furthermore, we need to understand the mechanism of how genes relate and how they regulate one another. To detect regulatory relationships among genes from time-series microarray data, we propose a novel approach using frequent pattern mining and chain rules. In this approach, we propose a method for transforming gene expression data to make them suitable for frequent pattern mining, and we detect gene expression patterns by applying the FP-growth algorithm. We then construct a gene regulatory network from the frequent gene patterns using chain rules. Finally, we validated our proposed method by showing that our experimental results are consistent with published results.
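
The transformation step described in the abstract above can be sketched as follows: a time series is turned into transactions of "up" and "down" items, and frequently co-occurring items are counted. The discretization rule and the threshold are assumptions, and plain pair counting is used here in place of FP-growth to keep the example short; for pairs it yields the same frequent itemsets, but it does not scale the way FP-growth does. The chain-rule network construction is not shown. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def discretize(series, threshold=0.5):
    """Build one transaction per pair of consecutive time points.

    series: {gene_name: [expression value per time point]}
    A gene contributes "gene+" if it rose by at least `threshold`
    and "gene-" if it fell by at least `threshold` (assumed rule).
    """
    if not series:
        return []
    n_points = len(next(iter(series.values())))
    transactions = []
    for t in range(1, n_points):
        items = set()
        for gene, values in series.items():
            delta = values[t] - values[t - 1]
            if delta >= threshold:
                items.add(gene + "+")
            elif delta <= -threshold:
                items.add(gene + "-")
        transactions.append(items)
    return transactions


def frequent_pairs(transactions, min_support):
    """Count item pairs appearing together in >= min_support transactions."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}


# Illustrative toy series (made up): A and B move together, C stays flat.
toy_series = {
    "A": [0, 1, 0, 1, 0],
    "B": [0, 1, 0, 1, 0],
    "C": [0, 0, 0, 0, 0],
}
patterns = frequent_pairs(discretize(toy_series), min_support=2)
```

Pairs such as ("A+", "B+") indicate genes that change together often enough to be candidate edges in a regulatory network, which later steps would need to orient and validate.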
