• Title/Summary/Keyword: Microarray Data

Bayesian Variable Selection in the Proportional Hazard Model

  • Lee, Kyeong-Eun
    • Journal of the Korean Data and Information Science Society / v.15 no.3 / pp.605-616 / 2004
  • In this paper we consider the proportional hazards model for survival analysis of microarray data. Given a vector of response values and gene expressions (covariates), we address how to reduce the dimension by selecting the significant genes. Rather than fixing the number of selected genes, we assign a prior distribution to this number. To implement our methodology, we use a Markov chain Monte Carlo (MCMC) method.

Development of Clustering Algorithm and Tool for DNA Microarray Data (DNA 마이크로어레이 데이타의 클러스터링 알고리즘 및 도구 개발)

  • 여상수;김성권
    • Journal of KIISE:Computer Systems and Theory / v.30 no.10 / pp.544-555 / 2003
  • Because the data produced by DNA microarray experiments contain a large amount of gene expression information, adequate analysis methods are required. Hierarchical clustering is widely used to analyze gene expression profiles. In this paper, we study leaf-ordering, a post-processing step applied to the dendrograms produced by hierarchical clustering to improve the efficiency of DNA microarray data analysis. We first analyze existing leaf-ordering algorithms and then present new approaches to leaf-ordering. We also introduce HCLO (Hierarchical Clustering & Leaf-Ordering Tool), our software implementation of hierarchical clustering, several existing leaf-ordering algorithms, and the algorithms presented in this paper.
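
As a rough illustration of what leaf-ordering does, the sketch below takes a small dendrogram written as nested tuples and tries every left/right flip of its internal nodes, keeping the leaf order in which neighbouring leaves are most similar. The tree encoding, the function names, and the one-dimensional toy profiles are assumptions made for this example. Exhaustive search grows exponentially with the number of leaves and is not one of the algorithms studied in the paper.

```python
def leaf_orders(tree):
    """Yield every leaf order reachable by flipping internal nodes.

    A leaf is any non-tuple value; an internal node is a (left, right) tuple.
    """
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for l in leaf_orders(left):
        for r in leaf_orders(right):
            yield l + r  # children in their original orientation
            yield r + l  # children flipped


def adjacent_cost(order, profiles):
    """Sum of absolute differences between neighbouring leaves."""
    return sum(abs(profiles[a] - profiles[b]) for a, b in zip(order, order[1:]))


def best_leaf_order(tree, profiles):
    """Return a flip-consistent leaf order with minimal adjacent cost."""
    return min(leaf_orders(tree), key=lambda o: adjacent_cost(o, profiles))


# Hypothetical toy data: one expression value per gene.
profiles = {"g1": 0.0, "g2": 1.0, "g3": 5.0, "g4": 4.0}
# Dendrogram as produced by some hierarchical clustering (assumed here).
tree = (("g2", "g1"), ("g3", "g4"))

order = best_leaf_order(tree, profiles)
print(order, adjacent_cost(order, profiles))
```

Flipping never changes which genes are clustered together; it only changes how the same dendrogram is drawn, which is why leaf-ordering can be treated as pure post-processing.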

Classification of Ovarian Cancer Microarray Data based on Intelligent Systems with Marker gene (선별 시스템 기반 표지 유전자를 포함한 난소암 마이크로어레이 데이터 분류)

  • Park, Su-Young;Jung, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.3 / pp.747-752 / 2011
  • Microarray classification typically has two striking attributes: (1) classifier design and error estimation are based on remarkably small samples, and (2) cross-validation error estimation is employed in the majority of papers. Ovarian cancer microarray data consist of the expression levels of tens of thousands of genes, and there is no systematic procedure for analyzing this information all at once. In this paper, marker genes are selected by ranking genes according to test statistics, and popular classification rules (linear discriminant analysis, k-nearest-neighbor, and decision trees) are applied to compare the classification accuracy obtained with and without marker gene selection. Applying linear classification analysis to the microarray data set containing the marker genes selected by the ANOVA method gave the highest classification accuracy, 97.78%, and the lowest prediction error estimate.
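
The gene-ranking step can be sketched as follows: compute a one-way ANOVA F statistic for each gene across the sample classes and keep the top-scoring genes as markers. This is a minimal sketch under stated assumptions; the function names, gene labels, and expression values are hypothetical, and the abstract does not specify how the paper computes or thresholds its statistics.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of values."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    ss_within = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    if ss_within == 0:
        # Degenerate case: no within-class variation at all.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def rank_genes(expression, labels, top_k):
    """Return the top_k gene names ordered by decreasing F statistic."""
    scores = {}
    for gene, values in expression.items():
        by_class = {}
        for value, label in zip(values, labels):
            by_class.setdefault(label, []).append(value)
        scores[gene] = anova_f(list(by_class.values()))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: three tumor and three normal samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
expression = {
    "geneA": [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],  # strongly class-dependent
    "geneB": [2.0, 3.0, 1.0, 2.1, 2.9, 1.0],  # no class difference
    "geneC": [3.0, 3.5, 2.5, 2.0, 2.5, 1.5],  # moderate class difference
}

markers = rank_genes(expression, labels, top_k=2)
print(markers)
```

With samples as small as those the abstract describes, the ranking is usually recomputed inside each cross-validation fold so that the held-out samples do not influence which genes are selected.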

A Bayesian Validation Method for Classification of Microarray Expression Data (마이크로어레이 발현 데이터 분류를 위한 베이지안 검증 기법)

  • Park, Su-Young;Jung, Jong-Pil;Jung, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.11 / pp.2039-2044 / 2006
  • Because the volume of biological information now exceeds what the human brain can handle, data mining and artificial intelligence techniques are needed to deal with the information in this field. Much research uses DNA microarray technology, which can obtain information from thousands of genes at once, to develop new methods for analyzing and predicting diseases. Discovering the mechanisms of unknown genes with these new methods is expected to lead to new drugs and new treatments. In this paper, we tested the classification accuracy of a Bayesian method on microarray data in order to compare the performance of normalization methods. The data were divided into two classes after feature extraction through a normalization process that reduces or removes the noise introduced into microarray experiments by various factors. We showed that classification performance improved to 95.89% after Lowess normalization.

Cluster Analysis of Incomplete Microarray Data with Fuzzy Clustering

  • Kim, Dae-Won
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.3 / pp.397-402 / 2007
  • In this paper, we present a method for clustering incomplete microarray data using alternating optimization, in which no prior imputation method is required. To reduce the influence of imputation during preprocessing, we take an alternating optimization approach that finds better estimates during the iterative clustering process. In each iteration, the method improves the estimates of missing values by exploiting cluster information, such as the cluster centroids, together with all available non-missing values. The clustering results of the proposed method are significantly more relevant to biological gene annotations than those of other methods, indicating its effectiveness and potential for clustering incomplete gene expression data.
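
One way to picture re-estimating missing values inside the clustering loop is the sketch below. It is a fuzzy c-means variant in the style of the optimal completion strategy, used here as a stand-in because the abstract does not give the paper's exact update rules. The function name, the fuzzifier `m`, the initial centroids, and the toy data are all assumptions of this example.

```python
def fcm_with_missing(data, centroids, m=2.0, iters=50):
    """Fuzzy c-means in which missing entries (None) are re-estimated each pass."""
    n_feat = len(data[0])
    # Provisional start: fill each gap with the mean of the observed column values.
    col_means = []
    for j in range(n_feat):
        seen = [row[j] for row in data if row[j] is not None]
        col_means.append(sum(seen) / len(seen))
    filled = [[col_means[j] if v is None else v for j, v in enumerate(row)]
              for row in data]
    centroids = [list(c) for c in centroids]
    memberships = []
    for _ in range(iters):
        # 1) Membership update from squared distances to each centroid.
        memberships = []
        for x in filled:
            d2 = [max(sum((a - b) ** 2 for a, b in zip(x, c)), 1e-12)
                  for c in centroids]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            memberships.append([w / total for w in inv])
        # 2) Centroid update, weighting each sample by membership ** m.
        for k in range(len(centroids)):
            w = [u[k] ** m for u in memberships]
            centroids[k] = [sum(wi * x[j] for wi, x in zip(w, filled)) / sum(w)
                            for j in range(n_feat)]
        # 3) Re-estimate only the missing entries from the weighted centroids.
        for i, row in enumerate(data):
            w = [u ** m for u in memberships[i]]
            for j, v in enumerate(row):
                if v is None:
                    filled[i][j] = sum(wk * c[j] for wk, c in zip(w, centroids)) / sum(w)
    return filled, centroids, memberships


# Hypothetical toy data: two well-separated groups, one gap in each.
data = [[0.0, 0.0], [0.2, 0.1], [0.1, None],
        [5.0, 5.0], [5.2, 4.9], [None, 5.1]]
filled, centroids, memberships = fcm_with_missing(data, [[0.0, 0.0], [5.0, 5.0]])
print(filled[2], filled[5])
```

The column-mean fill is only a starting point. Each pass then pulls a missing entry toward the centroids of the clusters the sample belongs to, so the final estimate depends on the clustering rather than on a separate preprocessing step.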

Supervised Model for Identifying Differentially Expressed Genes in DNA Microarray Gene Expression Dataset Using Biological Pathway Information

  • Chung, Tae Su;Kim, Keewon;Kim, Ju Han
    • Genomics & Informatics / v.3 no.1 / pp.30-34 / 2005
  • Microarray technology makes it possible to measure the expression of tens of thousands of genes simultaneously under various experimental conditions. Identifying differentially expressed genes (DEGs) in each experimental condition is one of the most common first steps in microarray gene expression data analysis. Reasonable thresholds for determining DEGs, with suitable statistical significance, are needed for the next step of the analysis. We present a supervised model for identifying DEGs using pathway information based on the global connectivity structure. Pathway information can be regarded as a collection of biological knowledge, so we try to determine the optimal threshold such that the resulting connectivity structure is as compatible as possible with the existing pathway information. The significant feature of our model is that it uses established knowledge as a reference to guide the analysis of a microarray data set. In most previous work, only information intrinsic to the microarray is used to identify DEGs. We hope that the proposed method can help construct biologically meaningful structures from microarray data sets.

Cross Platform Data Analysis in Microarray Experiment (서로 다른 플랫폼의 마이크로어레이 연구 통합 분석)

  • Lee, Jangmee;Lee, Sunho
    • The Korean Journal of Applied Statistics / v.26 no.2 / pp.307-319 / 2013
  • With the rapid accumulation of microarray data, integrating available data sets that address the same biological question, which can provide more samples and better experimental results, is a significant challenge. Differences between microarray platforms can make it difficult to integrate data from several studies effectively, and there is no consensus on which method best produces a single, unified data set. We review methods based on the median rank score, quantile discretization, and standardization (which directly combine rescaled gene expression values) as well as meta-analysis (which combines the results of individual studies at the interpretative level). Real data examples downloaded from GEO are used to compare the performance of these methods and to evaluate whether the combined data set yields more reliable information than the separate data sets.
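
Of the rescaling approaches listed above, standardization is the simplest to sketch: z-score each gene within its own study so that platform-specific scales drop out, then pool the samples for the genes the studies share. The function names and the toy values below are hypothetical, and the paper may define its standardization differently.

```python
from statistics import mean, pstdev


def standardize_study(study):
    """Z-score each gene across the samples of a single study."""
    out = {}
    for gene, values in study.items():
        mu, sigma = mean(values), pstdev(values)
        # A gene with no variation in this study carries no signal; map it to 0.
        out[gene] = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]
    return out


def combine_studies(studies):
    """Standardize each study separately, then pool samples for shared genes."""
    standardized = [standardize_study(s) for s in studies]
    shared = set(standardized[0]).intersection(*standardized[1:])
    return {g: [v for s in standardized for v in s[g]] for g in sorted(shared)}


# Hypothetical toy data: two platforms reporting on very different scales.
study_a = {"geneA": [8.0, 10.0, 12.0], "geneB": [5.0, 5.0, 5.0], "geneC": [1.0, 2.0, 3.0]}
study_b = {"geneA": [100.0, 300.0], "geneB": [40.0, 60.0]}

combined = combine_studies([study_a, study_b])
print(combined)
```

Only genes measured in every study survive the pooling, which is one reason combining studies from different platforms can discard part of each data set.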

Learning Graphical Models for DNA Chip Data Mining

  • Zhang, Byoung-Tak
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.59-60 / 2000
  • The past few years have seen a dramatic increase in gene expression data on the basis of DNA microarrays or DNA chips. Going beyond a generic view on the genome, microarray data are able to distinguish between gene populations in different tissues of the same organism and in different states of cells belonging to the same tissue. This affords a cell-wide view of the metabolic and regulatory processes under different conditions, building an effective basis for new diagnoses and therapies of diseases. In this talk we present machine learning techniques for effective mining of DNA microarray data. A brief introduction to the research field of machine learning from the computer science and artificial intelligence point of view is followed by a review of recently-developed learning algorithms applied to the analysis of DNA chip gene expression data. Emphasis is put on graphical models, such as Bayesian networks, latent variable models, and generative topographic mapping. Finally, we report on our own results of applying these learning methods to two important problems: the identification of cell cycle-regulated genes and the discovery of cancer classes by gene expression monitoring. The data sets are provided by the competition CAMDA-2000, the Critical Assessment of Techniques for Microarray Data Mining.

Transcriptional profiles of rock bream iridovirus (RBIV) using microarray approaches

  • Myung-Hwa, Jung;Jun-Young, Song;Sung-Ju, Jung
    • Journal of fish pathology / v.35 no.2 / pp.141-155 / 2022
  • Rock bream iridovirus (RBIV) causes high mortality and economic losses in the rock bream (Oplegnathus fasciatus) aquaculture industry in Korea. Expression profiling of viral open reading frames (ORFs) at different RBIV infection stages was investigated using microarray approaches. Rock bream were exposed to the virus and held for 7 days at 23 ℃ before the water temperature was reduced to 17 ℃. Mortality of 28% was observed from 24 to 35 days post infection (dpi), after which no mortality was observed until 70 dpi (the end of the experiment). A total of 27 ORFs were significantly up- or down-regulated after RBIV infection. In RBIV-infected rock bream, four viral genes were expressed after 2 dpi. Most RBIV ORFs (26 genes, 96.2%) were significantly elevated between 7 and 20 dpi. Among them, 12 ORF transcripts (44.4%) reached their peak expression intensity at 15 dpi, and 14 ORFs (51.8%) were at peak expression intensity at 20 dpi. Expression levels began to decrease after 25 dpi, and 92.6% of ORFs (25 genes) were expressed below 1-fold at 70 dpi. Based on the microarray data, viral gene expression profiles were categorized into three infection stages: early (2 dpi), middle (7 to 20 dpi), and recovery (25 and 70 dpi). RBIV ORFs 009R, 023R, 032L, 049L, and 056L were remarkably expressed during RBIV infection. Furthermore, six ORFs (001L, 013R, 052L, 053L, 058L, and 061L) were significantly expressed only at 20 dpi. To verify the cDNA microarray data, we performed quantitative real-time PCR, and the results were similar to those of the microarray. Our results provide novel observations on broader RBIV gene expression at different stages of infection and may support the development of control strategies against RBIV infection.