• Title/Abstract/Keywords: Microarray Data Analysis

Search results: 322 (processing time: 0.034 s)

Descriptive and Systematic Comparison of Clustering Methods in Microarray Data Analysis

  • Kim, Seo-Young
    • The Korean Journal of Applied Statistics
    • /
    • Vol. 22, No. 1
    • /
    • pp.89-106
    • /
    • 2009
  • There have been many new advances in the development of improved clustering methods for microarray data analysis, but traditional clustering methods are still often used in genomic data analysis, which may be due more to their conceptual simplicity and their broad availability in commercial software packages than to their intrinsic merits. Thus, it is crucial to assess the performance of each existing method through a comprehensive comparative analysis so as to provide informed guidelines on choosing clustering methods. In this study, we investigated existing clustering methods applied to microarray data in various real scenarios. To this end, we focused on how the various methods differ, and why a particular method does not perform well. We applied both internal and external validation methods to the following eight clustering methods using various simulated data sets and real microarray data sets.
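As a rough illustration of what external validation of a clustering can look like, the sketch below computes the Rand index between a reference labeling and a clustering result. The Rand index is chosen here only as a familiar example; the abstract does not say which validation measures the study used, and the function name and toy labels are assumptions.

```python
from itertools import combinations


def rand_index(reference, predicted):
    """Fraction of item pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two items
    in the same cluster, or both put them in different clusters.
    """
    if len(reference) != len(predicted):
        raise ValueError("labelings must have the same length")
    if len(reference) < 2:
        raise ValueError("at least two items are needed")
    pairs = list(combinations(range(len(reference)), 2))
    agreements = sum(
        (reference[i] == reference[j]) == (predicted[i] == predicted[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical labels for six genes: known classes vs. a clustering result.
known_classes = [0, 0, 0, 1, 1, 1]
cluster_result = [1, 1, 0, 0, 0, 0]
score = rand_index(known_classes, cluster_result)
```

The measure depends only on which items are grouped together, so renaming the cluster labels does not change the score.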

The Application of Machine Learning Algorithm In The Analysis of Tissue Microarray; for the Prediction of Clinical Status

  • Cho, Sung-Bum;Kim, Woo-Ho;Kim, Ju-Han
    • Korean Society for Bioinformatics: Conference Proceedings
    • /
    • Korean Society for Bioinformatics and Systems Biology, BIOINFO 2005 (2005)
    • /
    • pp.366-370
    • /
    • 2005
  • Tissue microarray is one of the high-throughput technologies of the post-genomic era. Using tissue microarrays, researchers can investigate the expression of a large number of genes at the DNA, RNA, and protein levels. An important aspect of tissue microarrays is their ability to assess many biomarkers that are used in clinical practice. To handle the categorical data of tissue microarrays, we applied a Bayesian network classifier algorithm. We found that the Bayesian network classifier could analyze tissue microarray data and that integrating prior knowledge about gastric cancer achieved better performance. The results showed that relevant integration of prior knowledge improved the accuracy of predicting survival status from the immunohistochemical tissue microarray data of 18 tumor suppressor genes. In conclusion, the application of a Bayesian network classifier seemed appropriate for the analysis of tissue microarray data with clinical information.

A DNA Microarray LIMS System for Integral Genomic Analysis of Multi-Platform Microarrays

  • Cho, Mi-Kyung;Kang, Jason Jong-ho;Park, Hyun-Seok
    • Genomics & Informatics
    • /
    • Vol. 5, No. 2
    • /
    • pp.83-87
    • /
    • 2007
  • The analysis of DNA microarray data is a rapidly evolving area of bioinformatics, and various types of microarray are emerging as some of the most exciting technologies for use in biological and clinical research. In recent years, microarray technology has been utilized in various applications such as the profiling of mRNAs, assessment of DNA copy number, genotyping, and detection of methylated sequences. However, the analysis of these heterogeneous microarray platform experiments does not need to be performed separately. Rather, these platforms can be co-analyzed in combination, for cross-validation. There are a number of separate laboratory information management systems (LIMS) that individually address some of the needs for each platform. However, to our knowledge there are no unified LIMS systems capable of organizing all of the information regarding multi-platform microarray experiments, while additionally integrating this information with tools to perform the analysis. In order to address these requirements, we developed a web-based LIMS system that provides an integrated framework for storing and analyzing microarray information generated by the various platforms. This system enables an easy integration of modules that transform, analyze and/or visualize multi-platform microarray data.

Microarray Data Sharing System

  • 윤지희;홍동완;이종근
    • The Journal of the Korea Contents Association
    • /
    • Vol. 9, No. 8
    • /
    • pp.18-31
    • /
    • 2009
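  • Recently, confidence in the quality and reproducibility of microarray experimental data has grown, and demand for sharing and reusing microarray data is increasing rapidly. However, publicly available microarray data, both domestic and international, have different data items and formats depending on the experimental method and platform, which makes practical access to and use of the data difficult. In this paper, we propose a microarray data sharing system that can efficiently search, share, and integrate existing microarray data that differ in experimental platform, data format, normalization technique, and analysis method. The proposed system integrates distributed microarray data using web-service-based technology, and users at each site can download data found through UDDI after it has been automatically converted into a common data structure based on the MGED standard. The defined common data structure consists of IDF, ADF, SDRF, and EDF, serves as a template for integrating microarrays of various structures, and can be stored as MAGE-ML, MAGE-TAB, or XML Schema documents. In addition, the system's automatic data submitter, file manager, and other components provide various auxiliary functions for microarray data sharing.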

TMA-OM (Tissue Microarray Object Model) and Integration of Major Genomic Information

  • Kim, Ju-Han
    • Korean Society for Bioinformatics: Conference Proceedings
    • /
    • Korean Society for Bioinformatics and Systems Biology, 2006 Principles and Practice of Microarray for Biomedical Researchers
    • /
    • pp.30-36
    • /
    • 2006
  • Tissue microarray (TMA) is an array-based technology allowing the examination of hundreds of tissue samples on a single slide. To handle, exchange, and disseminate TMA data, we need standard representations of the methods used, of the data generated, and of the clinical and histopathological information related to TMA data analysis. This study aims to create a comprehensive data model with flexibility that supports diverse experimental designs and with expressivity and extensibility that enables an adequate and comprehensive description of new clinical and histopathological data elements. We designed a Tissue Microarray Object Model (TMA-OM). Both the Array Information and the Experimental Procedure models are created by referring to Microarray Gene Expression Object Model, Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE), and the TMA Data Exchange Specifications (TMA DES). The Clinical and Histopathological Information model is created by using CAP Cancer Protocols and National Cancer Institute Common Data Elements (NCI CDEs). MGED Ontology, UMLS and the terms extracted from CAP Cancer Protocols and NCI CDEs are used to create a controlled vocabulary for unambiguous annotation. We implemented a web-based application for TMA-OM, supporting data export in XML format conforming to the TMA DES or the DTD derived from TMA-OM. TMA-OM provides a comprehensive data model for storage, analysis and exchange of TMA data and facilitates model-level integration of other biological models.

Normalization of Microarray Data: Single-labeled and Dual-labeled Arrays

  • Do, Jin Hwan;Choi, Dong-Kug
    • Molecules and Cells
    • /
    • Vol. 22, No. 3
    • /
    • pp.254-261
    • /
    • 2006
  • The DNA microarray is a powerful tool for high-throughput analysis of biological systems. Various computational tools have been created to facilitate the analysis of the large volume of data produced in DNA microarray experiments. Normalization is a critical step for obtaining data that are reliable and usable for subsequent analyses such as identification of differentially expressed genes and clustering. A variety of normalization methods have been proposed over the past few years, but no method is perfect yet. Various assumptions are often made in the process of normalization. Therefore, knowledge of the underlying assumptions and principles of normalization is helpful for the correct analysis of microarray data. We present a review of normalization techniques, from single-labeled platforms such as the Affymetrix GeneChip array to dual-labeled platforms such as spotted arrays, focusing on their principles and assumptions.
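To make the role of assumptions in normalization concrete, here is a minimal sketch of global median centering for a dual-labeled array. It is an illustrative choice and not necessarily a method covered by the review. It rests on the assumption that most genes are not differentially expressed, so the median log2(red/green) ratio should be zero. The function name and toy intensities are hypothetical.

```python
import math
from statistics import median


def median_center_log_ratios(red, green):
    """Return log2(red/green) ratios shifted so that their median is zero.

    Assumes background-corrected, strictly positive intensities and that
    most spots are not differentially expressed.
    """
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    # One log-ratio per spot.
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    # A global dye bias shows up as a non-zero median; subtract it.
    shift = median(log_ratios)
    return [value - shift for value in log_ratios]


# Toy intensities for five spots; the red channel is about twice as bright.
red = [200.0, 400.0, 800.0, 1600.0, 100.0]
green = [100.0, 200.0, 400.0, 200.0, 100.0]
normalized = median_center_log_ratios(red, green)
```

If the assumption fails, for example when a large share of genes changes in one direction, the same shift would remove real signal, which is why the choice of method has to match the experiment.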

Clustering Approaches to Identifying Gene Expression Patterns from DNA Microarray Data

  • Do, Jin Hwan;Choi, Dong-Kug
    • Molecules and Cells
    • /
    • Vol. 25, No. 2
    • /
    • pp.279-288
    • /
    • 2008
  • Careful analysis is essential for making sense of the large amounts of gene expression data produced by microarrays. In this review we focus on clustering techniques. The biological rationale for this approach is the fact that many co-expressed genes are co-regulated, and identifying co-expressed genes could aid in functional annotation of novel genes, de novo identification of transcription factor binding sites and elucidation of complex biological pathways. Co-expressed genes are usually identified in microarray experiments by clustering techniques. There are many such methods, and the results obtained even for the same datasets may vary considerably depending on the algorithms and dissimilarity measures used, as well as on user-selectable parameters such as the desired number of clusters and initial values. Therefore, biologists who want to interpret microarray data should be aware of the weaknesses and strengths of the clustering methods used. In this review, we survey the basic principles of clustering of DNA microarray data, from crisp clustering algorithms such as hierarchical clustering, K-means and self-organizing maps, to complex clustering algorithms like fuzzy clustering.
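The user-selectable parameters mentioned in the abstract can be seen in a small K-means sketch. This is a simplified illustration under stated assumptions, not the review's implementation. The number of clusters is set by how many initial centroids are passed in, and the toy expression profiles and function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(profiles, initial_centroids, max_iter=100):
    """Crisp K-means: each gene is assigned to exactly one cluster.

    The result can depend on both the number of clusters and the
    starting centroids, so both are explicit inputs here.
    """
    centroids = [list(c) for c in initial_centroids]
    k = len(centroids)
    labels = [None] * len(profiles)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every profile.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy profiles over three time points: three rising and three falling genes.
profiles = [
    [1.0, 2.0, 3.0], [1.2, 2.1, 2.9], [0.9, 1.8, 3.2],
    [3.0, 2.0, 1.0], [3.1, 1.9, 0.8], [2.8, 2.2, 1.1],
]
labels, centroids = k_means(profiles, [profiles[0], profiles[3]])
```

Starting from a different pair of centroids, or passing three centroids instead of two, can give a different partition of the same data.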

Comparison of methods for the proportion of true null hypotheses in microarray studies

  • Kang, Joonsung
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 27, No. 1
    • /
    • pp.141-148
    • /
    • 2020
  • We consider estimating the proportion of true null hypotheses in multiple testing problems. A traditional multiple testing error rate, the family-wise error rate, is too conservative and dated for controlling type I error in multiple testing setups; in contrast, the false discovery rate (FDR) has received significant attention in many research areas such as GWAS data, fMRI data, and signal processing. Identifying differentially expressed genes in microarray studies involves estimating the proportion of true null hypotheses in FDR procedures. However, since the genuine dependence structure of microarray data is unknown, we need to account for unknown dependence among genes in order to estimate the proportion of true null hypotheses. We compare various procedures on simulated data and real microarray data. We consider a hidden Markov model for simulated data with dependency. The Cai procedure (2007) and the sliding linear model procedure (2011) have relatively smaller bias and standard errors, making them more suitable for estimating the proportion of true null hypotheses in simulated data under various setups. Real data analysis shows that 5 of the 9 estimation procedures yield nearly the same estimated proportion of true null hypotheses in microarray data.
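For readers unfamiliar with the quantity being estimated, the sketch below uses Storey's lambda-based estimator of the proportion of true null hypotheses. It is a well-known estimator used here only as an illustration; the abstract does not list all nine procedures it compares, so this should not be read as one of them. The function name, the default threshold of 0.5, and the toy p-values are assumptions.

```python
def storey_pi0(p_values, lam=0.5):
    """Estimate the proportion of true nulls from a list of p-values.

    Idea: p-values above `lam` come mostly from true nulls, which are
    uniform on [0, 1], so their count divided by m * (1 - lam) estimates
    the null proportion. The estimate is capped at 1.
    """
    if not p_values:
        raise ValueError("p_values must not be empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


# Toy p-values for ten genes; four of them exceed the threshold of 0.5.
p_values = [0.001, 0.004, 0.02, 0.03, 0.2, 0.35, 0.55, 0.6, 0.8, 0.95]
pi0_hat = storey_pi0(p_values)
```

This simple form assumes independent tests; the dependence among genes discussed in the abstract is exactly what it does not account for.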

A Comparative Study of Microarray Data with Survival Times Based on Several Missing Mechanism

  • Kim Jee-Yun;Hwang Jin-Soo;Kim Seong-Sun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 13, No. 1
    • /
    • pp.101-111
    • /
    • 2006
  • One of the most widely used methods of handling missingness in microarray data is the kNN (k-nearest neighbors) method. Recently, Li and Gui (2004) suggested the so-called PCR (Partial Cox Regression) method, which deals efficiently with censored survival times and microarray data via kNN imputation. In this article, we try to show that the way missingness is treated eventually affects subsequent statistical analysis.
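A minimal sketch of kNN imputation for an expression matrix follows. It assumes rows are genes, columns are samples, and `None` marks a missing value. The distance is a root-mean-square difference over the columns both genes have observed, which is one common choice and not necessarily the variant used in the article. All names and numbers are hypothetical.

```python
import math


def knn_impute(matrix, k=2):
    """Fill missing entries with the mean of the k most similar genes.

    Only genes that have an observed value in the missing column can
    serve as neighbours. The input matrix is left unchanged.
    """
    def distance(a, b):
        # Compare two genes on the samples where both are observed.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = [
                (distance(row, other), other[j])
                for n, other in enumerate(matrix)
                if n != i and other[j] is not None
            ]
            candidates.sort(key=lambda pair: pair[0])
            neighbours = candidates[:k]
            if neighbours:  # leave the gap if no gene can donate a value
                imputed[i][j] = sum(v for _, v in neighbours) / len(neighbours)
    return imputed


# Toy matrix: the first gene is missing its third sample.
expression = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 3.2],
    [5.0, 6.0, 9.0],
]
filled = knn_impute(expression, k=2)
```

The imputed value depends on `k` and on the distance measure, which is one way the treatment of missingness can carry over into later analysis.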

Application of UML (Unified Modeling Language) in Object-oriented Analysis of Microarray Information System

  • Park, Ji-Yeon;Chung, Hee-Joon;Kim, Ju-Han
    • Korean Society for Bioinformatics: Conference Proceedings
    • /
    • Korean Society for Bioinformatics and Systems Biology, Proceedings of the 2nd Annual Conference, 2003
    • /
    • pp.147-154
    • /
    • 2003
  • A microarray information system is a complex system for managing, analyzing and interpreting microarray gene expression data. Establishing a well-defined development process is essential for understanding the complexity and organization of the system. We performed object-oriented analysis using the Unified Modeling Language (UML) to specify, visualize and document a microarray information system. The object-oriented analysis consists of three major steps: (i) use case modeling to describe various functionalities from the user's perspective, (ii) dynamic modeling to illustrate behavioral aspects of the system, and (iii) object modeling to represent structural aspects of the system. As a result of our modeling activities, we provide UML diagrams showing various views of the microarray information system. We believe that object-oriented analysis ensures effective documentation and communication of information system requirements. Another useful feature of the object-oriented technique is its structural continuity with the standard microarray data model MAGE-OM (Microarray Gene Expression Object Model). The proposed modeling efforts may be applicable to the integration of biomedical information systems.
