• Title/Abstract/Keywords: Microarray Data

Search results: 473 items (processing time 0.03 s)

Hierarchical Clustering of Gene Expression Data Based on Self Organizing Map

  • Park, Chang-Beom;Lee, Dong-Hwan;Lee, Seong-Whan
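    • Korean Society for Bioinformatics: Conference Proceedings
    • /
    • Korean Society for Bioinformatics and Systems Biology, Proceedings of the 2nd Annual Conference, 2003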
    • /
    • pp.170-177
    • /
    • 2003
  • Gene expression data are quantitative measurements of the expression levels and ratios of numerous genes under different conditions, derived from microarray image analysis. The process of drawing meaningful information about genomic diseases and various biological activities from such data is known as gene expression data analysis. In this paper, we present a hierarchical clustering method for gene expression data, based on the self-organizing map, that makes the clustering result easier to analyze. The proposed method eliminates the uncertainty of cluster boundaries, an inherent disadvantage of the self-organizing map, and makes use of the visualization capability of hierarchical clustering. It also handles massive data thanks to the fast processing speed of the self-organizing map, and lets users interpret the self-organizing map's clustering result more efficiently and in a more user-friendly way. To verify the efficiency of the proposed algorithm, we ran tests on three data sets: an animal feature data set, a yeast gene expression data set, and a leukemia gene expression data set. The results demonstrated the feasibility and utility of the proposed clustering algorithm.


An Efficient Functional Analysis Method for Micro-array Data Using Gene Ontology

  • Hong, Dong-Wan;Lee, Jong-Keun;Park, Sung-Soo;Hong, Sang-Kyoon;Yoon, Jee-Hee
    • Journal of Information Processing Systems
    • /
    • Vol. 3, No. 1
    • /
    • pp.38-42
    • /
    • 2007
  • Microarray data capture the expression of tens of thousands of genes simultaneously, so they can be used effectively to identify the phenotypes of diseases. However, retrieving functional information from a large corpus of gene expression data is still a time-consuming task. In this paper, we propose an efficient method for identifying functional categories of differentially expressed genes from a micro-array experiment by using Gene Ontology (GO). Our method is as follows: (1) The expression data set is first filtered to include only genes with mean expression values that differ by at least 3-fold between the two groups. (2) The genes are then ranked based on the t-statistics. The 100 most highly ranked genes are selected as informative genes. (3) The t-value of each informative gene is imposed as a score on the associated GO terms. High-scoring GO terms are then listed with their associated genes and represent the functional category information of the micro-array experiment. A system called HMDA (Hallym Micro-array Data analysis) is implemented and validated on publicly available micro-array data sets. Our results were also compared with the original analysis.
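
The three steps in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HMDA implementation: the function names, the toy data, the placeholder term labels `TERM_X` and `TERM_Y`, the choice of Welch's t-statistic, and the rule of summing absolute t-values per GO term are all assumptions, since the abstract does not say which t-statistic is used or how scores are aggregated.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (assumed variant of the t-test)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def score_go_terms(expr, go_map, min_fold=3.0, top_n=100):
    """expr: gene -> (group 1 values, group 2 values); go_map: gene -> GO terms."""
    ranked = []
    for gene, (g1, g2) in expr.items():
        lo, hi = sorted((mean(g1), mean(g2)))
        # Step 1: keep genes whose group means differ by at least min_fold.
        if lo <= 0 or hi / lo < min_fold:
            continue
        ranked.append((abs(welch_t(g1, g2)), gene))
    # Step 2: rank by |t| and keep the top_n informative genes.
    ranked.sort(reverse=True)
    scores = defaultdict(float)
    members = defaultdict(list)
    # Step 3: impose each gene's score on its associated GO terms.
    for t, gene in ranked[:top_n]:
        for term in go_map.get(gene, ()):
            scores[term] += t
            members[term].append(gene)
    terms = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return terms, dict(members)


# Hypothetical toy data: three genes, two groups of three samples each.
toy_expr = {
    "geneA": ([10.0, 12.0, 11.0], [1.0, 2.0, 1.5]),
    "geneB": ([5.0, 6.0, 5.5], [5.2, 5.8, 5.6]),  # fails the 3-fold filter
    "geneC": ([1.0, 1.2, 0.8], [4.0, 4.4, 3.6]),
}
toy_go = {
    "geneA": ["TERM_X", "TERM_Y"],
    "geneB": ["TERM_X"],
    "geneC": ["TERM_Y"],
}
go_terms, go_members = score_go_terms(toy_expr, toy_go)
```

In the toy run, `geneB` is filtered out in step 1, so `TERM_Y` (supported by two informative genes) ranks above `TERM_X` (supported by one).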

A Comparison Study of Classification Methods Based on SVM and Data Depth in Microarray Data

  • 황진수;김지연
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 20, No. 2
    • /
    • pp.311-319
    • /
    • 2009
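  • For clustering and classification analysis, Jornsten (2004) proposed robust methods called DDclust and DDclass, which are based on the L1 data depth. SVM-based methods are widely used but have some problems when outliers are present. Because gene data contain a large number of genes, an appropriate gene selection procedure is needed. Selecting appropriate genes or gene clusters and using them for classification can therefore improve classification performance. From this viewpoint, we compare the performance of depth-based and SVM-based classification methods.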
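
To make the depth idea concrete, here is a minimal sketch of a depth-based classifier. It assumes a simplified form of the L1 depth (one minus the norm of the average unit vector pointing from the query to the sample points, skipping points that coincide with the query) and a plain "assign to the class in which the point is deepest" rule. Both are simplifications for illustration and are not claimed to match the exact DDclass procedure; the function names and toy data are hypothetical.

```python
from math import sqrt


def l1_depth(x, sample):
    """Simplified L1 depth of x within sample (assumed form, see lead-in)."""
    dim = len(x)
    total = [0.0] * dim
    for p in sample:
        diff = [pi - xi for pi, xi in zip(p, x)]
        norm = sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            continue  # skip sample points that coincide with x
        for j in range(dim):
            total[j] += diff[j] / norm
    # A point surrounded on all sides has unit vectors that cancel: depth near 1.
    return 1.0 - sqrt(sum(t * t for t in total)) / len(sample)


def classify_by_depth(x, classes):
    """Assign x to the class in which it is deepest."""
    return max(classes, key=lambda label: l1_depth(x, classes[label]))


# Hypothetical two-gene expression profiles for two classes.
toy_classes = {
    "tumor": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "normal": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)],
}
label_a = classify_by_depth((0.5, 0.5), toy_classes)
label_b = classify_by_depth((5.5, 5.5), toy_classes)
```

Because depth depends on directions rather than distances, a single extreme sample shifts it far less than it shifts a class mean, which is the robustness property the abstract refers to.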


The Sliding Window Gene-Shaving Algorithm for Microarray Data Analysis

  • 이혜선;최대우;전치혁
    • Korean Society for Bioinformatics: Conference Proceedings
    • /
    • Korean Society for Bioinformatics and Systems Biology, 1st Workshop, 2002
    • /
    • pp.139-152
    • /
    • 2002
  • Gene-shaving (Hastie et al., 2000) is a very useful method for identifying a meaningful group of genes when the variation in expression is large. By shaving off the genes that correlate weakly with the leading principal component, the primary genes with a coherent expression pattern can be identified. Gene-shaving works well if expression levels vary enough, but it may miss a meaningful cluster at a low expression level or with a different expression time, even when the patterns are coherent. The sliding window gene-shaving method, which applies gene-shaving within each sliding window after hierarchical clustering, compensates for the loss of meaningful gene sets whose variation is not large but is distinct. Its performance in identifying expression patterns is compared on simulated profile data with different variances and expression levels.
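
A single shaving step can be sketched as follows. This is a rough illustration of the basic gene-shaving idea only, not of the sliding window variant. It assumes row-centred expression profiles, a leading principal component estimated by power iteration, and ranking by absolute inner product with that component; the function names, the 10% shaving fraction, and the toy data are assumptions.

```python
from math import ceil, sqrt


def centre(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def leading_component(rows, iters=100):
    """Leading principal direction across samples, by power iteration on X^T X."""
    n = len(rows[0])
    # Start from a non-constant vector: centred rows are orthogonal to all-ones.
    v = [float(j + 1) for j in range(n)]
    for _ in range(iters):
        proj = [sum(r[j] * v[j] for j in range(n)) for r in rows]          # X v
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(n)]  # X^T X v
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def shave_once(expr, alpha=0.1):
    """expr: gene -> expression values. Drops the alpha fraction least aligned with the PC."""
    centred = {g: centre(vals) for g, vals in expr.items()}
    pc = leading_component(list(centred.values()))
    score = {g: abs(sum(a * b for a, b in zip(row, pc))) for g, row in centred.items()}
    n_drop = max(1, ceil(alpha * len(expr)))
    ordered = sorted(score, key=score.get, reverse=True)
    return ordered[: len(expr) - n_drop]


# Hypothetical profiles: four genes follow one trend (up or down), one is nearly flat.
toy_profiles = {
    "g1": [2.0, 4.0, 6.0, 8.0, 10.0],
    "g2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g3": [5.0, 4.0, 3.0, 2.0, 1.0],
    "g4": [10.0, 8.0, 6.0, 4.0, 2.0],
    "flat": [3.0, 3.1, 2.9, 3.0, 3.0],
}
kept_genes = shave_once(toy_profiles)
```

Since the score scales with a gene's variance, coherent but low-variance genes are the first to be shaved, which is the weakness the sliding window variant is meant to address.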


Selection of Discriminative Genes for Data Mining of Time-series Microarray Data

  • 이민수;박승수;강성희;박웅양
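    • Korean Institute of Information Scientists and Engineers: Conference Proceedings
    • /
    • Korean Institute of Information Scientists and Engineers, Proceedings of the 2006 Korea Computer Congress, Vol.33 No.1 (A)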
    • /
    • pp.25-27
    • /
    • 2006
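  • As a preprocessing step for mining time-series microarray data, this paper proposes a method that applies feature extraction and correlation analysis to time-series microarray data in order to select genes that are discriminative with respect to the differentiation process. As an example, it presents the analysis of time-series microarray data to find genes that are specifically expressed while stem cells differentiate into neurons. The analysis showed that the proposed method is very useful for characterizing the neural differentiation of stem cells through three groups of genes: discriminative genes expressed specifically during differentiation, genes expressed in common during differentiation, and genes lying on the boundary.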


Applying a modified AUC to gene ranking

  • Yu, Wenbao;Chang, Yuan-Chin Ivan;Park, Eunsik
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 25, No. 3
    • /
    • pp.307-319
    • /
    • 2018
  • High-throughput technologies enable the simultaneous evaluation of thousands of genes that could discriminate different subclasses of complex diseases. Ranking genes according to differential expression is an important screening step for follow-up analysis. Many statistical measures have been proposed for this purpose. A good ranked list should provide a stable rank (at least for the top-ranked genes), and the top-ranked genes should have high power in differentiating disease statuses. However, the literature places little emphasis on ranking genes by these two criteria simultaneously. To satisfy both criteria, we propose applying a previously reported metric, the modified area under the receiver operating characteristic curve, to gene ranking. The proposed ranking method is found to be promising in producing a stable ranking list and good prediction performance for the top-ranked genes. The findings are illustrated through studies on both synthesized data and real microarray gene expression data. The proposed method is recommended for ranking genes or other biomarkers in high-dimensional omics studies.
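
For orientation, the sketch below ranks genes by the ordinary empirical AUC, computed as the fraction of case/control pairs in which the case value is higher (ties count one half). It does not implement the modified AUC proposed in the paper, whose definition is not given in this abstract; the function names and toy data are hypothetical.

```python
def empirical_auc(cases, controls):
    """Standard empirical AUC: P(case > control) + 0.5 * P(case == control)."""
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def rank_genes_by_auc(expr):
    """expr: gene -> (case values, control values). Best-separating genes first."""
    strength = {}
    for gene, (cases, controls) in expr.items():
        a = empirical_auc(cases, controls)
        # Up- and down-regulated genes are equally informative, so fold at 0.5.
        strength[gene] = max(a, 1.0 - a)
    return sorted(strength.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data.
toy_auc_expr = {
    "g_up": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
    "g_mixed": ([1.0, 4.0, 6.0], [2.0, 3.0, 5.0]),
    "g_down": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
}
auc_ranking = rank_genes_by_auc(toy_auc_expr)
```

Here `g_up` and `g_down` separate the groups perfectly and share the top score, while `g_mixed` scores 5/9 and comes last.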

Gene Selection using Principal Component Analysis for Molecular Classification

  • 임수홍;손기락;홍성룡
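    • Korean Institute of Information Scientists and Engineers: Conference Proceedings
    • /
    • Korean Institute of Information Scientists and Engineers, Proceedings of the 2005 Korea Computer Congress, Vol.32 No.1 (B)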
    • /
    • pp.259-261
    • /
    • 2005
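  • DNA microarray studies, which generate thousands of gene expression measurements, collect diagnostically useful gene expression information from tissue and cell samples. New methods based on the support vector machine (SVM) have been studied for analyzing this kind of data. This paper describes an algorithm that uses the eigenvectors of gene expression data to improve SVM performance and to find genes that are useful for disease diagnosis. Genes are selected by means of the eigenvectors and added to SVM learning, and the classification results reveal how much each added gene influences the diagnosis, which can help in understanding the role of genes in disease.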


Bioinformatics and Genomic Medicine

  • 김주한
    • Journal of Preventive Medicine and Public Health
    • /
    • Vol. 35, No. 2
    • /
    • pp.83-91
    • /
    • 2002
  • Bioinformatics is a rapidly emerging field of biomedical research. A flood of large-scale genomic and postgenomic data means that many of the challenges in biomedical research are now challenges in computational sciences. Clinical informatics has long developed methodologies to improve biomedical research and clinical care by integrating experimental and clinical information systems. The informatics revolutions both in bioinformatics and clinical informatics will eventually change the current practice of medicine, including diagnostics, therapeutics, and prognostics. Postgenome informatics, powered by high throughput technologies and genomic-scale databases, is likely to transform our biomedical understanding forever much the same way that biochemistry did a generation ago. The paper describes how these technologies will impact biomedical research and clinical care, emphasizing recent advances in biochip-based functional genomics and proteomics. Basic data preprocessing with normalization, primary pattern analysis, and machine learning algorithms will be presented. Use of integrated biochip informatics technologies, text mining of factual and literature databases, and integrated management of biomolecular databases will be discussed. Each step will be given with real examples in the context of clinical relevance. Issues of linking molecular genotype and clinical phenotype information will be discussed.

Gene Selection and Classification by Partial Least Squares and Principal Component Analysis

  • Park, Hoseok;Kim, Hey-Jin;Park, Seugj in;Bang, Sung-Yang
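    • Korean Institute of Information Scientists and Engineers: Conference Proceedings
    • /
    • Korean Institute of Information Scientists and Engineers, Proceedings of the 2001 Fall Conference, Vol.28 No.2 (1)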
    • /
    • pp.598-600
    • /
    • 2001
  • DNA chip technology enables us to monitor thousands of gene expressions per sample simultaneously. Typically, DNA microarray data have at least several thousand variables (genes) with a relatively small number of samples. Thus feature (gene) selection by dimensionality reduction is necessary for efficient data analysis. In this paper we employ the partial least squares (PLS) method for gene selection and the principal component analysis (PCA) method for classification. The useful behavior of the PLS is verified by computer simulations.
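
One common way to use PLS for gene selection is to rank genes by the magnitude of their weight in the first PLS component, which for centred data is proportional to X^T y. The sketch below shows only that step, under the assumption that class labels are coded as +1 and -1. Whether the paper selects genes this way is not stated in the abstract, and the function names and toy data are hypothetical.

```python
from math import sqrt


def first_pls_weights(samples, labels):
    """First PLS weight vector: proportional to X^T y after centring X and y."""
    n, p = len(samples), len(samples[0])
    y_mean = sum(labels) / n
    col_mean = [sum(row[j] for row in samples) / n for j in range(p)]
    w = [
        sum((row[j] - col_mean[j]) * (y - y_mean) for row, y in zip(samples, labels))
        for j in range(p)
    ]
    norm = sqrt(sum(v * v for v in w)) or 1.0  # avoid dividing by zero
    return [v / norm for v in w]


def select_genes_pls(samples, labels, gene_names, k):
    """Return the k genes with the largest absolute first-component weight."""
    w = first_pls_weights(samples, labels)
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    return [gene_names[j] for j in order[:k]]


# Hypothetical data: 4 samples (rows) x 3 genes (columns), labels coded +1 / -1.
toy_samples = [
    [5.0, 3.0, 2.0],
    [6.0, 3.0, 2.2],
    [1.0, 3.0, 2.1],
    [2.0, 3.0, 2.3],
]
toy_labels = [1.0, 1.0, -1.0, -1.0]
toy_names = ["g_strong", "g_const", "g_weak"]
pls_w = first_pls_weights(toy_samples, toy_labels)
pls_top = select_genes_pls(toy_samples, toy_labels, toy_names, k=2)
```

The constant gene receives zero weight because it carries no covariance with the labels, so it is never selected ahead of a gene that varies with class.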


Constructing Gene Regulatory Networks using Knock-out Data

  • 홍성룡;손기락
    • Journal of the Korea Society of Computer and Information
    • /
    • Vol. 12, No. 6
    • /
    • pp.105-113
    • /
    • 2007
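  • A gene regulatory network is a gene network that represents how the expression of one gene affects other genes. Large volumes of data measuring gene expression levels in microarray experiments are now available. One typical kind of data is steady-state data, in which the expression levels of the other genes are measured after a specific gene has been removed. This paper presents a method that uses such measurements to reconstruct a gene regulatory network with minimal redundant information. The proposed model accounts for auto-regulation that appears in the form of cycles, which earlier work did not consider, and also accounts for whether a gene acts as a repressor or an activator.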
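
The general idea of reading signed edges from knock-out data and then removing redundant ones can be sketched as follows. This is a simple heuristic written for illustration, not the model proposed in the paper: the change threshold, the function names, the one-intermediate pruning rule, and the toy data are all assumptions, and the sketch makes no attempt to handle the cyclic auto-regulation that the paper addresses.

```python
def infer_signed_edges(wild_type, knockouts, threshold=1.0):
    """wild_type: gene -> level; knockouts: removed gene -> {gene: level}.

    Returns {(regulator, target): +1 for activator, -1 for repressor}.
    """
    edges = {}
    for removed, levels in knockouts.items():
        for target, level in levels.items():
            if target == removed:
                continue
            change = level - wild_type[target]
            if abs(change) >= threshold:
                # Target falls when the regulator is removed: regulator activates it.
                edges[(removed, target)] = 1 if change < 0 else -1
    return edges


def prune_indirect_edges(edges):
    """Drop a->c when some path a->b->c already carries the same overall sign."""
    genes = {g for edge in edges for g in edge}
    kept = dict(edges)
    for (a, c), sign in edges.items():
        for b in genes - {a, c}:
            if edges.get((a, b), 0) * edges.get((b, c), 0) == sign:
                kept.pop((a, c), None)
                break
    return kept


# Hypothetical steady-state data for a chain A -> B -> C (both activating).
toy_wild_type = {"A": 5.0, "B": 5.0, "C": 5.0}
toy_knockouts = {
    "A": {"A": 0.0, "B": 1.0, "C": 1.0},
    "B": {"A": 5.0, "B": 0.0, "C": 1.0},
    "C": {"A": 5.0, "B": 5.0, "C": 0.0},
}
raw_edges = infer_signed_edges(toy_wild_type, toy_knockouts)
network = prune_indirect_edges(raw_edges)
```

Knocking out A lowers both B and C, so the raw edge set contains A->C; pruning removes it because the path A->B->C already explains the effect with the same sign.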
