http://dx.doi.org/10.5391/JKIIS.2009.19.6.802

Improving Clustering Performance Using Gene Ontology  

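Ko, Song (School of Computer Engineering, Chung-Ang University)
Kang, Bo-Yeong (School of Mechanical Engineering, Kyungpook National University)
Kim, Dae-Won (School of Computer Engineering, Chung-Ang University)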
Publication Information
Journal of the Korean Institute of Intelligent Systems / v.19, no.6, 2009, pp. 802-808
Abstract
Many recent studies have sought to improve the clustering of gene expression data by incorporating Gene Ontology (GO) into the clustering process. In particular, Kustra et al. showed that exploiting the Biological Process ontology improves performance over typical expression-based clustering. This paper extends the work of Kustra et al. with extensive experiments on how GO structures are incorporated. To this end, we applied three ontological distance measures (Lin's, Resnik's, and Jiang's) and three GO structures (Biological Process, BP; Cellular Component, CC; Molecular Function, MF) to yeast expression data. Across all test cases, we found that incorporating GO remarkably improved clustering performance; in particular, Resnik's distance measure based on the Biological Process ontology performed best.
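The three measures named above are commonly defined through the information content (IC) of ontology terms, where IC(t) = -log p(t) and p(t) is the fraction of genes annotated to t or its descendants. The following is a minimal sketch of those standard definitions, not the implementation used in this paper. The toy ontology, the term names, the probabilities, and all function names are hypothetical and chosen only for illustration. Note that Resnik's and Lin's measures are similarities while Jiang and Conrath's is a distance; how the paper converts between the two is not stated here.

```python
import math

# Hypothetical toy ontology (child -> set of parents); GO itself is a DAG like this.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}

# Hypothetical p(t): fraction of genes annotated to t or any of its descendants.
PROB = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25, "B1": 0.5}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: IC(t) = -log p(t), written as log(1/p) to avoid -0.0."""
    return math.log(1.0 / PROB[term])


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic(t) for t in common)


def lin(t1, t2):
    """Lin similarity: 2 * IC(common ancestor) / (IC(t1) + IC(t2))."""
    denom = ic(t1) + ic(t2)
    # Both terms are the root (IC = 0): treat them as identical.
    return 2.0 * resnik(t1, t2) / denom if denom > 0 else 1.0


def jiang_conrath_distance(t1, t2):
    """Jiang-Conrath distance: IC(t1) + IC(t2) - 2 * IC(common ancestor)."""
    return ic(t1) + ic(t2) - 2.0 * resnik(t1, t2)
```

Under these assumptions, calling `resnik("A1", "A2")` scores the two sibling terms by the information content of their shared parent `A`, whereas `resnik("A1", "B1")` falls back to the root, which carries no information. A gene-level score would additionally need a rule for combining term-level scores over each gene's annotations, which this sketch does not cover.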
Keywords
Semi-supervised Clustering; GO; Microarray; Gene Function Prediction; Semantic Distance
References
1 R. Sharan et al., 'CLICK and EXPANDER: a system for clustering and visualizing gene expression data,' Bioinformatics, Vol. 19, no. 14, pp. 1787-1799, 2003
2 MB. Eisen et al., 'Cluster analysis and display of genome-wide expression patterns,' Proc. Natl. Acad. Sci. USA, Vol. 95, pp. 14863-14868, 1998
3 The Gene Ontology Consortium, 'Gene Ontology: tool for the unification of biology,' Nature Genetics, Vol. 25, 2000
4 D. Dotan-Cohen et al., 'Hierarchical tree snipping: clustering guided by prior knowledge,' Bioinformatics, Vol. 23, no. 24, pp. 3335-3342, 2007
5 P. Resnik, 'Using Information Content to Evaluate Semantic Similarity in a Taxonomy,' cmp-lg/9511007, 1995
6 Z. Fang et al., 'Knowledge guided analysis of microarray data,' Journal of Biomedical Informatics, Vol. 39, pp. 401-411, 2006
7 PT. Spellman et al., 'Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization,' Molecular Biology of the Cell, Vol. 9, pp. 3273-3297, 1998
8 P. Tamayo et al., 'Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation,' Proc. Natl. Acad. Sci. USA, Vol. 96, pp. 2907-2912, 1999
9 J. Cheng et al., 'A Knowledge-Based Clustering Algorithm Driven by Gene Ontology,' Journal of Biopharmaceutical Statistics, Vol. 14, no. 3, pp. 687-700, 2004
10 RJ. Cho et al., 'A Genome-Wide Transcriptional Analysis of the Mitotic Cell Cycle,' Molecular Cell, Vol. 2, pp. 65-73, 1998
11 JJ. Jiang and DW. Conrath, 'Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy,' ROCLING X, 1997
12 R. Kustra and A. Zagdanski, 'Incorporating Gene Ontology in Clustering Gene Expression Data,' CBMS'06, 2006
13 S. Tavazoie et al., 'Systematic determination of genetic network architecture,' Nature Genetics, Vol. 22, pp. 281-285, 1999
14 J. Herrero et al., 'A hierarchical unsupervised growing neural network for clustering gene expression patterns,' Bioinformatics, Vol. 17, no. 2, pp. 126-136, 2001
15 D. Huang and W. Pan, 'Incorporating biological knowledge into distance-based clustering analysis of microarray gene expression data,' Bioinformatics, Vol. 22, no. 10, pp. 1259-1268, 2006
16 D. Lin, 'An Information-Theoretic Definition of Similarity,' In Proceedings of the 15th International Conference on Machine Learning, 1998