Improving Clustering Performance Using Gene Ontology

(Korean title: A Technique for Improving Clustering Performance Using Gene Ontology)

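  • 고송 (School of Computer Science and Engineering, Chung-Ang University) ;
  • 강보영 (School of Mechanical Engineering, Kyungpook National University) ;
  • 김대원 (School of Computer Science and Engineering, Chung-Ang University)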
  • Received : 2009.07.16
  • Accepted : 2009.12.02
  • Published : 2009.12.25

Abstract

Many recent studies have sought to improve the clustering of gene expression data by incorporating Gene Ontology (GO) into the clustering process. In particular, Kustra et al. showed that exploiting the Biological Process (BP) ontology yields better performance than conventional expression-based clustering. This paper extends the work of Kustra et al. with extensive experiments on how GO structures are incorporated. To this end, we applied three ontological distance measures (Lin's, Resnik's, and Jiang's) and three GO structures (Biological Process, BP; Cellular Component, CC; Molecular Function, MF) to yeast expression data. Across all test cases, incorporating GO markedly improved clustering performance; in particular, Resnik's distance measure based on the Biological Process ontology performed best.

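Research on using Gene Ontology (GO) to improve the clustering performance of microarray data has recently been under way. Among these efforts, the work of Kustra et al., which uses the Biological Process (BP) GO, was introduced in 2006. This study extends the work of Kustra et al. and applies a variety of approaches in order to present analysis results that support general and practical use of GO. (1) To measure GO distance, we applied the methods of Lin, Resnik, and Jiang and Conrath; (2) we applied them to the structures of three GO types, including BP, and analyzed the degree of performance improvement for each method. To compare the methods, we used data obtained from observations of yeast genes. The experimental results show that applying GO information to clustering generally improves performance, but that the degree of improvement varies with how GO is used. Of the combinations tested, Resnik's distance measure with the BP GO yielded the greatest improvement.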
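The three measures named in the abstract are commonly defined in terms of the information content IC(t) = -log p(t) of a term t and of the most informative common ancestor of two terms. The sketch below illustrates those standard textbook definitions on a small made-up ontology. The term names, the `PARENTS` graph, and the `PROB` values are hypothetical, and the abstract does not state how the authors turn the Resnik and Lin similarities into distances or how they combine them with expression data, so none of that is shown here.

```python
import math

# Hypothetical toy ontology (child -> list of parents); not real GO terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "AB": ["A", "B"],
}

# Hypothetical p(term): fraction of genes annotated to the term or any descendant.
PROB = {"root": 1.0, "A": 0.5, "B": 0.25, "A1": 0.125, "A2": 0.25, "AB": 0.125}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: IC(t) = -log p(t), written as log(1 / p(t))."""
    return math.log(1.0 / PROB[term])


def mica_ic(t1, t2):
    """IC of the most informative common ancestor of t1 and t2."""
    return max(ic(t) for t in ancestors(t1) & ancestors(t2))


def resnik_similarity(t1, t2):
    """Resnik: IC of the most informative common ancestor."""
    return mica_ic(t1, t2)


def lin_similarity(t1, t2):
    """Lin: 2 * IC(MICA) / (IC(t1) + IC(t2)), in [0, 1]."""
    denom = ic(t1) + ic(t2)
    return 2.0 * mica_ic(t1, t2) / denom if denom > 0 else 1.0


def jiang_conrath_distance(t1, t2):
    """Jiang-Conrath: IC(t1) + IC(t2) - 2 * IC(MICA)."""
    return ic(t1) + ic(t2) - 2.0 * mica_ic(t1, t2)


if __name__ == "__main__":
    for pair in [("A1", "A2"), ("A1", "B"), ("AB", "B")]:
        print(pair,
              round(resnik_similarity(*pair), 3),
              round(lin_similarity(*pair), 3),
              round(jiang_conrath_distance(*pair), 3))
```

Note that Resnik's and Lin's measures are similarities (larger means closer), whereas Jiang and Conrath's is already a distance, so a clustering pipeline would need some conversion before mixing them.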

References

  1. J. Herrero et al., 'A hierarchical unsupervised growing neural network for clustering gene expression patterns,' Bioinformatics, Vol. 17, no. 2, pp. 126-136, 2001. https://doi.org/10.1093/bioinformatics/17.2.126
  2. R. Sharan et al., 'CLICK and EXPANDER: a system for clustering and visualizing gene expression data,' Bioinformatics, Vol. 19, no. 14, pp. 1787-1799, 2003. https://doi.org/10.1093/bioinformatics/btg232
  3. R.J. Cho et al., 'A Genome-Wide Transcriptional Analysis of the Mitotic Cell Cycle,' Molecular Cell, Vol. 2, pp. 65-73, 1998. https://doi.org/10.1016/S1097-2765(00)80114-8
  4. M.B. Eisen et al., 'Cluster analysis and display of genome-wide expression patterns,' Proc. Natl. Acad. Sci. USA, Vol. 95, pp. 14863-14868, 1998. https://doi.org/10.1073/pnas.95.25.14863
  5. P.T. Spellman et al., 'Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization,' Molecular Biology of the Cell, Vol. 9, pp. 3273-3297, 1998. https://doi.org/10.1091/mbc.9.12.3273
  6. P. Tamayo et al., 'Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation,' Proc. Natl. Acad. Sci. USA, Vol. 96, pp. 2907-2912, 1999. https://doi.org/10.1073/pnas.96.6.2907
  7. S. Tavazoie et al., 'Systematic determination of genetic network architecture,' Nature Genetics, Vol. 22, pp. 281-285, 1999. https://doi.org/10.1038/10343
  8. The Gene Ontology Consortium, 'Gene Ontology: tool for the unification of biology,' Nature Genetics, Vol. 25, 2000.
  9. P. Resnik, 'Using Information Content to Evaluate Semantic Similarity in a Taxonomy,' cmp-lg/9511007, 1995.
  10. D. Lin, 'An Information-Theoretic Definition of Similarity,' In Proceedings of the 15th International Conference on Machine Learning, 1998.
  11. J.J. Jiang and D.W. Conrath, 'Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy,' ROCLING X, 1997.
  12. D. Dotan-Cohen et al., 'Hierarchical tree snipping: clustering guided by prior knowledge,' Bioinformatics, Vol. 23, no. 24, pp. 3335-3342, 2007. https://doi.org/10.1093/bioinformatics/btm526
  13. Z. Fang et al., 'Knowledge guided analysis of microarray data,' Journal of Biomedical Informatics, Vol. 39, pp. 401-411, 2006. https://doi.org/10.1016/j.jbi.2005.08.004
  14. D. Huang and W. Pan, 'Incorporating biological knowledge into distance-based clustering analysis of microarray gene expression data,' Bioinformatics, Vol. 22, no. 10, pp. 1259-1268, 2006. https://doi.org/10.1093/bioinformatics/btl065
  15. J. Cheng et al., 'A Knowledge-Based Clustering Algorithm Driven by Gene Ontology,' Journal of Biopharmaceutical Statistics, Vol. 14, no. 3, pp. 687-700, 2004. https://doi.org/10.1081/BIP-200025659
  16. R. Kustra and A. Zagdanski, 'Incorporating Gene Ontology in Clustering Gene Expression Data,' CBMS'06, 2006.