Global Optimization of Clusters in Gene Expression Data of DNA Microarrays by Deterministic Annealing

  • Lee, Kwon Moo (Bioinformatics Project, IT R&D Center, Samsung SDS)
  • Chung, Tae Su (Human Genome Research Institute, Seoul National University College of Medicine)
  • Kim, Ju Han (SNU Biomedical Informatics, Seoul National University College of Medicine)
  • Published : 2003.09.01

Abstract

The analysis of DNA microarray data is one of the most important tasks in functional genomics research. The matrix representation of microarray data, together with its successive 'optimal' incisional hyperplanes, is a useful platform for developing optimization algorithms that determine the optimal partitioning of a pairwise proximity matrix representing a completely connected, weighted graph. We developed a Deterministic Annealing (DA) approach to determine the successive optimal binary partitions. The DA algorithm demonstrated good performance, with the ability to find the 'globally optimal' binary partitions. In addition, objects that have not been clustered at a small non-zero temperature are considered to be very sensitive to even small randomness, and can be used to estimate the reliability of the clustering.
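
The abstract does not give the update equations, so the following is only a minimal sketch of one way a single binary split could be annealed, not the authors' implementation. It assumes a mean-field formulation in which each object i carries a soft assignment m_i in (-1, 1), the proximity matrix holds similarities, and the temperature is lowered geometrically. The names anneal_bipartition and read_split, the centring of the similarities, the cooling schedule, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
import math
import random


def anneal_bipartition(sim, t_start=4.0, t_end=0.05, cooling=0.8, sweeps=10, seed=0):
    """Mean-field annealing sketch for one binary split of a similarity matrix.

    sim is a symmetric n x n list of lists (n >= 2) of pairwise similarities.
    Returns soft assignments m_i in (-1, 1); the sign is the cluster side.
    """
    n = len(sim)
    # Assumption of this sketch: centre the off-diagonal similarities so that
    # putting every object on the same side is not trivially optimal.
    off = [sim[i][j] for i in range(n) for j in range(n) if i != j]
    mean = sum(off) / len(off)
    b = [[0.0 if i == j else sim[i][j] - mean for j in range(n)] for i in range(n)]

    # Tiny seeded perturbation to break the symmetry of the all-zero state.
    rng = random.Random(seed)
    m = [rng.uniform(-0.01, 0.01) for _ in range(n)]

    t = t_start
    while t > t_end:
        for _ in range(sweeps):
            for i in range(n):
                # Mean field felt by object i from all other objects.
                field = sum(b[i][j] * m[j] for j in range(n))
                m[i] = math.tanh(field / t)
        t *= cooling  # geometric cooling schedule (hypothetical choice)
    return m


def read_split(m, threshold=0.9):
    """Turn soft assignments into hard labels plus a list of unsettled objects."""
    labels = [1 if x > 0 else 0 for x in m]
    # Objects still far from +1 or -1 at the final small temperature.
    unsettled = [i for i, x in enumerate(m) if abs(x) < threshold]
    return labels, unsettled


if __name__ == "__main__":
    # Toy data: objects 0-2 are mutually similar, as are objects 3-5.
    toy = [[1.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.1)
            for j in range(6)] for i in range(6)]
    labels, unsettled = read_split(anneal_bipartition(toy))
    print(labels, unsettled)
```

In this sketch, objects listed in unsettled are those whose assignment has not saturated by the final small temperature, which loosely mirrors the abstract's idea of using not-yet-clustered objects as an indicator of clustering reliability. Applying the function again to each resulting side would give the successive binary partitions.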
