Use of Factor Analyzer Normal Mixture Model with Mean Pattern Modeling on Clustering Genes

  • Kim Seung-Gu (Department of Applied Statistics, SangJi University)
  • Published : 2006.04.01

Abstract

The normal mixture model (NMM) is frequently used to cluster genes in microarray gene expression data. In this paper, some of the component means of the NMM are modelled by a linear regression model whose design matrix encodes the pattern between sample classes in the microarray matrix. Modelling the component means with given design matrices has the advantage that the fit can be steered toward clusters whose patterns are specified in advance. However, the NMM suffers from an overfitting problem because, in practice, gene expression profiles are often high-dimensional. The problem also arises when the NMM with component means restricted by the linear model is fitted. To cope with this problem, this paper proposes clustering genes with a factor analyzer NMM restricted by the linear model. Several design matrices that are useful for clustering genes are also provided.
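The two ideas in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the function names, the class labels, the coefficient values, and the dimensions p = 60 and q = 2 are all assumptions chosen for illustration. It builds a class-indicator design matrix X, forms a restricted component mean mu = X beta, and compares the number of free covariance parameters of an unrestricted covariance matrix with that of the standard factor analyzer form B B^T + D, which is what makes the factor analyzer less prone to overfitting when the dimension is high.

```python
# Illustrative sketch only; every name and number here is an assumption,
# not something taken from the paper.

def class_indicator_design(labels):
    """Design matrix with one 0/1 column per sample class (classes sorted by name)."""
    classes = sorted(set(labels))
    return [[1.0 if lab == c else 0.0 for c in classes] for lab in labels]

def restricted_mean(design, beta):
    """Component mean mu = X beta, written out without external libraries."""
    return [sum(x * b for x, b in zip(row, beta)) for row in design]

def full_cov_params(p):
    """Free parameters in an unrestricted symmetric p x p covariance matrix."""
    return p * (p + 1) // 2

def factor_cov_params(p, q):
    """Free parameters in B B^T + D (B is p x q, D is diagonal),
    after removing the q(q-1)/2 rotational indeterminacies of B."""
    return p * q + p - q * (q - 1) // 2

# Hypothetical layout: 3 tumor samples followed by 2 normal samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal"]
X = class_indicator_design(labels)      # columns ordered: normal, tumor
mu = restricted_mean(X, [0.0, 2.0])     # expression elevated in tumor samples only

print(mu)
print(full_cov_params(60), factor_cov_params(60, 2))
```

Under these assumptions, a component whose mean is restricted this way can only describe genes that are constant within each sample class, which is how a design matrix steers the clustering toward a pre-specified pattern.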

Cited by

  1. Variable Selection in Normal Mixture Model Based Clustering under Heteroscedasticity vol.24, pp.6, 2011, https://doi.org/10.5351/KJAS.2011.24.6.1213