A Method for Microarray Data Analysis Based on Bayesian Networks Using an Efficient Structural Learning Algorithm and Data Dimensionality Reduction

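Kyu-Baek Hwang (School of Computer Science and Engineering, Seoul National University)
Jeong-Ho Chang (School of Computer Science and Engineering, Seoul National University)
Byoung-Tak Zhang (School of Computer Science and Engineering, Seoul National University)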
Abstract
Microarray data, obtained with DNA chip technologies, measure the expression levels of thousands of genes in cells or tissues. They are used for gene function prediction and cancer diagnosis based on gene expression patterns. Among the diverse methods for data analysis, the Bayesian network represents the relationships among data attributes as a graph structure. This property makes it possible to discover relations among genes and characteristics of the tissue (e.g., the cancer type) through microarray data analysis. However, most current microarray data sets are so sparse that general analysis methods, including Bayesian networks, are difficult to apply directly. In this paper, we use an efficient structural learning algorithm and data dimensionality reduction to analyze microarray data with Bayesian networks. The proposed method was applied to a real microarray data set, the NCI60 data set, and its usefulness was evaluated by how accurately the learned Bayesian networks represent known biological facts.
Keywords
microarray data analysis; Bayesian networks; data dimensionality reduction