Application of Random Forests to Association Studies Using Mitochondrial Single Nucleotide Polymorphisms

  • Kim, Yoon-Hee (Department of Biostatistics and Epidemiology, School of Public Health, Seoul National University)
  • Kim, Ho (Inherited Disease Research Branch, NHGRI/NIH)
  • Published : 2007.12.31

Abstract

In previous nuclear genomic association studies, Random Forests (RF), a modern machine learning method, has been used successfully to generate evidence of association between genetic polymorphisms and diseases or other phenotypes. Compared with traditional statistical methods such as chi-square tests or logistic regression models, RF has advantages in handling large numbers of predictor variables and in examining gene-gene interactions without specifying a model in advance. Here, we applied RF to find associations between mitochondrial single nucleotide polymorphisms (mtSNPs) and diabetes risk. The results from a chi-square test validated the use of RF for association studies using mtDNA. Variable importance measures, namely the Gini index and the mean decrease in accuracy, performed well compared with chi-square tests in identifying mtSNPs associated with type 2 diabetes, which served as a real disease example.
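To make the comparison between the Gini index and the chi-square test concrete, the following is a minimal sketch in Python, not the authors' implementation. It computes the Gini impurity decrease for a single split on one biallelic mtSNP alongside the Pearson chi-square statistic for the same 2x2 table. The function names, SNP labels, and counts are hypothetical and chosen only for illustration. In an actual Random Forest, Gini importance accumulates such decreases over many nodes and trees, so this shows one building block rather than the full measure.

```python
def gini_impurity(n_cases, n_controls):
    """Gini impurity 1 - p^2 - q^2 for a node with two classes."""
    total = n_cases + n_controls
    if total == 0:
        return 0.0
    p = n_cases / total
    return 1.0 - p * p - (1.0 - p) ** 2


def gini_decrease(table):
    """Impurity decrease from one split on a biallelic mtSNP.

    table = [[a, b], [c, d]]: rows are alleles, columns are (cases, controls).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    parent = gini_impurity(a + c, b + d)
    children = ((a + b) / n) * gini_impurity(a, b) \
             + ((c + d) / n) * gini_impurity(c, d)
    return parent - children


def chi_square_2x2(table):
    """Pearson chi-square statistic (1 d.f., no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical allele-by-status counts, for illustration only (not study data).
toy_snps = {
    "mtSNP_A": [[30, 20], [10, 40]],
    "mtSNP_B": [[22, 28], [18, 32]],
}

# Rank the toy SNPs by single-split Gini decrease, largest first.
ranked = sorted(toy_snps, key=lambda s: gini_decrease(toy_snps[s]), reverse=True)
for name in ranked:
    t = toy_snps[name]
    print(name, round(gini_decrease(t), 4), round(chi_square_2x2(t), 3))
```

With these toy counts both statistics place the same SNP first. Mean decrease in accuracy is a separate, permutation-based measure and is not covered by this sketch.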
