Risk Prediction Using Genome-Wide Association Studies on Type 2 Diabetes

  • Choi, Sungkyoung (Interdisciplinary Program in Bioinformatics, Seoul National University)
  • Bae, Sunghwan (Interdisciplinary Program in Bioinformatics, Seoul National University)
  • Park, Taesung (Interdisciplinary Program in Bioinformatics, Seoul National University)
  • Received : 2016.11.10
  • Accepted : 2016.12.05
  • Published : 2016.12.31

Abstract

The success of genome-wide association studies (GWASs) has improved risk assessment and yielded novel genetic variants relevant to diagnosis, prevention, and treatment. However, most variants discovered by GWASs have been reported to have very small effect sizes on complex human diseases, which has been a major obstacle to building risk prediction models. Recently, many statistical approaches based on penalized regression have been developed to address the "large p and small n" problem, in which the number of predictors far exceeds the number of samples. In this report, we evaluated the performance of several statistical methods for predicting a binary trait: stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN). We first built prediction models for type 2 diabetes by combining variable selection and prediction methods, using Affymetrix Genome-Wide Human SNP Array 5.0 data from the Korean Association Resource project. We then assessed risk prediction performance using the area under the receiver operating characteristic curve (AUC) on internal and external validation datasets. In the internal validation, SLR-LASSO and SLR-EN tended to yield more accurate predictions than the other combinations. In the external validation, the SLR-SLR and SLR-EN combinations achieved the highest AUC of 0.726. We propose these combinations as potentially powerful risk prediction models for type 2 diabetes.
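
As an illustration of two quantities named in the abstract, the following minimal sketch computes the elastic-net penalty and the AUC in plain Python. It is not the authors' implementation. The function names, the toy data, and the glmnet-style parameterization of the penalty (a mixing weight `alpha` between the L1 and squared L2 terms, scaled by `lam`) are assumptions made for this example.

```python
from typing import Sequence


def elastic_net_penalty(beta: Sequence[float], lam: float, alpha: float) -> float:
    """Elastic-net penalty: lam * (alpha * L1 + (1 - alpha) / 2 * L2^2).

    Assumed parameterization: alpha = 1 gives the LASSO penalty,
    alpha = 0 gives the ridge penalty.
    """
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as one half
    (Mann-Whitney formulation). Labels: 1 = case, 0 = control.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: disease status and predicted risk scores.
    y = [0, 0, 1, 1]
    risk = [0.1, 0.4, 0.35, 0.8]
    print("AUC:", auc(y, risk))
    # Hypothetical coefficients; alpha = 1.0 corresponds to the LASSO penalty.
    print("penalty:", elastic_net_penalty([1.0, -2.0], lam=0.5, alpha=1.0))
```

The pairwise AUC loop is quadratic in the sample size, which is fine for illustration. A rank-based computation would be preferable for datasets of GWAS scale.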

Cited by

  1. 基于家系数据集群化似然比算法的疾病基因组遗传风险预测研究 vol.19, pp.12, 2018, https://doi.org/10.1631/jzus.B1800162