A Classifier for the association study between SNPs and quantitative traits

SNP와 양적 표현형의 연관성 분석을 위한 분류기

  • 엄상용 (Department of Computer Engineering, Hallym University) ;
  • 이광모 (Department of Computer Engineering, Hallym University)
  • Received : 2012.05.14
  • Accepted : 2012.09.17
  • Published : 2012.11.30

Abstract

Advances in human genome technologies have made it possible to analyze associations between genetic variants and diseases and to apply the results to predict risk of, or susceptibility to, those diseases. Many of these studies are carried out as case-control studies. For quantitative traits, statistical analysis methods are applied to find single nucleotide polymorphisms (SNPs) relevant to the diseases, and the SNPs are considered one by one. In this study, we presented methods to select informative SNPs and predict risk for quantitative traits, and compared their performance. We adopted two SNP selection methods: one that considers each SNP individually and one that considers all possible pairs of SNPs.

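Advances in technologies related to human genome information have led to active research that analyzes associations with diseases and disorders in order to predict risk, treatment prognosis, and similar outcomes. Most of these studies use the case-control study design, which targets representative qualitative traits; for quantitative traits, studies have mainly used regression analysis to identify the association of individual single nucleotide polymorphisms. In particular, research on predicting risk for complex diseases has proceeded mainly on the basis of the association shown by each single nucleotide polymorphism, under the common variants common disease assumption, and results that analyze the effect of interactions among multiple variants are comparatively scarce. In this paper, we analyzed the association of SNPs with a quantitative trait, built a classifier that uses the SNPs found in that analysis to predict the value of the target trait, and evaluated its performance. We also compared classification performance when SNPs are selected by the association of each individual SNP against selection by the association shown by pairs of SNPs.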
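
The contrast between the two selection strategies can be illustrated with a minimal sketch. The abstract does not state which association statistic was used, so this example assumes a simple stand-in: the fraction of trait variance explained by grouping samples on their genotype (for one SNP) or on their joint genotype (for a pair of SNPs). The function names `variance_explained`, `rank_single_snps`, and `rank_snp_pairs`, the 0/1/2 genotype coding, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def variance_explained(keys, trait):
    """Between-group sum of squares / total sum of squares for the trait,
    where samples are grouped by the corresponding entry of `keys`."""
    mean = sum(trait) / len(trait)
    total_ss = sum((y - mean) ** 2 for y in trait)
    if total_ss == 0:
        return 0.0
    groups = defaultdict(list)
    for key, y in zip(keys, trait):
        groups[key].append(y)
    between_ss = sum(
        len(ys) * (sum(ys) / len(ys) - mean) ** 2 for ys in groups.values()
    )
    return between_ss / total_ss


def rank_single_snps(genotypes, trait):
    """Score every SNP on its own. genotypes[i][j] is the genotype
    (assumed coded 0/1/2) of SNP j in sample i. Best score first."""
    n_snps = len(genotypes[0])
    scores = [
        (variance_explained([row[j] for row in genotypes], trait), j)
        for j in range(n_snps)
    ]
    return sorted(scores, reverse=True)


def rank_snp_pairs(genotypes, trait):
    """Score every unordered pair of SNPs by grouping samples on the
    joint genotype of the pair. Best score first."""
    n_snps = len(genotypes[0])
    scores = [
        (variance_explained([(row[a], row[b]) for row in genotypes], trait), (a, b))
        for a, b in combinations(range(n_snps), 2)
    ]
    return sorted(scores, reverse=True)


# Toy data (made up): the trait depends on SNP 0 and SNP 1 only through
# their interaction, plus a small additive effect of SNP 2.
genotypes = [
    (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
    (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1),
]
trait = [10 * (g0 != g1) + g2 for g0, g1, g2 in genotypes]

single_ranking = rank_single_snps(genotypes, trait)
pair_ranking = rank_snp_pairs(genotypes, trait)
print("best single SNP:", single_ranking[0])
print("best SNP pair:", pair_ranking[0])
```

In this toy data, SNP 0 and SNP 1 each score zero on their own, so single-SNP selection prefers SNP 2, while pairwise selection ranks the pair (0, 1) first. Scoring all pairs grows quadratically with the number of SNPs, which is the usual cost of looking for interactions this way.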
