Identification of the associations between genes and quantitative traits using entropy-based kernel density estimation

  • Yee, Jaeyong (Department of Physiology and Biophysics, Eulji University);
  • Park, Taesung (Department of Statistics, Seoul National University);
  • Park, Mira (Department of Preventive Medicine, Eulji University)
  • Received : 2022.05.18
  • Accepted : 2022.06.15
  • Published : 2022.06.30

Abstract

Genetic associations have been quantified using a number of statistical measures. Entropy-based mutual information may be one of the more direct ways of estimating an association, in the sense that it does not depend on a particular parametrization. To compute it, both the entropy and the conditional entropy of the phenotype distribution must be obtained. For quantitative traits, however, entropy usually cannot be evaluated exactly. Estimating entropy requires a probability density function, which can be approximated by kernel density estimation. We investigated the proper sequence of procedures for combining kernel density estimation with density-based entropy estimation in order to calculate mutual information. Genotypes and their interactions were used as the conditioning variables for the conditional entropy. Extensive simulated data, created from three types of generating functions, were analyzed using two different kernels, two types of multifactor dimensionality reduction, and m-spacing, another probability density approximation method. Statistical power, measured as the correct detection rate, was compared across these methods. Kernels were most useful when the trait distributions were more complex than simple normal or gamma distributions. A full-scale genomic dataset was then explored to identify associations, using the 2-h oral glucose tolerance test results and γ-glutamyl transpeptidase levels as phenotypes. Clearly distinguishable single-nucleotide polymorphisms (SNPs) and interacting SNP pairs associated with these phenotypes were found and listed with empirical p-values.
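The abstract describes mutual information between a quantitative trait Y and a genotype G, I(Y; G) = H(Y) - H(Y | G) = H(Y) - Σ_g P(G = g) H(Y | G = g), where each differential entropy is estimated from an approximated density. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' published procedure. The Gaussian kernel, Silverman's rule-of-thumb bandwidth, the leave-one-out resubstitution estimator -(1/n) Σ log f̂(y_i), the Vasicek form of the m-spacing estimator with m ≈ √n, the minimum group size, and the permutation scheme used for the empirical p-value are all illustrative choices, and every function name is hypothetical.

```python
import numpy as np


def silverman_bandwidth(y):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel (assumed choice)."""
    y = np.asarray(y, dtype=float)
    sd = y.std(ddof=1)
    iqr = np.subtract(*np.percentile(y, [75, 25]))
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * y.size ** (-0.2)


def kde_entropy(y):
    """Entropy estimate -mean(log f_hat(y_i)), where f_hat is a leave-one-out
    Gaussian KDE. Assumes y is not constant (the bandwidth must be > 0)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    h = silverman_bandwidth(y)
    z = (y[:, None] - y[None, :]) / h            # pairwise scaled differences
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(k, 0.0)                      # drop each point's own kernel
    f = k.sum(axis=1) / ((n - 1) * h)            # density at y_i from the other points
    return -np.mean(np.log(f))


def m_spacing_entropy(y, m=None):
    """Vasicek-style m-spacing entropy estimate; order statistics beyond the
    sample range are clamped to the minimum/maximum. Many tied values can
    make a spacing zero, and then the log is undefined."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    if m is None:
        m = max(1, int(round(np.sqrt(n))))       # hypothetical default window
    i = np.arange(n)
    spacing = y[np.minimum(i + m, n - 1)] - y[np.maximum(i - m, 0)]
    return np.mean(np.log(n / (2 * m) * spacing))


def mutual_information(y, g, entropy=kde_entropy, min_group=5):
    """I(Y;G) = H(Y) - sum_g P(G=g) H(Y | G=g) for a continuous trait y and a
    discrete genotype code g. Genotype groups with fewer than min_group
    observations are dropped before either entropy is computed."""
    y = np.asarray(y, dtype=float)
    g = np.asarray(g)
    levels, counts = np.unique(g, return_counts=True)
    keep = np.isin(g, levels[counts >= min_group])
    y, g = y[keep], g[keep]
    # Conditional entropy: group entropies weighted by group frequencies.
    h_cond = sum(np.mean(g == lv) * entropy(y[g == lv]) for lv in np.unique(g))
    return entropy(y) - h_cond


def permutation_pvalue(y, g, n_perm=200, seed=0, **kwargs):
    """Observed MI and an empirical p-value obtained by shuffling genotype labels."""
    rng = np.random.default_rng(seed)
    observed = mutual_information(y, g, **kwargs)
    null = np.array([mutual_information(y, rng.permutation(g), **kwargs)
                     for _ in range(n_perm)])
    return observed, (1 + np.sum(null >= observed)) / (n_perm + 1)
```

Passing `entropy=m_spacing_entropy` swaps the density approximation while keeping the same mutual-information formula. For an interacting SNP pair, one option (again an assumption) is to condition on the joint genotype, for example `g = 3 * snp1 + snp2` with genotypes coded 0/1/2, which yields up to nine cells; the `min_group` filter then discards sparse combinations.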

Acknowledgement

This research was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2021R1A2C1007788).
