http://dx.doi.org/10.1186/s40781-015-0081-1

Comparison of three boosting methods in parent-offspring trios for genotype imputation using simulation study  

Mikhchi, Abbas (Department of Animal Science, Science and Research Branch, Islamic Azad University)
Honarvar, Mahmood (Department of Animal Science, Shahr-e-Qods Branch, Islamic Azad University)
Kashan, Nasser Emam Jomeh (Department of Animal Science, Science and Research Branch, Islamic Azad University)
Zerehdaran, Saeed (Department of Animal Science, Ferdowsi University of Mashhad)
Aminafshar, Mehdi (Department of Animal Science, Science and Research Branch, Islamic Azad University)
Publication Information
Journal of Animal Science and Technology / v.58, no.1, 2016, pp. 1.1-1.6
Abstract
Background: Genotype imputation is the process of predicting unknown genotypes: a reference population genotyped at high density is used to infer missing genotypes, which lowers the cost of studying genetic variation in both humans and animals. Machine learning methods, especially boosting methods, have been used in genetic studies to explore the underlying genetic profile of disease and to build models that can predict the missing values of a marker. Methods: In this study, strategies and factors affecting the imputation accuracy of parent-offspring trios were compared across a lower-density (5K) and a higher-density (10K) SNP panel, using three boosting methods: TotalBoost (TB), LogitBoost (LB) and AdaBoost (AB). The methods were applied to simulated data to impute the untyped SNPs in parent-offspring trios. Four datasets were simulated: G1 (100 trios with 5K SNPs), G2 (100 trios with 10K SNPs), G3 (500 trios with 5K SNPs) and G4 (500 trios with 10K SNPs). In all four datasets, the parents were fully genotyped and the offspring were genotyped with a lower-density panel. Results: LB outperformed AB and TB in imputation accuracy. Computation time differed between the methods, and AB was the fastest. Higher SNP density increased imputation accuracy. A larger number of trios (500 rather than 100) improved the performance of LB and TB. Conclusions: All three methods achieved good imputation accuracy, and the denser chip is recommended for imputation in parent-offspring trios.
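The abstract does not show how a boosting classifier turns imputation into a prediction problem. The following is a minimal sketch under stated assumptions, not the authors' code or simulation design: each untyped offspring SNP is treated as a three-class classification target (genotype coded 0/1/2), the features are both parents' genotypes at that SNP plus the offspring's genotype at one linked flanking SNP, and the weak learner is a one-feature decision stump. Only AdaBoost is shown, in its multiclass SAMME form (a swapped-in variant, since genotypes take three values). Imputation accuracy is assumed to be concordance, the fraction of genotypes imputed correctly; the abstract does not state its metric. All function names (`simulate_trios`, `adaboost_samme`, `concordance`, etc.) and the `link` parameter are hypothetical.

```python
"""Toy sketch: impute an untyped offspring SNP with multiclass AdaBoost.

Hypothetical setup (not taken from the article): genotypes are coded 0/1/2,
features are both parents' genotypes at the target SNP plus the offspring's
genotype at one linked flanking SNP, and the weak learner is a decision stump.
"""
import math
import random

CLASSES = (0, 1, 2)  # genotype = count of the alternative allele


def simulate_trios(n_trios, rng, link=0.9):
    """Hypothetical toy trio simulator; `link` is the chance that the
    offspring's allele at the flanking SNP matches the transmitted target allele."""

    def flank(allele):
        return allele if rng.random() < link else 1 - allele

    X, y = [], []
    for _ in range(n_trios):
        dad = [rng.randint(0, 1), rng.randint(0, 1)]
        mom = [rng.randint(0, 1), rng.randint(0, 1)]
        pat, mat = rng.choice(dad), rng.choice(mom)  # one allele from each parent
        X.append([sum(dad), sum(mom), flank(pat) + flank(mat)])
        y.append(pat + mat)  # true (to-be-imputed) offspring genotype
    return X, y


def fit_stump(X, y, w):
    """Pick the one-feature stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in (0.5, 1.5):  # splits between genotype codes 0|1 and 1|2
            left, right = [0.0] * 3, [0.0] * 3
            for xi, yi, wi in zip(X, y, w):
                (left if xi[j] < t else right)[yi] += wi
            l_lab = max(CLASSES, key=lambda c: left[c])
            r_lab = max(CLASSES, key=lambda c: right[c])
            err = (sum(left) - left[l_lab]) + (sum(right) - right[r_lab])
            if err < best_err:
                best, best_err = (j, t, l_lab, r_lab), err
    return best, best_err


def stump_predict(stump, x):
    j, t, l_lab, r_lab = stump
    return l_lab if x[j] < t else r_lab


def adaboost_samme(X, y, n_rounds=20):
    """Multiclass AdaBoost (SAMME): upweight samples the last stump got wrong."""
    n, k = len(X), len(CLASSES)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w)
        if err >= 1 - 1 / k:  # no better than random guessing: stop
            break
        err = max(err, 1e-10)  # keep the log finite for a perfect stump
        alpha = math.log((1 - err) / err) + math.log(k - 1)
        ensemble.append((alpha, stump))
        if err <= 1e-10:  # perfect stump: further rounds add nothing
            break
        w = [wi * math.exp(alpha) if stump_predict(stump, xi) != yi else wi
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps."""
    votes = {c: 0.0 for c in CLASSES}
    for alpha, stump in ensemble:
        votes[stump_predict(stump, x)] += alpha
    return max(CLASSES, key=lambda c: votes[c])


def concordance(true, imputed):
    """Assumed accuracy metric: fraction of genotypes imputed correctly."""
    return sum(t == p for t, p in zip(true, imputed)) / len(true)


if __name__ == "__main__":
    rng = random.Random(1)
    X_ref, y_ref = simulate_trios(100, rng)  # trios used for training
    X_new, y_new = simulate_trios(100, rng)  # offspring to impute
    model = adaboost_samme(X_ref, y_ref, n_rounds=30)
    imputed = [predict(model, x) for x in X_new]
    print("concordance:", round(concordance(y_new, imputed), 3))
```

Only the AdaBoost update is sketched here. LogitBoost instead fits each weak learner to a working response via a Newton step on the logistic loss, and TotalBoost re-solves for the sample weights at every round subject to constraints from all earlier weak learners (a totally corrective update), so either would replace the loop in `adaboost_samme` while the stump, simulator and accuracy function could stay the same.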
Keywords
Trios; Boosting methods; Imputation accuracy; Computation time