Statistical analysis of metagenomics data

  • Calle, M. Luz (Biosciences Department, Faculty of Science and Technology, University of Vic - Central University of Catalonia)
  • Received : 2019.01.14
  • Accepted : 2019.02.20
  • Published : 2019.03.31

Abstract

Understanding the role of the microbiome in human health and how it can be modulated is becoming increasingly relevant for preventive medicine and for the medical management of chronic diseases. The development of high-throughput sequencing technologies has boosted microbiome research by enabling the study of microbial genomes and allowing more precise quantification of microbial abundances and function. Microbiome data analysis is challenging for two reasons. The data are high-dimensional, structured, sparse and multivariate. They are also compositional: the total number of reads per sample reflects sequencing depth rather than absolute microbial load, so only relative abundances are informative. In this review we outline some of the procedures that are most commonly used for microbiome analysis and that are implemented in R packages. We place particular emphasis on the compositional structure of microbiome data. We describe the principles of compositional data analysis and distinguish between standard methods and those that are consistent with the compositional data analysis framework.
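The compositional point can be made concrete with a small sketch. It assumes a samples-by-taxa count matrix held in NumPy and uses the centered log-ratio (clr) transform, which expresses each taxon relative to the geometric mean of its sample. Under clr, a sample sequenced ten times deeper maps to the same vector, and the Euclidean distance between clr vectors is the Aitchison distance. The function names, the toy counts and the fixed-pseudocount zero replacement are illustrative assumptions. They are not the review's procedure or the API of any R package discussed in it.

```python
import numpy as np


def replace_zeros(counts, pseudocount=0.5):
    """Replace zero counts with a fixed pseudocount (crude, illustrative choice).

    Only zeros are touched, so samples without zeros keep their exact ratios.
    """
    x = np.asarray(counts, dtype=float)
    return np.where(x == 0, pseudocount, x)


def closure(x):
    """Rescale each sample (row) so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)


def clr(x):
    """Centered log-ratio transform; rows are samples, columns are taxa.

    Requires strictly positive values. Each entry becomes log(part) minus the
    mean log of its sample, i.e. the log of the part over the geometric mean.
    """
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)


def aitchison_distances(x):
    """Pairwise Aitchison distances: Euclidean distances between clr vectors."""
    z = clr(x)
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical counts: 3 samples x 4 taxa.
# Sample 2 is sample 1 sequenced ten times deeper; sample 3 contains a zero.
counts = np.array([
    [10, 20, 30, 40],
    [100, 200, 300, 400],
    [40, 0, 5, 55],
])

positive = replace_zeros(counts)
z = clr(positive)

print(np.allclose(z[0], z[1]))                # True: depth does not change clr
print(np.allclose(clr(closure(positive)), z))  # True: proportions give the same clr
print(aitchison_distances(positive).round(3))
```

Replacing only the zeros keeps zero-free samples exactly proportional, which is why samples 1 and 2 coincide. Adding the pseudocount to every entry would slightly break that invariance. A fixed pseudocount is still a crude choice, and dedicated zero-replacement methods are generally preferable for sparse microbiome data.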
