Evaluation of 16S rRNA Databases for Taxonomic Assignments Using a Mock Community

  • Park, Sang-Cheol (Institute of Health and Environment, Seoul National University)
  • Won, Sungho (Institute of Health and Environment, Seoul National University)
  • Received: 2018.11.15
  • Accepted: 2018.12.16
  • Published: 2018.12.31

Abstract

Taxonomic identification is fundamental to all microbiology studies. It matters even more in metagenomics, which infers the composition of a microbial community from thousands of sequences. Identification is inevitably affected by the choice of reference database. This study evaluated the accuracy of three widely used 16S rRNA databases (Greengenes, SILVA, and EzBioCloud) in order to suggest basic guidelines for selecting a reference database. Using public mock community data, we assigned taxonomy with each database and tested the accuracy of the resulting assignments. We show that EzBioCloud performs well compared with the other databases evaluated.
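The abstract does not state which accuracy measure was used, so the following is only a minimal sketch of how assignments from several databases could be scored against a mock community whose members are known. The function name `evaluate_assignments`, the labels `database_A` and `database_B`, and the toy reads are all hypothetical and invented for illustration; they are not data or code from the study.

```python
from collections import Counter


def evaluate_assignments(assigned, expected_taxa):
    """Score one database's assignments against the known mock community.

    assigned: list of taxon names, one per read (None for an unassigned read).
    expected_taxa: set of taxon names known to be present in the mock community.
    """
    counts = Counter(assigned)
    total = len(assigned)
    # Reads whose assigned taxon is a true member of the mock community.
    correct_reads = sum(n for taxon, n in counts.items() if taxon in expected_taxa)
    # Distinct taxa the database reported, ignoring unassigned reads.
    detected = {taxon for taxon in counts if taxon is not None}
    true_pos = detected & expected_taxa
    return {
        "read_accuracy": correct_reads / total if total else 0.0,
        "precision": len(true_pos) / len(detected) if detected else 0.0,
        "recall": len(true_pos) / len(expected_taxa) if expected_taxa else 0.0,
    }


# Toy input, invented purely for illustration (not results from the study).
expected = {"Escherichia", "Bacillus", "Staphylococcus"}
toy_assignments = {
    "database_A": ["Escherichia", "Bacillus", "Bacillus", None],
    "database_B": ["Escherichia", "Shigella", "Bacillus", "Staphylococcus"],
}
scores = {
    name: evaluate_assignments(reads, expected)
    for name, reads in toy_assignments.items()
}
```

Reporting read-level accuracy alongside taxon-level precision and recall separates two failure modes: a database that leaves reads unassigned loses recall, while one that reports taxa absent from the mock community loses precision.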
