Analytical Tools and Databases for Metagenomics in the Next-Generation Sequencing Era |
Kim, Mincheol (School of Biological Sciences & Institute of Bioinformatics (BIOMAX), Seoul National University)
Lee, Ki-Hyun (School of Biological Sciences & Institute of Bioinformatics (BIOMAX), Seoul National University)
Yoon, Seok-Whan (School of Biological Sciences & Institute of Bioinformatics (BIOMAX), Seoul National University)
Kim, Bong-Soo (Chunlab Inc., Seoul National University)
Chun, Jongsik (School of Biological Sciences & Institute of Bioinformatics (BIOMAX), Seoul National University)
Yi, Hana (Department of Environmental Health, Korea University) |
1 | Gianoulis TA, Raes J, Patel PV, Bjornson R, Korbel JO, Letunic I, et al. Quantifying environmental adaptation of metabolic pathways in metagenomics. Proc Natl Acad Sci U S A 2009;106:1374-1379. DOI |
2 | Peng Y, Leung HC, Yiu SM, Chin FY. Meta-IDBA: a de novo assembler for metagenomic data. Bioinformatics 2011;27:i94-i101. DOI |
3 | Yutin N, Suzuki MT, Teeling H, Weber M, Venter JC, Rusch DB, et al. Assessing diversity and biogeography of aerobic anoxygenic phototrophic bacteria in surface waters of the Atlantic and Pacific Oceans using the Global Ocean Sampling expedition metagenomes. Environ Microbiol 2007;9:1464-1475. DOI |
4 | Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res 2010;20:265-272. DOI |
5 | Laserson J, Jojic V, Koller D. Genovo: de novo assembly for metagenomes. J Comput Biol 2011;18:429-443. DOI |
6 | Namiki T, Hachiya T, Tanaka H, Sakakibara Y. MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Res 2012;40:e155. DOI |
7 | Lai B, Ding R, Li Y, Duan L, Zhu H. A de novo metagenomic assembly program for shotgun DNA reads. Bioinformatics 2012;28:1455-1462. DOI |
8 | Boisvert S, Raymond F, Godzaridis E, Laviolette F, Corbeil J. Ray Meta: scalable de novo metagenome assembly and profiling. Genome Biol 2012;13:R122. DOI |
9 | Koren S, Treangen TJ, Pop M. Bambus 2: scaffolding metagenomes. Bioinformatics 2011;27:2964-2971. DOI |
10 | Gori F, Folino G, Jetten MS, Marchiori E. MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks. Bioinformatics 2011;27:196-203. DOI |
11 | Gerlach W, Stoye J. Taxonomic classification of metagenomic shotgun sequences with CARMA3. Nucleic Acids Res 2011;39:e91. DOI |
12 | Monzoorul Haque M, Ghosh TS, Komanduri D, Mande SS. SOrt-ITEMS: sequence orthology based approach for improved taxonomic estimation of metagenomic sequences. Bioinformatics 2009;25:1722-1730. DOI |
13 | Glass EM, Wilkening J, Wilke A, Antonopoulos D, Meyer F. Using the metagenomics RAST server (MG-RAST) for analyzing shotgun metagenomes. Cold Spring Harb Protoc 2010;2010:pdb.prot5368. |
14 | Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, et al. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 2005;33:5691-5702. DOI |
15 | Wu M, Eisen JA. A simple, fast, and accurate method of phylogenomic inference. Genome Biol 2008;9:R151. DOI |
16 | Liu B, Gibbons T, Ghodsi M, Treangen T, Pop M. Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences. BMC Genomics 2011;12 Suppl 2:S4. |
17 | Rosen GL, Reichenberger ER, Rosenfeld AM. NBC: the naive Bayes classification tool webserver for taxonomic classification of metagenomic reads. Bioinformatics 2011;27:127-129. DOI |
18 | Brady A, Salzberg SL. Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat Methods 2009;6:673-676. DOI |
19 | McHardy AC, Martin HG, Tsirigos A, Hugenholtz P, Rigoutsos I. Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods 2007;4:63-72. DOI |
20 | Segata N, Waldron L, Ballarini A, Narasimhan V, Jousson O, Huttenhower C. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat Methods 2012;9:811-814. DOI |
21 | Diaz NN, Krause L, Goesmann A, Niehaus K, Nattkemper TW. TACOA: taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinformatics 2009;10:56. DOI |
22 | Weber M, Teeling H, Huang S, Waldmann J, Kassabgy M, Fuchs BM, et al. Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics. ISME J 2011;5:918-928. DOI |
23 | Seshadri R, Kravitz SA, Smarr L, Gilna P, Frazier M. CAMERA: a community resource for metagenomics. PLoS Biol 2007;5:e75. DOI |
24 | MacDonald NJ, Parks DH, Beiko RG. Rapid identification of high-confidence taxonomic assignments for metagenomic data. Nucleic Acids Res 2012;40:e111. DOI |
25 | Markowitz VM, Chen IM, Chu K, Szeto E, Palaniappan K, Grechkin Y, et al. IMG/M: the integrated metagenome data management and comparative analysis system. Nucleic Acids Res 2012;40:D123-D129. DOI |
26 | Goll J, Rusch DB, Tanenbaum DM, Thiagarajan M, Li K, Methe BA, et al. METAREP: JCVI metagenomics reports: an open source tool for high-performance comparative metagenomics. Bioinformatics 2010;26:2631-2632. DOI |
27 | Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, Kubal M, et al. The metagenomics RAST server: a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 2008;9:386. DOI |
28 | Rho M, Tang H, Ye Y. FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res 2010;38:e191. DOI |
29 | Noguchi H, Park J, Takagi T. MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res 2006;34:5623-5630. DOI |
30 | Noguchi H, Taniguchi T, Itoh T. MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes. DNA Res 2008;15:387-396. DOI |
31 | Hoff KJ, Lingner T, Meinicke P, Tech M. Orphelia: predicting genes in metagenomic sequencing reads. Nucleic Acids Res 2009;37:W101-W105. DOI |
32 | Kelley DR, Liu B, Delcher AL, Pop M, Salzberg SL. Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering. Nucleic Acids Res 2012;40:e9. DOI |
33 | Zhu W, Lomsadze A, Borodovsky M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res 2010;38:e132. DOI |
34 | Prakash T, Taylor TD. Functional assignment of metagenomic data: challenges and applications. Brief Bioinform 2012;13:711-727. DOI |
35 | Hayes WS, Borodovsky M. How to interpret an anonymous bacterial genome: machine learning approach to gene identification. Genome Res 1998;8:1154-1171. DOI |
36 | Yada T, Nakao M, Totoki Y, Nakai K. Modeling and predicting transcriptional units of Escherichia coli genes using hidden Markov models. Bioinformatics 1999;15:987-993. DOI |
37 | Nguyen MN, Ma J, Fogel GB, Rajapakse JC. Di-codon usage for classification of genes. Biosystems 2009;98:1-6. DOI |
38 | Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 2010;11:119. DOI |
39 | Tanenbaum DM, Goll J, Murphy S, Kumar P, Zafar N, Thiagarajan M, et al. The JCVI standard operating procedure for annotating prokaryotic metagenomic shotgun sequencing data. Stand Genomic Sci 2010;2:229-237. DOI |
40 | Salzberg SL, Delcher AL, Kasif S, White O. Microbial gene identification using interpolated Markov models. Nucleic Acids Res 1998;26:544-548. DOI |
41 | Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003;4:41. DOI |
42 | Powell S, Szklarczyk D, Trachana K, Roth A, Kuhn M, Muller J, et al. eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges. Nucleic Acids Res 2012;40:D284-D289. DOI |
43 | Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, et al. The Pfam protein families database. Nucleic Acids Res 2012;40:D290-D301. DOI |
44 | Haft DH, Selengut JD, White O. The TIGRFAMs database of protein families. Nucleic Acids Res 2003;31:371-373. DOI |
45 | Fuhrman JA, Steele JA, Hewson I, Schwalbach MS, Brown MV, Green JL, et al. A latitudinal diversity gradient in planktonic marine bacteria. Proc Natl Acad Sci U S A 2008;105:7774-7778. DOI |
46 | Gilles A, Meglecz E, Pech N, Ferreira S, Malausa T, Martin JF. Accuracy and quality assessment of 454 GS-FLX titanium pyrosequencing. BMC Genomics 2011;12:245. DOI |
47 | Handelsman J, Rondon MR, Brady SF, Clardy J, Goodman RM. Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. Chem Biol 1998;5:R245-R249. DOI |
48 | Beja O, Aravind L, Koonin EV, Suzuki MT, Hadd A, Nguyen LP, et al. Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 2000;289:1902-1906. DOI |
49 | Fierer N, Leff JW, Adams BJ, Nielsen UN, Bates ST, Lauber CL, et al. Cross-biome metagenomic analyses of soil microbial communities and their functional attributes. Proc Natl Acad Sci U S A 2012;109:21390-21395. DOI |
50 | Metzker ML. Sequencing technologies: the next generation. Nat Rev Genet 2010;11:31-46. DOI |
51 | Teeling H, Glockner FO. Current opportunities and challenges in microbial metagenome analysis: a bioinformatic perspective. Brief Bioinform 2012;13:728-742. DOI |
52 | Suenaga H. Targeted metagenomics: a high-resolution metagenomics approach for specific gene clusters in complex microbial communities. Environ Microbiol 2012;14:13-22. DOI |
53 | Fox GE, Wisotzkey JD, Jurtshuk P Jr. How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity. Int J Syst Bacteriol 1992;42:166-170. DOI |
54 | Kunin V, Copeland A, Lapidus A, Mavromatis K, Hugenholtz P. A bioinformatician's guide to metagenomics. Microbiol Mol Biol Rev 2008;72:557-578. DOI |
55 | Bragg L, Stone G, Imelfort M, Hugenholtz P, Tyson GW. Fast, accurate error-correction of amplicon pyrosequences using Acacia. Nat Methods 2012;9:425-426. DOI |
56 | Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, et al. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 2013;41:D808-D815. DOI |
57 | Eddy SR. Accelerated profile HMM searches. PLoS Comput Biol 2011;7:e1002195. DOI |
58 | Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res 2009;37:6643-6654. DOI |
59 | Bairoch A. The ENZYME database in 2000. Nucleic Acids Res 2000;28:304-305. DOI |
60 | Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 2000;28:27-30. DOI |
61 | Claudel-Renard C, Chevalet C, Faraut T, Kahn D. Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Res 2003;31:6633-6639. DOI |
62 | Caspi R, Altman T, Dreher K, Fulcher CA, Subhraveti P, Keseler IM, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res 2012;40:D742-D753. DOI |
63 | Liu B, Pop M. MetaPath: identifying differentially abundant metabolic pathways in metagenomic datasets. BMC Proc 2011;5 Suppl 2:S9. |
64 | Parks DH, Beiko RG. Identifying biologically relevant differences between metagenomic communities. Bioinformatics 2010;26:715-721. DOI |
65 | Yilmaz P, Kottmann R, Field D, Knight R, Cole JR, Amaral-Zettler L, et al. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat Biotechnol 2011;29:415-420. DOI |
66 | Stepanauskas R. Single cell genomics: an individual look at microbes. Curr Opin Microbiol 2012;15:613-620. DOI |
67 | Baran Y, Halperin E. Joint analysis of multiple metagenomic samples. PLoS Comput Biol 2012;8:e1002373. DOI |
68 | Rosen MJ, Callahan BJ, Fisher DS, Holmes SP. Denoising PCR-amplified metagenome data. BMC Bioinformatics 2012;13:283. DOI |
69 | Quince C, Lanzen A, Davenport RJ, Turnbaugh PJ. Removing noise from pyrosequenced amplicons. BMC Bioinformatics 2011;12:38. DOI |
70 | Reeder J, Knight R. Rapidly denoising pyrosequencing amplicon reads by exploiting rank-abundance distributions. Nat Methods 2010;7:668-669. |
71 | Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 2011;27:2194-2200. DOI |
72 | Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G, et al. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res 2011;21:494-504. DOI |
73 | Wright ES, Yilmaz LS, Noguera DR. DECIPHER, a search-based approach to chimera identification for 16S rRNA sequences. Appl Environ Microbiol 2012;78:717-725. DOI |
74 | Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 2010;26:2460-2461. DOI |
75 | Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 2012;28:3150-3152. DOI |
76 | Cai Y, Sun Y. ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time. Nucleic Acids Res 2011;39:e95. DOI |
77 | DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 2006;72:5069-5072. DOI |
78 | Lee JH, Yi H, Jeon YS, Won S, Chun J. TBC: a clustering algorithm based on prokaryotic taxonomy. J Microbiol 2012;50:181-185. DOI |
79 | Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res 2013;41:D590-D596. DOI |
80 | Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, et al. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 2009;37:D141-D145. DOI |
81 | Kim OS, Cho YJ, Lee K, Yoon SH, Kim M, Na H, et al. Introducing EzTaxon-e: a prokaryotic 16S rRNA gene sequence database with phylotypes that represent uncultured species. Int J Syst Evol Microbiol 2012;62(Pt 3):716-721. DOI |
82 | Abarenkov K, Henrik Nilsson R, Larsson KH, Alexander IJ, Eberhardt U, Erland S, et al. The UNITE database for molecular identification of fungi: recent updates and future perspectives. New Phytol 2010;186:281-285. DOI |
83 | Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 2009;75:7537-7541. DOI |
84 | Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods 2010;7:335-336. DOI |
85 | Hugenholtz P, Huber T. Chimeric 16S rDNA sequences of diverse origin are accumulating in the public databases. Int J Syst Evol Microbiol 2003;53(Pt 1):289-293. DOI |
86 | Huson DH, Mitra S, Ruscheweyh HJ, Weber N, Schuster SC. Integrative analysis of environmental sequences using MEGAN4. Genome Res 2011;21:1552-1560. DOI |
87 | Huse SM, Welch DM, Morrison HG, Sogin ML. Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environ Microbiol 2010;12:1889-1898. DOI |
88 | Bradley RD, Hillis DM. Recombinant DNA sequences generated by PCR amplification. Mol Biol Evol 1997;14:592-593. DOI |
89 | DeSantis TZ Jr, Hugenholtz P, Keller K, Brodie EL, Larsen N, Piceno YM, et al. NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res 2006;34:W394-W399. |
90 | Schloss PD, Gevers D, Westcott SL. Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS One 2011;6:e27310. DOI |
91 | Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994;22:4673-4680. DOI |
92 | Schloss PD. The effects of alignment quality, distance calculation method, sequence filtering, and region on the analysis of 16S rRNA gene-based studies. PLoS Comput Biol 2010;6:e1000844. DOI |
93 | Schloss PD. Secondary structure improves OTU assignments of 16S rRNA gene sequences. ISME J 2013;7:457-460. DOI |
94 | Hartmann M, Howes CG, VanInsberghe D, Yu H, Bachar D, Christen R, et al. Significant and persistent impact of timber harvesting on soil microbial communities in Northern coniferous forests. ISME J 2012;6:2199-2218. DOI |
95 | Pruesse E, Peplies J, Glockner FO. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics 2012;28:1823-1829. DOI |
96 | Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics 2009;25:1335-1337. DOI |
97 | Kumar S, Carlsen T, Mevik BH, Enger P, Blaalid R, Shalchian-Tabrizi K, et al. CLOTU: an online pipeline for processing and clustering of 454 amplicon reads into OTUs followed by taxonomic annotation. BMC Bioinformatics 2011;12:182. DOI |
98 | Sul WJ, Cole JR, Jesus EC, Wang Q, Farris RJ, Fish JA, et al. Bacterial community comparisons by taxonomy-supervised analysis independent of sequence alignment and clustering. Proc Natl Acad Sci U S A 2011;108:14637-14642. DOI |
99 | Soergel DA, Dey N, Knight R, Brenner SE. Selection of primers for optimal taxonomic classification of environmental 16S rRNA gene sequences. ISME J 2012;6:1440-1444. DOI |
100 | Lan Y, Wang Q, Cole JR, Rosen GL. Using the RDP classifier to predict taxonomic novelty and reduce the search space for finding novel organisms. PLoS One 2012;7:e32491. DOI |
101 | Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol 1990;215:403-410. |
102 | Huson DH, Auch AF, Qi J, Schuster SC. MEGAN analysis of metagenomic data. Genome Res 2007;17:377-386. DOI |
103 | Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 2010;11:538. DOI |
104 | Clemente JC, Jansson J, Valiente G. Flexible taxonomic assignment of ambiguous sequencing reads. BMC Bioinformatics 2011;12:8. DOI |
105 | Wang Q, Garrity GM, Tiedje JM, Cole JR. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 2007;73:5261-5267. DOI |
106 | Berger SA, Krompass D, Stamatakis A. Performance, accuracy, and Web server for evolutionary placement of short sequence reads under maximum likelihood. Syst Biol 2011;60:291-302. DOI |
107 | Mirarab S, Nguyen N, Warnow T. SEPP: SATe-enabled phylogenetic placement. Pac Symp Biocomput 2012:247-258. |
108 | Wu M, Scott AJ. Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2. Bioinformatics 2012;28:1033-1034. DOI |
109 | Vergin KL, Urbach E, Stein JL, DeLong EF, Lanoil BD, Giovannoni SJ. Screening of a fosmid library of marine environmental genomic DNA fragments reveals four clones related to members of the order Planctomycetales. Appl Environ Microbiol 1998;64:3075-3078. |