
An assessment of the taxonomic reliability of DNA barcode sequences in publicly available databases  

Jin, Soyeong (School of Biological Sciences and Technology, Chonnam National University)
Kim, Kwang Young (Department of Oceanography, Chonnam National University)
Kim, Min-Seok (Dental Science Research Institute, School of Dentistry, Chonnam National University)
Park, Chungoo (School of Biological Sciences and Technology, Chonnam National University)
Publication Information
ALGAE / v.35, no.3, 2020, pp. 293-301
DNA barcoding has a wide range of applications, such as taxonomic studies that help elucidate cryptic species and phylogenetic relationships, and the analysis of environmental samples for biodiversity monitoring and species conservation assessments. Once DNA barcode sequences have been obtained, they are commonly analyzed by sequence similarity-based homology searches; that is, the obtained barcode sequences are compared against DNA barcode reference databases. This bioinformatic analysis implies that the overall quantity and quality of the reference databases must be stringently monitored so that they do not adversely affect the accuracy of species identification. With the development of next-generation sequencing techniques, a large number of DNA barcode sequences have been produced and stored in online databases, but their validity, accuracy, and reliability have not been extensively investigated. In this study, we investigated the amount and types of erroneous barcode sequences deposited in publicly accessible databases. Over 4.1 million sequences were investigated in three large-scale DNA barcode databases (NCBI GenBank, Barcode of Life Data System [BOLD], and Protist Ribosomal Reference database [PR2]) for four major DNA barcodes (cytochrome c oxidase subunit 1 [COI], internal transcribed spacer [ITS], ribulose bisphosphate carboxylase large chain [rbcL], and 18S ribosomal RNA [18S rRNA]); approximately 2% of the sequences were found to be erroneous, and their taxonomic distribution was uneven. Our findings provide compelling evidence of data quality problems, along with insufficient and unreliable annotation of taxonomic data, in DNA barcode databases. Therefore, we suggest that when ambiguous taxa arise during barcoding analysis, further validation with other DNA barcode loci or morphological characters should be mandated.
18S rRNA; COI; DNA barcoding; ITS; rbcL; taxonomic databases