Non-negligible Occurrence of Errors in Gender Description in Public Data Sets

  • Kim, Jong Hwan (Genome Structure Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB)) ;
  • Park, Jong-Luyl (Epigenome Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB)) ;
  • Kim, Seon-Young (Genome Structure Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB))
  • Received : 2015.09.09
  • Accepted : 2015.12.30
  • Published : 2016.03.31

Abstract

Owing to advances in omics technologies, numerous genome-wide studies on human samples have been published, and most of the omics data, together with the associated clinical information, are available in public repositories such as Gene Expression Omnibus and ArrayExpress. While analyzing several public datasets, we observed that errors in gender information occur quite often. When we compared the gender description with the methylation patterns of gender-specific probes (glucose-6-phosphate dehydrogenase [G6PD], ephrin-B1 [EFNB1], and testis-specific protein, Y-linked 2 [TSPY2]) in 5,611 samples produced using Infinium 450K HumanMethylation arrays, we found that 19 samples from 7 datasets were erroneously described. We also analyzed 1,819 samples produced using the Affymetrix U133Plus2 array, using several gender-specific genes (X (inactive)-specific transcript [XIST], eukaryotic translation initiation factor 1A, Y-linked [EIF1AY], and DEAD [Asp-Glu-Ala-Asp] box polypeptide 3, Y-linked [DDX3Y]), and found that 40 samples from 3 datasets were erroneously described. We suggest that users of public datasets should not expect the data to be error-free and that, whenever possible, they should check the consistency of the data.
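The kind of consistency check recommended above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: it assumes log2-scale expression values for XIST, EIF1AY, and DDX3Y are already available per sample, and the function names, the `margin` cutoff, and the sample values are all hypothetical.

```python
from statistics import mean


def infer_sex(xist, y_genes, margin=1.0):
    """Infer sex from log2 expression values.

    XIST is expected to be high in females; Y-linked genes are expected
    to be high in males. `margin` is an assumed cutoff, not a value from
    the paper.
    """
    diff = xist - mean(y_genes)
    if diff > margin:
        return "female"
    if diff < -margin:
        return "male"
    return "ambiguous"


def find_mismatches(samples, margin=1.0):
    """Return (id, annotated, inferred) for samples whose annotation disagrees."""
    flagged = []
    for s in samples:
        inferred = infer_sex(s["XIST"], [s["EIF1AY"], s["DDX3Y"]], margin)
        # Ambiguous calls are left alone rather than reported as errors.
        if inferred != "ambiguous" and inferred != s["annotated"]:
            flagged.append((s["id"], s["annotated"], inferred))
    return flagged


# Made-up example values for illustration only.
samples = [
    {"id": "S1", "annotated": "female", "XIST": 10.2, "EIF1AY": 3.1, "DDX3Y": 3.4},
    {"id": "S2", "annotated": "male", "XIST": 3.0, "EIF1AY": 9.5, "DDX3Y": 9.8},
    {"id": "S3", "annotated": "male", "XIST": 10.5, "EIF1AY": 3.2, "DDX3Y": 3.0},
    {"id": "S4", "annotated": "female", "XIST": 6.0, "EIF1AY": 6.2, "DDX3Y": 5.9},
]

print(find_mismatches(samples))
```

Here only S3 is flagged, because its expression pattern looks female while its annotation says male. S4 is treated as ambiguous and not reported; in practice such samples may deserve manual review rather than automatic relabeling.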
