Mining and analysis of microsatellites in human coronavirus genomes using the in-house built Java pipeline

  • Umang, Umang (School of Computer Science, Shri Venkateshwara University) ;
  • Bharti, Pawan Kumar (School of Computer Science, Shri Venkateshwara University) ;
  • Husain, Akhtar (Department of Computer Science and IT, MJP Rohilkhand University)
  • Received : 2020.05.09
  • Accepted : 2022.09.14
  • Published : 2022.09.30

Abstract

Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated motifs 1 to 6 nucleotides long that occur in both coding and non-coding regions of DNA. They are widely distributed across the genomes of prokaryotes, eukaryotes, and viruses, and are used as molecular markers in studies of DNA variation, gene regulation, genetic diversity, and evolution. However, identifying microsatellites in vitro is time-consuming and expensive. This study therefore used an in-house Java pipeline to identify and analyse perfect and compound microsatellites, design primers, and compute related statistics for seven complete human coronavirus genome sequences (host: Homo sapiens). These include the genome of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes coronavirus disease 2019 (COVID-19). Under the search criteria used, the seven genomes contained 76 to 118 perfect SSRs and 1 to 10 compound SSRs, reflecting a low rate of conversion of perfect SSRs into compound repeats. SSR incidence was positively but not significantly correlated with genome size (R² = 0.45, p > 0.05), SSR relative abundance (R² = 0.18, p > 0.05), and relative density (R² = 0.23, p > 0.05). Dinucleotide repeats were the most abundant in the coding region of the genome, followed by tri-, mono-, and tetranucleotide repeats. Such comparative analysis can help in understanding evolutionary relationships, genetic diversity, and hypervariability at minimal time and cost.
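
The abstract does not describe how the pipeline works internally. The sketch below is a rough illustration only, not the authors' code. It shows one way to compute the quantities the abstract reports: perfect SSRs, compound SSRs, relative abundance (SSRs per kb), relative density (SSR bp per kb), and R² for a simple linear fit. The class name `SsrSketch`, the minimum copy numbers (10, 6, 5, 5, 5, 5 for mono- to hexanucleotide motifs), and the 10 bp maximum gap for compound SSRs are all hypothetical choices, since the paper's actual search criteria are not given here. The abundance and density formulas follow the usual per-kb definitions, which is an assumption about what the authors computed. The code needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: class name, thresholds and D_MAX are illustrative assumptions.
public class SsrSketch {

    /** A perfect SSR: 0-based start, repeat unit (motif) and number of tandem copies. */
    record Ssr(int start, String motif, int repeats) {
        int length() { return motif.length() * repeats; }
        int end() { return start + length(); } // exclusive end
    }

    // Assumed minimum copy numbers for motif lengths 1..6 (index 0 unused).
    static final int[] MIN_REPEATS = {0, 10, 6, 5, 5, 5, 5};
    // Assumed maximum gap (bp) between neighbouring SSRs for them to form a compound SSR.
    static final int D_MAX = 10;

    /** True unless the motif is a repeat of a shorter unit (e.g. "AA" or "ATAT"). */
    static boolean isPrimitive(String m) {
        return (m + m).indexOf(m, 1) == m.length();
    }

    /** Scans left to right; at each position the shortest qualifying motif wins. */
    static List<Ssr> findPerfect(String seq) {
        List<Ssr> out = new ArrayList<>();
        int n = seq.length();
        int i = 0;
        while (i < n) {
            Ssr hit = null;
            for (int k = 1; k <= 6 && i + k <= n && hit == null; k++) {
                String motif = seq.substring(i, i + k);
                if (!isPrimitive(motif)) continue;
                int r = 1; // count consecutive copies of the motif starting at i
                while (i + (r + 1) * k <= n && seq.regionMatches(i + r * k, seq, i, k)) r++;
                if (r >= MIN_REPEATS[k]) hit = new Ssr(i, motif, r);
            }
            if (hit != null) { out.add(hit); i = hit.end(); } else { i++; }
        }
        return out;
    }

    /** Groups neighbouring SSRs separated by <= D_MAX bp; groups of 2+ are compound SSRs. */
    static List<List<Ssr>> findCompound(List<Ssr> perfect) {
        List<List<Ssr>> compounds = new ArrayList<>();
        List<Ssr> group = new ArrayList<>();
        for (Ssr s : perfect) {
            if (!group.isEmpty() && s.start() - group.get(group.size() - 1).end() > D_MAX) {
                if (group.size() >= 2) compounds.add(group);
                group = new ArrayList<>();
            }
            group.add(s);
        }
        if (group.size() >= 2) compounds.add(group);
        return compounds;
    }

    /** Relative abundance: number of SSRs per kb of genome. */
    static double relativeAbundance(List<Ssr> ssrs, int genomeLength) {
        return ssrs.size() / (genomeLength / 1000.0);
    }

    /** Relative density: total SSR length (bp) per kb of genome. */
    static double relativeDensity(List<Ssr> ssrs, int genomeLength) {
        int bp = ssrs.stream().mapToInt(Ssr::length).sum();
        return bp / (genomeLength / 1000.0);
    }

    /** R^2 of a least-squares line y ~ x (squared Pearson r); NaN if x or y is constant. */
    static double rSquared(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return (sxy * sxy) / (sxx * syy);
    }

    public static void main(String[] args) {
        // Toy sequence for illustration only, not a coronavirus genome.
        String seq = "GG" + "AT".repeat(6) + "CC" + "A".repeat(12) + "CC" + "TCG".repeat(5) + "TT";
        List<Ssr> perfect = findPerfect(seq);
        perfect.forEach(System.out::println);
        System.out.println("Compound SSRs: " + findCompound(perfect).size());
        System.out.printf("Relative abundance: %.2f SSRs/kb%n", relativeAbundance(perfect, seq.length()));
        System.out.printf("Relative density: %.2f bp/kb%n", relativeDensity(perfect, seq.length()));
        // Toy x/y values, unrelated to the paper's data.
        System.out.printf("R^2 = %.2f%n", rSquared(new double[]{1, 2, 3, 4}, new double[]{1, 3, 2, 4}));
    }
}
```

For the toy sequence, the sketch reports three perfect SSRs, (AT)6, (A)12 and (TCG)5. Because each is separated from the next by only 2 bp, they merge into one compound SSR. The scanner tries the shortest motif first and skips non-primitive motifs, so a run of twelve A's is not also reported as six AA copies. The R² here is the squared Pearson correlation of a least-squares line. The p-values quoted in the abstract would additionally need a t-test, which this sketch leaves out.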
