2-D graphical representation of protein sequences and its application to coronavirus phylogeny

  • Li, Chun (Department of Mathematics, Bohai University) ;
  • Xing, Lili (Department of Mathematics, Bohai University) ;
  • Wang, Xin (Dalian Naval Academy)

  • Received: 2007.07.06
  • Accepted: 2007.10.26
  • Published: 2008.03.31

Abstract

Based on a five-letter model of the 20 amino acids, we propose a new 2-D graphical representation of protein sequences. We then transform this 2-D graphical representation into a numerical characterization that facilitates quantitative comparison of protein sequences. As an application, we construct the phylogenetic tree of 56 coronavirus spike proteins. The resulting tree agrees well with the established taxonomic groups.
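The pipeline in the abstract (reduce to five letters, draw a 2-D graph, turn the graph into numbers, compare sequences) can be illustrated with a small sketch. The abstract does not give the classes, the graph construction or the numerical descriptor, so everything below is an assumption: the five classes are taken from the reduced alphabet of Wang and Wang (1999), the 2-D graph is a hypothetical walk with one unit vector per class spaced 72 degrees apart, and the descriptor (walk centroid plus end point, divided by length) is a hypothetical stand-in. The end point depends only on composition, while the centroid also reflects residue order. This is not the authors' method.

```python
import math

# ASSUMPTION: five-letter reduced alphabet taken from the Wang & Wang (1999)
# grouping; the abstract does not list the classes the authors used.
FIVE_LETTER_CLASSES = {
    "I": "CMFILVWY",
    "A": "ATH",
    "G": "GP",
    "E": "DE",
    "K": "SNQRK",
}
# Lookup table: amino acid -> reduced letter.
REDUCE = {aa: cls for cls, members in FIVE_LETTER_CLASSES.items() for aa in members}

# HYPOTHETICAL geometry: one unit vector per reduced letter, spaced 72 degrees apart.
DIRECTIONS = {
    cls: (math.cos(2 * math.pi * k / 5), math.sin(2 * math.pi * k / 5))
    for k, cls in enumerate("IAGEK")
}


def reduce_sequence(protein: str) -> str:
    """Rewrite a protein in the five-letter alphabet, skipping non-standard residues."""
    return "".join(REDUCE[aa] for aa in protein.upper() if aa in REDUCE)


def walk_2d(protein: str) -> list[tuple[float, float]]:
    """Cumulative 2-D walk: each reduced residue moves one unit in its direction."""
    x = y = 0.0
    points = []
    for cls in reduce_sequence(protein):
        dx, dy = DIRECTIONS[cls]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def descriptor(protein: str) -> tuple[float, float, float, float]:
    """HYPOTHETICAL numerical characterization of the walk.

    Returns the centroid and end point of the walk, both divided by the
    walk length n so that sequences of different lengths are comparable.
    """
    points = walk_2d(protein)
    if not points:
        raise ValueError("sequence contains no standard amino acids")
    n = len(points)
    cx = sum(px for px, _ in points) / n
    cy = sum(py for _, py in points) / n
    ex, ey = points[-1]
    return (cx / n, cy / n, ex / n, ey / n)


def distance_matrix(seqs: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """Pairwise Euclidean distances between descriptors (input to a tree method)."""
    names = list(seqs)
    descs = [descriptor(seqs[name]) for name in names]
    return names, [[math.dist(a, b) for b in descs] for a in descs]


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not coronavirus data.
    toy = {"s1": "MKTAYIAKQR", "s2": "MKTAYIAKQK", "s3": "GPGPDEDESN"}
    names, dist = distance_matrix(toy)
    for name, row in zip(names, dist):
        print(name, " ".join(f"{d:.4f}" for d in row))
```

The resulting distance matrix could then be passed to a clustering method such as UPGMA or neighbor joining to draw a tree; the abstract does not state which distance or tree-building method the authors used.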
