Identification of Viral Taxon-Specific Genes (VTSG): Application to Caliciviridae

  • Received : 2018.07.31
  • Accepted : 2018.12.16
  • Published : 2018.12.31

Abstract

Virus taxonomy was initially determined by clinical experiments based on phenotype. With the development of sequence analysis methods, genotype-based classification was also applied. As genome sequence analysis technology advances, there is an increasing demand for virus taxonomy to be extended from in vivo and in vitro to in silico approaches. In this study, we verified the consistency of the current International Committee on Taxonomy of Viruses (ICTV) taxonomy using an in silico approach, aiming to identify the specific sequence for each virus. We applied this approach to norovirus in Caliciviridae, which causes 90% of gastroenteritis cases worldwide. Based on the dogma "protein structure determines its function," we hypothesized that a taxon-specific sequence can be identified through its specific structure. First, we extracted the coding regions (CDS). Second, the CDS protein sequences of each genus were annotated by a Conserved Domain Database (CDD) search. Finally, the conserved domains of each genus in Caliciviridae were classified by RPS-BLAST against CDD. The analysis showed that the genera of Caliciviridae share sequences that include an RNA helicase domain. In Norovirus, the Calicivirus coat protein C-terminal and viral polyprotein N-terminal domains appear as specific domains within Caliciviridae; they are not found in the other genera of the family. Specific conserved domains detected by this method can be used as classification keywords based on protein functional structure. Once the specific protein domains are determined, their sequences can be converted to gene sequences, which could then be reused as viral biomarkers.
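The final comparison step described in the abstract (finding domains shared by every genus versus domains unique to one genus) can be sketched with plain set operations. This is only an illustrative sketch, not the authors' implementation: it assumes the RPS-BLAST/CDD hits have already been parsed into one set of domain names per genus, the function names `common_domains` and `specific_domains` are hypothetical, and the toy input below is made up to mirror the pattern the abstract reports rather than taken from the paper's data.

```python
def common_domains(domains_by_genus):
    """Return the domains annotated in every genus (family-level candidates)."""
    return set.intersection(*(set(d) for d in domains_by_genus.values()))


def specific_domains(domains_by_genus):
    """For each genus, return the domains found in that genus and in no other."""
    result = {}
    for genus, domains in domains_by_genus.items():
        # Union of the domains seen in all other genera.
        others = set().union(
            *(d for g, d in domains_by_genus.items() if g != genus)
        )
        result[genus] = set(domains) - others
    return result


# Toy input (assumption): placeholder annotations, NOT the paper's results.
toy = {
    "Norovirus": {
        "RNA helicase",
        "Calicivirus coat protein C-terminal",
        "Viral polyprotein N-terminal",
    },
    "Sapovirus": {"RNA helicase", "placeholder_domain_A"},
    "Lagovirus": {"RNA helicase", "placeholder_domain_A"},
}

shared = common_domains(toy)
unique = specific_domains(toy)
print("shared by all genera:", sorted(shared))
for genus in sorted(unique):
    print(genus, "specific:", sorted(unique[genus]))
```

In this toy input, a domain annotated in two genera (the placeholder) is neither shared by the whole family nor specific to one genus, so it drops out of both results. A real analysis would also need thresholds (for example an E-value cutoff on the RPS-BLAST hits) before building the per-genus sets.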
