Functional Prediction of Hypothetical Proteins from Shigella flexneri and Validation of the Predicted Models by Using ROC Curve Analysis
Gazi, Md. Amran (Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research)
Mahmud, Sultan (Infectious Diseases Division, International Centre for Diarrhoeal Disease Research)
Fahim, Shah Mohammad (Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research)
Kibria, Mohammad Golam (Infectious Diseases Division, International Centre for Diarrhoeal Disease Research)
Palit, Parag (Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research)
Islam, Md. Rezaul (International Max Planck Research School)
Rashid, Humaira (Infectious Diseases Division, International Centre for Diarrhoeal Disease Research)
Das, Subhasish (Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research)
Mahfuz, Mustafa (Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research)
Ahmed, Tahmeed (Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research)
1 | Kowalski JC, Belfort M, Stapleton MA, Holpert M, Dansereau JT, Pietrokovski S, et al. Configuration of the catalytic GIY-YIG domain of intron endonuclease I-TevI: coincidence of computational and molecular findings. Nucleic Acids Res 1999;27:2115-2125.
2 | Van Roey P, Meehan L, Kowalski JC, Belfort M, Derbyshire V. Catalytic domain structure and hypothesis for function of GIY-YIG intron endonuclease I-TevI. Nat Struct Biol 2002;9:806-811. |
3 | Iyer LM, Zhang D, Rogozin IB, Aravind L. Evolution of the deaminase fold and multiple origins of eukaryotic editing and mutagenic nucleic acid deaminases from bacterial toxin systems. Nucleic Acids Res 2011;39:9473-9497.
4 | Shu W, Liu J, Ji H, Lu M. Core structure of the outer membrane lipoprotein from Escherichia coli at 1.9 A resolution. J Mol Biol 2000;299:1101-1112.
5 | Saurin W, Hofnung M, Dassa E. Getting in or out: early segregation between importers and exporters in the evolution of ATP-binding cassette (ABC) transporters. J Mol Evol 1999;48:22-41.
6 | Freeman ZN, Dorus S, Waterfield NR. The KdpD/KdpE two-component system: integrating K(+) homeostasis and virulence. PLoS Pathog 2013;9:e1003201.
7 | Ibanez-Ruiz M, Robbe-Saule V, Hermant D, Labrude S, Norel F. Identification of RpoS (sigma(S))-regulated genes in Salmonella enterica serovar Typhimurium. J Bacteriol 2000;182:5749-5756.
8 | Peterson PA, Rask L, Ostberg L, Andersson L, Kamwendo F, Pertoft H. Studies on the transport and cellular distribution of vitamin A in normal and vitamin A-deficient rats with special reference to the vitamin A-binding plasma protein. J Biol Chem 1973;248:4009-4022. |
9 | Minailiuc OM, Vavelyuk O, Gandhi S, Hung MN, Cygler M, Ekiel I. NMR structure of YcgL, a conserved protein from Escherichia coli representing the DUF709 family, with a novel alpha/beta/alpha sandwich fold. Proteins 2007;66:1004-1007.
10 | Livorsi DJ, Stenehjem E, Stephens DS. Virulence factors of gram-negative bacteria in sepsis with a focus on Neisseria meningitidis. In: Sepsis: Pro-Inflammatory and Anti-Inflammatory Responses (Herwald H, Egesten A, eds.). Basel: Karger Publishers, 2011. pp. 31-47. |
11 | Gerdes K, Wagner EG. RNA antitoxins. Curr Opin Microbiol 2007;10:117-124.
12 | Fry J, Wood M, Poole PS. Investigation of myo-inositol catabolism in Rhizobium leguminosarum bv. viciae and its effect on nodulation competitiveness. Mol Plant Microbe Interact 2001;14:1016-1025.
13 | Bollinger JM Jr, Kwon DS, Huisman GW, Kolter R, Walsh CT. Glutathionylspermidine metabolism in Escherichia coli: purification, cloning, overproduction, and characterization of a bifunctional glutathionylspermidine synthetase/amidase. J Biol Chem 1995;270:14031-14041.
14 | Ejim LJ, D'Costa VM, Elowe NH, Loredo-Osti JC, Malo D, Wright GD. Cystathionine beta-lyase is important for virulence of Salmonella enterica serovar Typhimurium. Infect Immun 2004;72:3310-3314.
15 | Kawano M, Aravind L, Storz G. An antisense RNA controls synthesis of an SOS-induced toxin evolved from an antitoxin. Mol Microbiol 2007;64:738-754.
16 | Kawano M. Divergently overlapping cis-encoded antisense RNA regulating toxin-antitoxin systems from E. coli: hok/sok, ldr/rdl, symE/symR. RNA Biol 2012;9:1520-1527.
17 | Ruggeri ZM, Ware J. von Willebrand factor. FASEB J 1993;7:308-316.
18 | Taneja N, Mewara A. Shigellosis: epidemiology in India. Indian J Med Res 2016;143:565-576.
19 | Marra A. Targeting virulence for antibacterial chemotherapy: identifying and characterising virulence factors for lead discovery. Drugs R D 2006;7:1-16.
20 | Keusch GT. Shigella infections. Clin Gastroenterol 1979;8:645-662. |
21 | Xu D, Xu Y, Uberbacher EC. Computational tools for protein modeling. Curr Protein Pept Sci 2000;1:1-21.
22 | Sillitoe I, Cuff AL, Dessailly BH, Dawson NL, Furnham N, Lee D, et al. New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures. Nucleic Acids Res 2013;41:D490-D498. |
23 | Gough J, Karplus K, Hughey R, Chothia C. Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J Mol Biol 2001;313:903-919.
24 | Rappoport N, Karsenty S, Stern A, Linial N, Linial M. ProtoNet 6.0: organizing 10 million protein sequences in a compact hierarchical family tree. Nucleic Acids Res 2012;40:D313-D320.
25 | Chen CC, Hwang JK, Yang JM. (PS)2-v2: template-based protein structure prediction server. BMC Bioinformatics 2009;10:366.
26 | Baron C, Coombes B. Targeting bacterial secretion systems: benefits of disarmament in the microcosm. Infect Disord Drug Targets 2007;7:19-27.
27 | Ahmad F, Jan R, Kannan M, Obser T, Hassan MI, Oyen F, et al. Characterisation of mutations and molecular studies of type 2 von Willebrand disease. Thromb Haemost 2013;109:39-46.
28 | Naqvi AA, Shahbaaz M, Ahmad F, Hassan MI. Identification of functional candidates amongst hypothetical proteins of Treponema pallidum ssp. pallidum. PLoS One 2015;10:e0124177.
29 | Colombatti A, Bonaldo P, Doliana R. Type A modules: interacting domains found in several non-fibrillar collagens and in other extracellular matrix proteins. Matrix 1993;13:297-306.
30 | Shen HB, Chou KC. Predicting protein fold pattern with functional domain and sequential evolution information. J Theor Biol 2009;256:441-446.
31 | Saha S, Raghava GP. VICMpred: an SVM-based method for the prediction of functional proteins of Gram-negative bacteria using amino acid patterns and composition. Genomics Proteomics Bioinformatics 2006;4:42-47.
32 | Garg A, Gupta D. VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens. BMC Bioinformatics 2008;9:62.
33 | Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res 2011;39:D561-D568.
34 | Chothia C, Lesk AM. The relation between the divergence of sequence and structure in proteins. EMBO J 1986;5:823-826.
35 | Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978;8:283-298.
36 | Anandakumar S, Shanmughavel P. Computational annotation for hypothetical proteins of Mycobacterium tuberculosis. J Comput Sci Syst Biol 2008;1:50-62. |
37 | Galperin MY, Koonin EV. 'Conserved hypothetical' proteins: prioritization of targets for experimental study. Nucleic Acids Res 2004;32:5452-5463.
38 | Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389-3402.
39 | Eddy SR. Profile hidden Markov models. Bioinformatics 1998;14:755-763.
40 | Marchler-Bauer A, Anderson JB, Derbyshire MK, DeWeese-Scott C, Gonzales NR, Gwadz M, et al. CDD: a conserved domain database for interactive domain family analysis. Nucleic Acids Res 2007;35:D237-D240.
41 | Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, et al. The Pfam protein families database. Nucleic Acids Res 2002;30:276-280.
42 | Finn RD, Clements J, Arndt W, Miller BL, Wheeler TJ, Schreiber F, et al. HMMER web server: 2015 update. Nucleic Acids Res 2015;43:W30-W38.
43 | Letunic I, Doerks T, Bork P. SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res 2012;40:D302-D305.
44 | Wei J, Goldberg MB, Burland V, Venkatesan MM, Deng W, Fournier G, et al. Complete genome sequence and comparative genomics of Shigella flexneri serotype 2a strain 2457T. Infect Immun 2003;71:2775-2786.
45 | de Castro E, Sigrist CJ, Gattiker A, Bulliard V, Langendijk-Genevaux PS, Gasteiger E, et al. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res 2006;34:W362-W365.
46 | Parajuli P, Adamski M, Verma NK. Bacteriophages are the major drivers of Shigella flexneri serotype 1c genome plasticity: a complete genome analysis. BMC Genomics 2017;18:722.
47 | Ferreccio C, Prado V, Ojeda A, Cayyazo M, Abrego P, Guers L, et al. Epidemiologic patterns of acute diarrhea and endemic Shigella infections in children in a poor periurban setting in Santiago, Chile. Am J Epidemiol 1991;134:614-627.
48 | von Seidlein L, Kim DR, Ali M, Lee H, Wang X, Thiem VD, et al. A multicentre study of Shigella diarrhoea in six Asian countries: disease burden, clinical manifestations, and microbiology. PLoS Med 2006;3:e353.
49 | Nuesch-Inderbinen M, Heini N, Zurfluh K, Althaus D, Hachler H, Stephan R. Shigella antimicrobial drug resistance mechanisms, 2004-2014. Emerg Infect Dis 2016;22:1083-1085.
50 | Zhu Z, Zhou X, Li B, Wang S, Cheng F, Zhang J. Genomic analysis and resistance mechanisms in Shigella flexneri 2a strain 301. Microb Drug Resist 2018;24:323-336.
51 | Desler C, Suravajhala P, Sanderhoff M, Rasmussen M, Rasmussen LJ. In silico screening for functional candidates amongst hypothetical proteins. BMC Bioinformatics 2009;10:289.
52 | Loewenstein Y, Raimondo D, Redfern OC, Watson J, Frishman D, Linial M, et al. Protein function annotation by homology-based inference. Genome Biol 2009;10:207.
53 | Nimrod G, Schushan M, Steinberg DM, Ben-Tal N. Detection of functionally important regions in "hypothetical proteins" of known structure. Structure 2008;16:1755-1763.
54 | Morishita R, Kawagoshi A, Sawasaki T, Madin K, Ogasawara T, Oka T, et al. Ribonuclease activity of rat liver perchloric acid-soluble protein, a potent inhibitor of protein synthesis. J Biol Chem 1999;274:20688-20692.
55 | Eng J. ROC analysis: web-based calculator for ROC curves. Baltimore: Johns Hopkins University, 2006. Accessed 2018 Sep 1. Available from: http://www.jrocfit.org. |
56 | Shahbaaz M, Hassan MI, Ahmad F. Functional annotation of conserved hypothetical proteins from Haemophilus influenzae Rd KW20. PLoS One 2013;8:e84263.
57 | Delucia AM, Six DA, Caughlan RE, Gee P, Hunt I, Lam JS, et al. Lipopolysaccharide (LPS) inner-core phosphates are required for complete LPS synthesis and transport to the outer membrane in Pseudomonas aeruginosa PAO1. MBio 2011;2:e00142-11. |
58 | Burk DL, Ghuman N, Wybenga-Groot LE, Berghuis AM. X-ray structure of the AAC(6')-Ii antibiotic resistance enzyme at 1.8 A resolution: examination of oligomeric arrangements in GNAT superfamily members. Protein Sci 2003;12:426-437.
59 | Bjornson HS. Enzymes associated with the survival and virulence of gram-negative anaerobes. Rev Infect Dis 1984;6 Suppl 1:S21-S24.
60 | Lambrecht JA, Flynn JM, Downs DM. Conserved YjgF protein family deaminates reactive enamine/imine intermediates of pyridoxal 5'-phosphate (PLP)-dependent enzyme reactions. J Biol Chem 2012;287:3454-3461.
61 | Schmitz G, Downs DM. Reduced transaminase B (IlvE) activity caused by the lack of yjgF is dependent on the status of threonine deaminase (IlvA) in Salmonella enterica serovar Typhimurium. J Bacteriol 2004;186:803-810.
62 | Aravind L, Leipe DD, Koonin EV. Toprim: a conserved catalytic domain in type IA and II topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins. Nucleic Acids Res 1998;26:4205-4213.
63 | Bhasin M, Garg A, Raghava GP. PSLpred: prediction of subcellular localization of bacterial proteins. Bioinformatics 2005;21:2522-2524.
64 | Rothberg JM, Jacobs JR, Goodman CS, Artavanis-Tsakonas S. slit: an extracellular protein necessary for development of midline glia and commissural axon pathways contains both EGF and LRR domains. Genes Dev 1990;4:2169-2187.
65 | Kovacs-Simon A, Titball RW, Michell SL. Lipoproteins of bacterial pathogens. Infect Immun 2011;79:548-561.
66 | Shanmugham B, Pan A. Identification and characterization of potential therapeutic candidates in emerging human pathogen Mycobacterium abscessus: a novel hierarchical in silico approach. PLoS One 2013;8:e59126.
67 | Yu CS, Chen YC, Lu CH, Hwang JK. Prediction of protein subcellular localization. Proteins 2006;64:643-651.
68 | Yu NY, Wagner JR, Laird MR, Melli G, Rey S, Lo R, et al. PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics 2010;26:1608-1615.
69 | Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 2001;305:567-580.
70 | Hirokawa T, Boon-Chieng S, Mitaku S. SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics 1998;14:378-379.
71 | Tusnady GE, Simon I. The HMMTOP transmembrane topology prediction server. Bioinformatics 2001;17:849-850.
72 | Petersen TN, Brunak S, von Heijne G, Nielsen H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods 2011;8:785-786.
73 | Mathan MM, Mathan VI. Ultrastructural pathology of the rectal mucosa in Shigella dysentery. Am J Pathol 1986;123:25-38. |
74 | Torti SV, Park JT. Lipoprotein of gram-negative bacteria is essential for growth and division. Nature 1976;263:323-326.
75 | Bendtsen JD, Kiemer L, Fausboll A, Brunak S. Non-classical protein secretion in bacteria. BMC Microbiol 2005;5:58.
76 | Kanehisa M, Goto S, Kawashima S, Nakaya A. The KEGG databases at GenomeNet. Nucleic Acids Res 2002;30:42-46.
77 | Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, et al. InterProScan: protein domains identifier. Nucleic Acids Res 2005;33:W116-W120.
78 | Kumar K, Prakash A, Tasleem M, Islam A, Ahmad F, Hassan MI. Functional annotation of putative hypothetical proteins from Candida dubliniensis. Gene 2014;543:93-100.
79 | Lubec G, Afjehi-Sadat L, Yang JW, John JP. Searching for hypothetical proteins: theory and practice based upon original data and literature. Prog Neurobiol 2005;77:90-127.
80 | GBD Diarrhoeal Diseases Collaborators. Estimates of global, regional, and national morbidity, mortality, and aetiologies of diarrhoeal diseases: a systematic analysis for the Global Burden of Disease Study 2015. Lancet Infect Dis 2017;17:909-948.
81 | Shahbaaz M, Ahmad F, Imtaiyaz Hassan M. Structure-based functional annotation of putative conserved proteins having lyase activity from Haemophilus influenzae. 3 Biotech 2015;5:317-336. |
82 | Sinha A, Ahmad F, Hassan MI. Structure based functional annotation of putative conserved proteins from Treponema pallidum: search for a potential drug target. Lett Drug Des Discov 2015;12:46-59. |
83 | Adams MA, Suits MD, Zheng J, Jia Z. Piecing together the structure-function puzzle: experiences in structure-based functional annotation of hypothetical proteins. Proteomics 2007;7:2920-2932.
84 | Doerks T, von Mering C, Bork P. Functional clues for hypothetical proteins based on genomic context analysis in prokaryotes. Nucleic Acids Res 2004;32:6321-6326.
85 | Gazi MA, Kibria MG, Mahfuz M, Islam MR, Ghosh P, Afsar MN, et al. Functional, structural and epitopic prediction of hypothetical proteins of Mycobacterium tuberculosis H37Rv: an in silico approach for prioritizing the targets. Gene 2016;591:442-455.
86 | Cerveny L, Straskova A, Dankova V, Hartlova A, Ceckova M, Staud F, et al. Tetratricopeptide repeat motifs in the world of bacterial pathogens: role in virulence mechanisms. Infect Immun 2013;81:629-635.
87 | Tavernarakis N, Driscoll M, Kyrpides NC. The SPFH domain: implicated in regulating targeted protein turnover in stomatins and other membrane-associated proteins. Trends Biochem Sci 1999;24:425-427.
88 | Gehl B, Sweetlove LJ. Mitochondrial Band-7 family proteins: scaffolds for respiratory chain assembly? Front Plant Sci 2014;5:141. |
89 | Wu T, McCandlish AC, Gronenberg LS, Chng SS, Silhavy TJ, Kahne D. Identification of a protein complex that assembles lipopolysaccharide in the outer membrane of Escherichia coli. Proc Natl Acad Sci U S A 2006;103:11754-11759.
90 | Singer HM, Kuhne C, Deditius JA, Hughes KT, Erhardt M. The Salmonella Spi1 virulence regulatory protein HilD directly activates transcription of the flagellar master operon flhDC. J Bacteriol 2014;196:1448-1457.