http://dx.doi.org/10.5808/GI.2018.16.4.e26

Functional Prediction of Hypothetical Proteins from Shigella flexneri and Validation of the Predicted Models by Using ROC Curve Analysis  

Gazi, Md. Amran (Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research)
Mahmud, Sultan (Infectious Diseases Division, International Centre for Diarrhoeal Disease Research)
Fahim, Shah Mohammad (Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research)
Kibria, Mohammad Golam (Infectious Diseases Division, International Centre for Diarrhoeal Disease Research)
Palit, Parag (Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research)
Islam, Md. Rezaul (International Max Planck Research School)
Rashid, Humaira (Infectious Diseases Division, International Centre for Diarrhoeal Disease Research)
Das, Subhasish (Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research)
Mahfuz, Mustafa (Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research)
Ahmed, Tahmeed (Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research)
Abstract
Shigella spp. are among the key pathogens responsible for the global burden of diarrhoeal disease. With over 164 million reported cases per annum, shigellosis accounts for 1.1 million deaths each year. The majority of these cases occur among children in developing nations, and the emergence of multidrug-resistant Shigella strains in clinical isolates demands the development of new and better drugs against this pathogen. The genome of Shigella flexneri was extensively analyzed and found to encode 4,362 proteins, among which the functions of 674, termed hypothetical proteins (HPs), had not previously been elucidated. The amino acid sequences of all 674 HPs were studied, and functions were assigned to 39 HPs with a high level of confidence. Here we utilized a combination of the latest versions of databases to assign precise functions to HPs for which no experimental information is available. These HPs were found to belong to various classes of proteins, such as enzymes, binding proteins, signal transducers, lipoproteins, transporters, virulence proteins, and other proteins. The performance of the various computational tools was evaluated using receiver operating characteristic (ROC) curve analysis, and a high average accuracy of 93.6% was obtained. Our comprehensive analysis will help to gain a greater understanding for the development of novel potential therapeutic interventions to defeat Shigella infection.
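To make the evaluation step in the abstract concrete, the following is a minimal sketch of how an ROC curve, its area under the curve (AUC), and accuracy at a fixed cutoff can be computed for a single prediction tool. It is not the authors' pipeline: the function names, the labels, and the scores are hypothetical and serve only to illustrate the general technique, assuming each protein with a known function gets a binary label (1 = the tool's prediction is correct) and a confidence score from the tool.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return ROC points as (false positive rate, true positive rate) pairs.

    Labels are 1 (positive) or 0 (negative); higher scores mean "more positive".
    Tied scores are processed together so they produce a single ROC point.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sweep the decision threshold from the highest score to the lowest.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Fraction of items classified correctly when score >= threshold means positive."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


# Hypothetical data for illustration only (not taken from the study).
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = auc(roc_points(example_labels, example_scores))
example_acc = accuracy(example_labels, example_scores, threshold=0.5)
```

Under this kind of scheme, an average accuracy across tools would be the mean of the per-tool accuracy values; how the study defined its cutoffs and averaged its results is not stated in the abstract.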
Keywords
hypothetical protein; in silico; NCBI; ROC curve; Shigella