Prediction of Metal Ion Binding Sites in Proteins from Amino Acid Sequences by Using Simplified Amino Acid Alphabets and Random Forest Model

  • Kumar, Suresh (Department of Diagnostic and Allied Health Sciences, Faculty of Health and Life Sciences, Management and Science University)
  • Received : 2017.10.16
  • Accepted : 2017.11.16
  • Published : 2017.12.31

Abstract

In metal-binding proteins, or metalloproteins, the bound metal ions are important for the stability of the protein and also serve as cofactors in functions such as controlling metabolism, regulating signal transport, and maintaining metal homeostasis. In structural genomics, prediction of metal-binding proteins helps in selecting a suitable growth medium for overexpression studies and in obtaining the functional protein. Computational prediction using machine learning approaches has been widely used in various fields of bioinformatics, based on the premise that all the necessary information is contained in the amino acid sequence. In this study, random forest prediction systems were deployed with simplified amino acid alphabets to predict binding sites for individual major metal ions: copper, calcium, cobalt, iron, magnesium, manganese, nickel, and zinc.
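As a rough illustration of what encoding a sequence with a simplified amino acid alphabet can look like, the sketch below maps residues onto a small set of classes and computes the class composition in a window around a candidate residue. The seven-class grouping, the window size, and the function names `reduce_sequence` and `window_composition` are assumptions made for this example and are not taken from the study; the resulting feature vectors are the kind of input that could then be passed to a random forest classifier.

```python
from collections import Counter

# Hypothetical grouping of the 20 amino acids into 7 classes (an assumption
# for illustration; the simplified alphabets used in the study may differ).
GROUPS = {
    "h": "AVLIM",  # hydrophobic aliphatic
    "a": "FWY",    # aromatic
    "p": "STNQ",   # polar uncharged
    "k": "KR",     # positively charged
    "d": "DE",     # negatively charged
    "m": "CH",     # residues that frequently act as metal ligands
    "g": "GP",     # small / conformationally special
}
TO_REDUCED = {aa: label for label, members in GROUPS.items() for aa in members}
LABELS = sorted(GROUPS)  # fixed feature order: a, d, g, h, k, m, p


def reduce_sequence(seq):
    """Map a protein sequence onto the reduced alphabet ('x' for unknown residues)."""
    return "".join(TO_REDUCED.get(aa, "x") for aa in seq.upper())


def window_composition(seq, center, half_width=3):
    """Fraction of each reduced class in a window centred on one residue.

    The window is truncated at the sequence ends rather than padded.
    """
    reduced = reduce_sequence(seq)
    start = max(0, center - half_width)
    window = reduced[start:center + half_width + 1]
    counts = Counter(window)
    return [counts[label] / len(window) for label in LABELS]


if __name__ == "__main__":
    example = "MKCPHCDE"  # made-up peptide, not real data
    print(reduce_sequence(example))
    print(window_composition(example, center=2, half_width=2))
```

Each candidate residue yields one fixed-length vector, so a labelled set of binding and non-binding residues can be arranged as a feature matrix for training one classifier per metal ion.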
