
http://dx.doi.org/10.5808/GI.2017.15.4.162

Prediction of Metal Ion Binding Sites in Proteins from Amino Acid Sequences by Using Simplified Amino Acid Alphabets and Random Forest Model

Kumar, Suresh (Department of Diagnostic and Allied Health Sciences, Faculty of Health and Life Sciences, Management and Science University)

Abstract

In metal-binding proteins (metalloproteins), bound metal ions are important for protein stability and also serve as cofactors in functions such as controlling metabolism, regulating signal transport, and maintaining metal homeostasis. In structural genomics, predicting metal binding helps in selecting a suitable growth medium for overexpression studies and in obtaining functional protein. Computational prediction using machine learning has been widely applied across bioinformatics, based on the premise that all the necessary information is contained in the amino acid sequence. In this study, random forest prediction systems were deployed with simplified amino acid alphabets to predict the binding sites of individual major metal ions, including copper, calcium, cobalt, iron, magnesium, manganese, nickel, and zinc.
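The abstract does not state which simplified alphabet or feature encoding was used. As a minimal sketch, assuming a hypothetical eight-group reduced alphabet (Cys and His kept as singletons because they are frequent ligands of transition metals) and a simple sliding-window composition encoding, each residue can be turned into a fixed-length feature vector of the kind a random forest classifier (for example, scikit-learn's `RandomForestClassifier`) could be trained on. The grouping, the `half_width` window size, and the names `reduce_sequence` and `window_features` are illustrative assumptions, not the study's method.

```python
from collections import Counter

# Hypothetical reduced alphabet (illustrative only, NOT the study's grouping).
# Each group of similar residues collapses to a single letter.
REDUCED_GROUPS = {
    "c": "C",      # cysteine, a common metal ligand
    "h": "H",      # histidine, a common metal ligand
    "a": "DE",     # acidic
    "b": "KR",     # basic
    "p": "STNQ",   # polar, uncharged
    "r": "FWY",    # aromatic
    "o": "AVLIM",  # aliphatic / hydrophobic
    "s": "GP",     # conformationally special
}
# Invert the table: residue -> reduced letter.
RESIDUE_TO_GROUP = {aa: g for g, members in REDUCED_GROUPS.items() for aa in members}
# Fixed feature order so every vector has the same layout.
GROUP_LETTERS = sorted(REDUCED_GROUPS)


def reduce_sequence(seq: str) -> str:
    """Map a protein sequence onto the reduced alphabet; unknown residues become 'x'."""
    return "".join(RESIDUE_TO_GROUP.get(aa, "x") for aa in seq.upper())


def window_features(seq: str, pos: int, half_width: int = 3) -> list[float]:
    """Fraction of each reduced letter in a window centred on residue `pos`.

    The window is clipped at the sequence ends, so fractions are taken over the
    residues actually present. Unknown residues ('x') count toward the window
    length but have no feature of their own.
    """
    if not 0 <= pos < len(seq):
        raise IndexError("pos is outside the sequence")
    reduced = reduce_sequence(seq)
    start = max(0, pos - half_width)
    end = min(len(reduced), pos + half_width + 1)
    window = reduced[start:end]
    counts = Counter(window)
    return [counts[g] / len(window) for g in GROUP_LETTERS]
```

For example, `reduce_sequence("MCHDEK")` gives `"ochaab"`. Encoding the window over a reduced alphabet rather than the 20 standard residues shrinks the feature space; whether that trade-off helps for a particular metal ion is an empirical question for the classifier.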

Keywords

amino acid sequence; binding sites; machine learning; proteins
