Prediction of Protein Kinase Specific Phosphorylation Sites with Multiple SVMs

  • Lee, Won-Chul (Department of Biosystems, Korea Advanced Institute of Science and Technology (KAIST));
  • Kim, Dong-Sup (Department of Biosystems, Korea Advanced Institute of Science and Technology (KAIST))
  • Published: 2007.03.31

Abstract

Protein phosphorylation is one of the key processes in cell signaling pathways. A variety of protein kinase families are involved in this process, and each kinase family phosphorylates a different set of substrate proteins. Many methods have been developed to predict kinase-specific phosphorylation sites or the type of phosphorylated residue (serine/threonine or tyrosine). We employed support vector machines (SVMs) to predict protein kinase-specific phosphorylation sites. Ten protein kinase families, including PKA, PKC, CK2, CDK, CaM-KII, PKB, MAPK, and EGFR, were considered in this study. We defined a 9-residue window around a phosphorylated residue as the instance from which a protein kinase determines whether to act on that site. Subsets of the PSI-BLAST profile were converted into numerical vectors representing positive or negative instances. Because the training sets were unbalanced, we trained multiple SVMs: representative negative instances were drawn multiple times, and each draw was combined with the same positive instances from the original training set to form a new training set. During testing, the final decision was made by a vote of these multiple SVMs. An RBF kernel was generally used for the SVMs, and several values of parameters such as gamma and the cost factor were tested. Our approach achieved more than 90% specificity across all protein kinase families, with an average sensitivity of 60%.
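The instance encoding can be illustrated with a minimal sketch. It assumes that the 9-residue window is centered on the candidate residue (4 residues on each side), that the profile is a PSI-BLAST position-specific scoring matrix with 20 columns per residue, and that window positions falling beyond the protein termini are zero-padded. The function names `window_vector` and `candidate_sites` are hypothetical, and the paper's exact choice of profile subset is not reproduced here.

```python
from typing import List, Sequence

WINDOW = 9           # residues per instance, as stated in the abstract
HALF = WINDOW // 2   # assumption: 4 residues on each side of the site
N_AA = 20            # assumption: one PSSM column per standard amino acid


def window_vector(pssm: Sequence[Sequence[float]], site: int) -> List[float]:
    """Flatten the PSSM rows of the 9-residue window around `site` (0-based).

    Positions outside the protein (near the N- or C-terminus) are padded
    with zeros; this padding rule is an assumption, not from the paper.
    """
    vec: List[float] = []
    for pos in range(site - HALF, site + HALF + 1):
        if 0 <= pos < len(pssm):
            row = list(pssm[pos])
            if len(row) != N_AA:
                raise ValueError(f"row {pos} has {len(row)} columns, expected {N_AA}")
            vec.extend(row)
        else:
            vec.extend([0.0] * N_AA)  # zero-pad beyond the termini
    return vec


def candidate_sites(sequence: str, residues: str = "ST") -> List[int]:
    """0-based indices of candidate residues, e.g. "ST" for Ser/Thr or "Y" for Tyr."""
    return [i for i, aa in enumerate(sequence) if aa in residues]
```

Each candidate site therefore yields a fixed-length vector of 9 × 20 = 180 values, which is the kind of input an SVM requires regardless of where the site lies in the protein.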
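The multiple-SVM strategy for unbalanced data (repeatedly undersampling the negatives, training one model per subset, and voting at test time) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each negative subset is assumed to contain as many instances as there are positives, ties in the vote are assumed to count as negative, and all function names are hypothetical. To keep the sketch dependency-free, a toy nearest-centroid learner (`nearest_centroid_trainer`) stands in for the RBF-kernel SVM.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Classifier = Callable[[Vector], int]                      # returns +1 or -1
Trainer = Callable[[List[Vector], List[int]], Classifier]  # hypothetical interface


def train_ensemble(pos: Sequence[Vector], neg: Sequence[Vector],
                   train_fn: Trainer, n_models: int, seed: int = 0) -> List[Classifier]:
    """Train n_models classifiers, each on all positives plus a fresh random
    subset of negatives of the same size (assumed balancing rule)."""
    rng = random.Random(seed)
    k = min(len(pos), len(neg))
    models: List[Classifier] = []
    for _ in range(n_models):
        sampled_neg = rng.sample(list(neg), k)  # without replacement within one subset
        X = list(pos) + sampled_neg
        y = [1] * len(pos) + [-1] * k
        models.append(train_fn(X, y))
    return models


def vote(models: Sequence[Classifier], x: Vector) -> int:
    """Majority vote over the ensemble; a tie counts as negative (assumption)."""
    return 1 if sum(m(x) for m in models) > 0 else -1


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), labels in {+1, -1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p != 1)
    tn = sum(1 for t, p in pairs if t != 1 and p != 1)
    fp = sum(1 for t, p in pairs if t != 1 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


def nearest_centroid_trainer(X: List[Vector], y: List[int]) -> Classifier:
    """Toy stand-in for an RBF-kernel SVM so the sketch runs without dependencies."""
    dim = len(X[0])

    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]

    c_pos, c_neg = centroid(1), centroid(-1)

    def dist2(a: Vector, b: Vector) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return lambda x: 1 if dist2(x, c_pos) < dist2(x, c_neg) else -1
```

To use real SVMs, `train_fn` could fit `sklearn.svm.SVC(kernel="rbf", C=..., gamma=...)` on `X, y` and return a function that calls the fitted model's `predict`; `gamma` and `C` would then be the parameters tuned in the abstract. Choosing an odd `n_models` avoids ties in the vote.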
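For reference, the textbook soft-margin SVM formulation (as implemented in LIBSVM, for example) shows where the two tuned parameters enter: gamma sets the width of the RBF kernel, and the cost factor C penalizes margin violations. The majority vote over M trained SVMs and the two reported metrics are also written out; this is the standard form, assumed rather than confirmed to match the authors' exact setup.

```latex
K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right), \qquad \gamma > 0

\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\left(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0

\hat{y}(\mathbf{x}) = \operatorname{sign}\!\left(\sum_{m=1}^{M} f_m(\mathbf{x})\right), \qquad f_m(\mathbf{x}) \in \{-1, +1\}

\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}
```

Here \(\phi\) is the feature map implied by the kernel, \(K(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^{\top}\phi(\mathbf{x}')\), and each \(f_m\) is the decision of one SVM trained on a different negative subset.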

Keywords