Proceedings of the Korean Information Science Society Conference (한국정보과학회:학술대회논문집)
- 2007.10b
- Pages 22-27
- 2007
- 1598-5164 (pISSN)
Prediction of phosphorylation sites using multiple kernel learning
다중 커널 학습을 이용한 단백질의 인산화 부위 예측
- Kim, Jong-Kyoung (Dept. of Computer Science, POSTECH)
- Choi, Seung-Jin (Dept. of Computer Science, POSTECH)
- Published : 2007.10.26
Abstract
Phosphorylation is one of the most important post-translational modifications regulating protein activity. Predicting phosphorylation sites is a first step toward understanding the biological processes that initiate the actual function of proteins in each signaling pathway. Although many prediction methods using single or multiple features extracted from protein sequences have been proposed, no systematic data integration approach has been applied to improve the accuracy of predicting general phosphorylation sites. In this paper, we propose an optimal way of integrating multiple features in the framework of multiple kernel learning. We optimally combine seven kernels derived from sequence, physico-chemical properties, pairwise alignment, and structural information. On the Phospho.ELM data set, the accuracy evaluated by 5-fold cross-validation reaches 85% for serine, 85% for threonine, and 81% for tyrosine. Our computational experiments show a significant improvement in prediction performance relative to any single feature and to the combined features with equal weights. Moreover, our systematic integration method significantly improves prediction performance compared with previous well-known methods.
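The contrast the abstract draws between equal weights and learned weights can be illustrated with a small sketch of a weighted kernel combination. This is not the authors' method: multiple kernel learning optimizes the weights jointly with the classifier, whereas the sketch below swaps in a simpler heuristic (kernel-target alignment) to pick the weights. The toy data, the four stand-in kernels, and the helper names `rbf_kernel`, `alignment`, and `combine_kernels` are all assumptions made for illustration; they do not correspond to the seven kernels or the Phospho.ELM data used in the paper.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def alignment(K, y):
    """Kernel-target alignment: cosine similarity between K and y y^T."""
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

def combine_kernels(kernels, weights):
    """Convex combination sum_m w_m K_m of precomputed kernel matrices."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()  # normalize so the weights sum to one
    K = sum(wi * Ki for wi, Ki in zip(w, kernels))
    return K, w

# Toy data (assumption): 40 samples, 5 features, labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = np.where(X[:, 0] > 0, 1.0, -1.0)

# Stand-in base kernels (assumption): three RBF widths plus a linear kernel.
kernels = [rbf_kernel(X, g) for g in (0.01, 0.1, 1.0)]
kernels.append(X @ X.T)

# Heuristic weights from alignment, clipped at zero to stay non-negative.
scores = np.array([max(alignment(K, y), 0.0) for K in kernels])
K_weighted, w = combine_kernels(kernels, scores)

# Equal-weight baseline, as in the comparison mentioned in the abstract.
K_uniform, w_uniform = combine_kernels(kernels, np.ones(len(kernels)))
```

Because each base kernel is positive semidefinite and the weights are non-negative, both combined matrices are valid kernels and could be passed to any kernel classifier that accepts a precomputed Gram matrix.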
Keywords