Prediction of subcellular localization of proteins using pairwise sequence alignment and support vector machine

  • Kim, Jong-Kyoung (Department of Computer Science and Engineering, Pohang University of Science and Technology) ;
  • Raghava, G. P. S. (Bioinformatics Centre, Institute of Microbial Technology) ;
  • Kim, Kwang-S. (National Creative Research Initiative Center of Superfunctional Materials, Department of Chemistry, Division of Molecular and Life Sciences, Pohang University of Science and Technology) ;
  • Bang, Sung-Yang (Department of Computer Science and Engineering, Pohang University of Science and Technology) ;
  • Choi, Seung-Jin (Department of Computer Science and Engineering, Pohang University of Science and Technology)
  • Published : 2004.11.04

Abstract

Predicting the destination of a protein in a cell gives valuable information for annotating the function of the protein. Recent technological breakthroughs have led us to develop more accurate methods for predicting the subcellular localization of proteins. The most important factor in determining the accuracy of these methods is how useful features are extracted from protein sequences. We propose a new method that extracts appropriate features from the sequence data alone by computing pairwise sequence alignment scores. A support vector machine (SVM) is used as the classifier. The overall prediction accuracy, evaluated by the jackknife validation technique, reaches 94.70% for the eukaryotic non-plant data set and 92.10% for the eukaryotic plant data set, which is the highest prediction accuracy among methods reported so far on these data sets. Our numerical experiments confirm that the feature extraction method based on pairwise sequence alignment is useful for this classification problem.
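
The feature extraction idea in the abstract can be illustrated with a minimal sketch: each protein is represented by the vector of its alignment scores against a set of reference sequences. The abstract does not state which alignment algorithm, substitution matrix, or gap penalties were used, so the Smith-Waterman local alignment, the match/mismatch/gap values, the function names, and the toy sequences below are all assumptions made for illustration, not the authors' implementation.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions; a real pipeline would
    more likely use a substitution matrix such as BLOSUM62.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def alignment_features(query, references):
    """Represent a sequence by its alignment scores against each reference."""
    return [local_alignment_score(query, ref) for ref in references]


# Toy sequences, invented purely for illustration.
references = ["MKTAYIAK", "MLSRAVCG", "GAVLIPFW"]
features = alignment_features("MKTAYLAK", references)
```

Each resulting vector has one entry per reference sequence. Vectors of this kind, paired with known localization labels, could then be passed to any SVM implementation for training and prediction.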

Keywords