
Signal Peptide Cleavage Site Prediction Using a String Kernel with Real Exponent Metric  

Chi, Sang-Mun (Department of Computer Science, Gyeongsang National University)
Abstract
A kernel in a support vector machine can be viewed as a similarity measure between data points, and this measure is used to find an optimal hyperplane that separates the pattern classes. It is therefore important to incorporate the characteristics of the data effectively into the similarity measure. To find an optimal similarity between amino acid sequences, we propose a real-exponent exponential form of two metrics, one derived from the evolutionary relationships among amino acids and the other from their hydrophobicity. We prove that the proposed function satisfies the conditions of a metric, and we establish a relation between it and the metrics underlying the string kernels that are widely used for processing amino acid and DNA sequences. In prediction experiments on signal peptide cleavage sites, the best-performing metric was found among the proposed metrics.
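The abstract does not state the formula of the proposed metric, so the following is only a minimal sketch under stated assumptions, not the author's method. It assumes a real-exponent form d(a, b)^alpha with 0 < alpha <= 1 applied to a single base metric, and it uses a made-up four-residue scale (`TOY_SCALE`, hypothetical values). The names `base_metric`, `powered_metric`, and `is_metric` are illustrative. The sketch checks the metric conditions by brute force and shows that, under this assumption, the distance approaches the 0/1 match/mismatch distance as alpha approaches 0.

```python
import itertools

# Hypothetical toy values for four residues (illustrative only; not taken
# from the paper or from any published hydrophobicity scale).
TOY_SCALE = {"A": 1.0, "L": 2.5, "K": -3.0, "S": 0.5}


def base_metric(a, b):
    """Absolute difference of the toy values; a metric because the values are distinct."""
    return abs(TOY_SCALE[a] - TOY_SCALE[b])


def powered_metric(a, b, alpha):
    """Assumed real-exponent form: the base distance raised to the power alpha."""
    return base_metric(a, b) ** alpha


def is_metric(dist, points, tol=1e-12):
    """Brute-force check of identity, symmetry, and the triangle inequality."""
    for x, y in itertools.product(points, repeat=2):
        # The distance must be zero exactly when the two points coincide.
        if (dist(x, y) < tol) != (x == y):
            return False
        if abs(dist(x, y) - dist(y, x)) > tol:
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:
            return False
    return True


residues = list(TOY_SCALE)

# Exponents in (0, 1] keep the metric conditions on this toy example.
for alpha in (1.0, 0.5, 0.1):
    print(alpha, is_metric(lambda a, b: powered_metric(a, b, alpha), residues))

# An exponent above 1 can break the triangle inequality (here it does).
print(2.0, is_metric(lambda a, b: powered_metric(a, b, 2.0), residues))

# As alpha approaches 0, distinct residues end up at distance close to 1 and
# identical residues stay at 0, which is the 0/1 match/mismatch distance.
print(powered_metric("A", "L", 1e-9), powered_metric("A", "A", 1e-9))
```

A brute-force check on a small alphabet is not a proof. It only illustrates the kind of conditions the abstract says are proved for the proposed metric.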
Keywords
Support Vector Machines; Metric; String Kernels; Signal Peptide Cleavage Site