Signal Sequence Prediction Based on Hydrophobicity and Substitution Matrix

  • Chi, Sang-Mun (Department of Computer Science, Kyungsung University)
  • Published: 2007.07.15

Abstract

This paper proposes a method that determines whether an unknown amino acid sequence is a secretory protein cleaved by signal peptidase I and, if so, predicts its cleavage site. A preprocessing stage uses amino acid hydrophobicity scales to estimate the presence of a signal sequence, the leader sequence of a secretory protein, and the location of its cleavage site. By eliminating non-secretory sequences early, the preprocessing improves the accuracy of the prediction. To apply a support vector machine effectively to signal sequence prediction, a biologically relevant distance between amino acid sequences is defined from hydrophobicity and a substitution matrix: hydrophobicity can be used to predict the location of an amino acid in a cell, and the substitution matrix represents the evolutionary relationships among amino acids. In 5-fold cross-validation on the Swiss-Prot release 50 protein database, the proposed method identified 98.9% of signal sequences as signal sequences and predicted the cleavage site correctly in 88% of cases. In comparison tests, the proposed method performed significantly better than other prediction methods.
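
The two ideas in the abstract, a hydrophobicity-based prefilter and a sequence distance built from hydrophobicity and a substitution matrix, can be illustrated with a minimal sketch. The abstract does not specify the scale, window length, threshold, or the exact form of the distance, so everything below is an assumption made for illustration: the Kyte-Doolittle scale, the hypothetical helpers `window_hydrophobicity`, `has_hydrophobic_core`, and `sequence_distance`, the parameters `window`, `threshold`, `n_region`, and `alpha`, and a toy match/mismatch score standing in for a real substitution matrix such as BLOSUM62.

```python
from typing import Callable, List

# Kyte-Doolittle hydropathy values (one published hydrophobicity scale;
# the scale actually used by the paper is not stated in the abstract).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
KD_RANGE = 9.0  # 4.5 - (-4.5), used to scale differences into [0, 1]


def window_hydrophobicity(seq: str, window: int = 7) -> List[float]:
    """Mean hydrophobicity of every length-`window` segment of `seq`."""
    values = [KD[a] for a in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def has_hydrophobic_core(seq: str, window: int = 7,
                         threshold: float = 2.0, n_region: int = 30) -> bool:
    """Hypothetical prefilter: keep a sequence only if its N-terminal region
    contains a sufficiently hydrophobic window (assumed threshold)."""
    profile = window_hydrophobicity(seq[:n_region], window)
    return bool(profile) and max(profile) >= threshold


def toy_substitution_score(a: str, b: str) -> float:
    """Stand-in for a substitution matrix lookup (NOT real BLOSUM values)."""
    return 1.0 if a == b else 0.0


def sequence_distance(s: str, t: str,
                      sub_score: Callable[[str, str], float] = toy_substitution_score,
                      alpha: float = 0.5) -> float:
    """Assumed distance between equal-length sequences: a weighted sum of the
    mean hydrophobicity difference and a substitution-based dissimilarity."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    n = len(s)
    hyd = sum(abs(KD[a] - KD[b]) for a, b in zip(s, t)) / (KD_RANGE * n)
    # Dissimilarity is zero for identical residues, whatever the self-scores.
    sub = sum(
        (sub_score(a, a) + sub_score(b, b)) / 2.0 - sub_score(a, b)
        for a, b in zip(s, t)
    ) / n
    return alpha * hyd + (1.0 - alpha) * sub
```

In a pipeline of this kind, sequences rejected by `has_hydrophobic_core` would be labeled non-secretory without reaching the classifier, and a distance such as `sequence_distance` could be turned into a kernel for a support vector machine, for example through a Gaussian of the distance. Whether the paper uses these exact forms is not stated in the abstract.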
