Estimating Amino Acid Composition of Protein Sequences Using Position-Dependent Similarity Spectrum

  • 지상문 (Department of Computer Science, Kyungsung University)
  • Published: 2010.01.15

Abstract

The amino acid composition of a protein provides basic information for solving many problems in bioinformatics. We propose a new method that determines the amino acid composition using a biologically relevant similarity measure between amino acids, derived from the BLOSUM matrix, which encodes evolutionary relationships between amino acids. As a result, biologically related protein sequences tend to have similar amino acid compositions. Furthermore, to estimate the position-specific distribution of amino acids, which plays an important role in protein structure and function, and thereby extract more information from a protein sequence than conventional composition methods do, we apply concepts from the spectral analysis of signals such as radar and speech: time-dependent analysis, time resolution, and frequency resolution. Applied to the prediction of protein subcellular localization, the proposed method showed significantly improved performance over previous methods for amino acid composition estimation.
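
The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea under stated assumptions: each window of the sequence is scored against every letter of the alphabet with a similarity function, and the window slides along the sequence in the manner of a short-time analysis. The function names, the averaging rule, the `window` and `step` parameters, and the toy similarity scores are all hypothetical; in practice the toy scores would be replaced by values derived from a real BLOSUM matrix.

```python
from typing import Callable, List

Similarity = Callable[[str, str], float]


def similarity_composition(segment: str, alphabet: str, sim: Similarity) -> List[float]:
    """Average similarity of each alphabet letter to the residues of one segment.

    With an identity similarity this reduces to the conventional
    amino acid composition (relative frequency) of the segment.
    """
    if not segment:
        return [0.0] * len(alphabet)
    return [sum(sim(a, r) for r in segment) / len(segment) for a in alphabet]


def position_dependent_spectrum(
    seq: str, alphabet: str, sim: Similarity, window: int, step: int
) -> List[List[float]]:
    """Slide a window along the sequence and score each window separately.

    A short window localizes residues more precisely along the sequence;
    a long window averages over more residues. This trade-off is the
    assumed analogue of time resolution in short-time signal analysis.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = max(len(seq) - window, 0)
    return [
        similarity_composition(seq[s:s + window], alphabet, sim)
        for s in range(0, last_start + 1, step)
    ]


def identity_sim(a: str, b: str) -> float:
    """Conventional composition: only exact matches count."""
    return 1.0 if a == b else 0.0


# Hypothetical placeholder scores, NOT actual BLOSUM values.
TOY_SCORES = {("I", "L"): 0.5, ("D", "E"): 0.5}


def toy_sim(a: str, b: str) -> float:
    """Symmetric toy similarity: 1.0 for identical residues, else a table lookup."""
    if a == b:
        return 1.0
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), 0.0))


if __name__ == "__main__":
    # Two non-overlapping windows, "IL" and "DE", over a four-letter alphabet.
    for row in position_dependent_spectrum("ILDE", "DEIL", toy_sim, window=2, step=2):
        print(row)
```

With the identity similarity each window yields plain residue frequencies, whereas a graded similarity also gives partial credit to related residues, so sequences that differ only by similar substitutions would receive closer feature vectors. The resulting per-window vectors could then be concatenated as input to a classifier.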
