http://dx.doi.org/10.3745/KIPSTD.2003.10D.2.261

A Protein Sequence Prediction Method by Mining Sequence Data  

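Cho, Sun-I (Department of Computer Science and Statistics, Graduate School, Chonnam National University)
Lee, Do-Heon (Department of BioSystems, Korea Advanced Institute of Science and Technology)
Cho, Kwang-Hwi (Department of Bioinformatics, Soongsil University)
Won, Yong-Gwan (School of Electronics, Computer and Information Communication Engineering, Chonnam National University)
Kim, Byoung-Ki (Department of Computer Science, Chonnam National University)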
Abstract
A protein, which is a linear polymer of amino acids, is one of the most important bio-molecules composing biological structures and regulating bio-chemical reactions. Since the characteristics and functions of proteins are, in principle, determined by their amino acid sequences, protein sequence determination is the starting point of protein function study. This paper proposes a protein sequence prediction method based on data mining techniques, which can overcome the limitations of previous bio-chemical sequencing methods. After applying multiple proteases to acquire overlapping protein fragments, we identify candidate fragment sequences by comparing fragment mass values with peptide databases. We propose a method that constructs a multi-partite graph and searches for maximal paths to determine the protein sequence by assembling the proper candidate sequences. In addition, we present experimental results on the SWISS-PROT database that show the validity of the proposed method.
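The pipeline outlined in the abstract (mass lookup, multi-partite graph, maximal path) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy protein GAKSFTVRLWEDK, the small peptide list, the choice of one graph partition per protease digest, the suffix-prefix overlap rule, and the exhaustive depth-first search are all assumptions made for illustration. The residue masses are standard monoisotopic values.

```python
# Illustrative sketch only; names, toy data, and search strategy are assumptions.
# Monoisotopic residue masses in daltons; one water is added per peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[a] for a in seq) + WATER


def candidates_by_mass(observed, database, tol=0.01):
    """All database peptides whose mass lies within tol of the observed mass."""
    return [p for p in database if abs(peptide_mass(p) - observed) <= tol]


def overlap(a, b, min_len=2):
    """Length of the longest suffix of a that is a prefix of b, or 0 if < min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_path(path, min_len=2):
    """Concatenate the sequences on a path, collapsing pairwise overlaps."""
    seq = path[0][1]
    for (_, prev), (_, nxt) in zip(path, path[1:]):
        seq += nxt[overlap(prev, nxt, min_len):]
    return seq


def longest_assembly(parts, min_len=2):
    """parts[i] holds the candidate sequences from digest i (one partition each).

    Edges join overlapping candidates from different partitions only; the
    longest merged sequence over all simple paths is returned. The search is
    exhaustive, so it is suitable for toy inputs only.
    """
    nodes = [(i, s) for i, cands in enumerate(parts) for s in cands]
    edges = {u: [v for v in nodes
                 if v[0] != u[0] and overlap(u[1], v[1], min_len)]
             for u in nodes}
    best = ""

    def dfs(path, seen):
        nonlocal best
        merged = merge_path(path, min_len)
        if len(merged) > len(best):
            best = merged
        for v in edges[path[-1]]:
            if v not in seen:
                dfs(path + [v], seen | {v})

    for n in nodes:
        dfs([n], {n})
    return best


# Hypothetical peptide database: true fragments plus same-mass permutations.
database = ["GAK", "AGK", "SFTVR", "LWEDK", "GAKSF", "TVRLW", "VTRLW", "EDK", "DEK"]
# Fragments of the toy protein under two simplified cleavage rules
# (after K/R, and after F/W/Y); their masses stand in for measured values.
digests = [["GAK", "SFTVR", "LWEDK"], ["GAKSF", "TVRLW", "EDK"]]
observed = [[peptide_mass(p) for p in digest] for digest in digests]
parts = [sorted({c for m in masses for c in candidates_by_mass(m, database)})
         for masses in observed]
result = longest_assembly(parts)
print(result)  # GAKSFTVRLWEDK
```

In this toy run the mass lookup alone cannot separate GAK from AGK, TVRLW from VTRLW, or EDK from DEK, because permuted sequences share a mass. Only the candidates that overlap fragments from the other digest end up on the longest path, which is the role the abstract assigns to the graph search.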
Keywords
Protein Identification; Mass Spectrometry; Protein Sequence Prediction; Multi-partite Graph