[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KIPSTD.2003.10D.2.261

A Protein Sequence Prediction Method by Mining Sequence Data

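Cho, Sun-I (Department of Computer Science and Statistics, Graduate School, Chonnam National University)
Lee, Do-Heon (Department of BioSystems, Korea Advanced Institute of Science and Technology)
Cho, Kwang-Hwi (Department of Bioinformatics, Soongsil University)
Won, Yong-Gwan (School of Electronics, Computer and Information Communication Engineering, Chonnam National University)
Kim, Byoung-Ki (Department of Computer Science, Chonnam National University)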

Publication Information

The KIPS Transactions: Part D, v.10D, no.2, 2003, pp. 261-266

Abstract

A protein, a linear polymer of amino acids, is one of the most important biomolecules: proteins make up biological structures and regulate biochemical reactions. Because the characteristics and functions of a protein are determined, in principle, by its amino acid sequence, determining that sequence is the starting point for studying protein function. This paper proposes a protein sequence prediction method based on data mining techniques that can overcome the limitations of previous biochemical sequencing methods. After multiple proteases are applied to obtain overlapping protein fragments, candidate fragment sequences are identified by comparing the fragment mass values against peptide databases. We propose a method that constructs a multi-partite graph and searches it for maximal paths, determining the protein sequence by assembling the appropriate candidate sequences. Experimental results on the SWISS-PROT database that show the validity of the proposed method are also presented.
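The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a toy peptide database, simulated fragment masses, a fixed mass tolerance, and a suffix-prefix overlap rule for graph edges; the names `peptide_mass`, `candidates_for_mass`, `overlap`, `longest_path`, and `merge` are hypothetical. Each observed fragment mass yields one partition of candidate sequences (isomers share a mass), and a path that takes at most one candidate per partition is merged into a predicted sequence.

```python
# Standard monoisotopic residue masses in Da (subset used by the toy data).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "T": 101.04768, "L": 113.08406, "K": 128.09496, "E": 129.04259,
    "R": 156.10111,
}
WATER = 18.01056  # one water is added per free peptide


def peptide_mass(seq):
    """Monoisotopic mass of a peptide given as one-letter residue codes."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER


def candidates_for_mass(mass, peptide_db, tol=0.01):
    """All database peptides whose mass is within tol (an assumed value) of mass."""
    return [p for p in peptide_db if abs(peptide_mass(p) - mass) <= tol]


def overlap(a, b, min_overlap=2):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def longest_path(partitions, min_overlap=2):
    """Exhaustive search for the path that covers the most partitions,
    using at most one candidate per partition (fine for toy sizes only)."""
    nodes = [(i, s) for i, part in enumerate(partitions) for s in part]
    best = []

    def extend(path, used):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        last = path[-1][1]
        for idx, seq in nodes:
            if idx not in used and overlap(last, seq, min_overlap):
                extend(path + [(idx, seq)], used | {idx})

    for idx, seq in nodes:
        extend([(idx, seq)], {idx})
    return [seq for _, seq in best]


def merge(path, min_overlap=2):
    """Concatenate the fragments on a path, collapsing each overlap once."""
    result = path[0]
    for prev, nxt in zip(path, path[1:]):
        result += nxt[overlap(prev, nxt, min_overlap):]
    return result


# Toy peptide database (assumption); it contains isomers with equal mass.
PEPTIDE_DB = ["GAK", "AGK", "LSEVTR", "LESVTR", "TV", "VT",
              "GAKLSE", "VTRTV", "TVRTV"]

# Simulated masses for fragments of a made-up protein cut by two proteases:
# one cleaving after K/R (trypsin-like) and one cleaving after E.
observed = [peptide_mass(f) for f in ["GAK", "LSEVTR", "TV", "GAKLSE", "VTRTV"]]

partitions = [candidates_for_mass(m, PEPTIDE_DB) for m in observed]
path = longest_path(partitions)
assembled = merge(path)
print(partitions)
print(path)
print(assembled)
```

Mass alone cannot distinguish `GAK` from `AGK`, so the first partition holds both; only the candidate that overlaps a fragment from the other digest ends up on the longest path. A real implementation would need a scalable path search and a scoring scheme for competing paths, which this sketch leaves out.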

Keywords

Protein Identification; Mass Spectrometry; Protein Sequence Prediction; Multi-partite Graph

