Theoretical Peptide Mass Distribution in the Non-Redundant Protein Database of the NCBI

  • Lim Da-Jeong (Department of Food and Animal Biotechnology, Seoul National University)
  • Oh Hee-Seok (Department of Statistics, Seoul National University)
  • Kim Hee-Bal (Department of Food and Animal Biotechnology, Seoul National University)
  • Published : 2006.06.01

Abstract

Peptide mass mapping is the matching of experimentally generated peptide masses with the predicted masses of digested proteins contained in a database. In peptide mass fingerprinting, proteins are identified by matching the masses of their constituent fragments to the theoretical peptide masses generated from a protein database. It is therefore important to know the theoretical mass distribution of the database; however, few studies have reported the peptide mass distribution of a database. We analyzed the peptide mass distribution of the NCBI non-redundant protein sequence database after simulated digestion with 15 different enzymes. To characterize the peptide mass distribution for each digestion enzyme, we applied a power-law distribution (Zipf's law). After simulating the digestion of the protein database, we used rank-frequency plots of the peptide fragments to fit a general Zipf's law curve for all enzymes. Our data appear to fit Zipf's law with statistically significant parameter values.
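The abstract does not spell out the digestion rules, the mass calculation, or the fitting procedure, so the following is only a rough sketch of the general workflow under stated assumptions, not the authors' method. It assumes trypsin cleaving after K or R but not before P, with no missed cleavages (the other 14 enzymes are not modelled); neutral monoisotopic peptide masses; peptide masses grouped into 1 Da bins before ranking; and an ordinary least-squares fit of log frequency against log rank to the Zipf form f(r) = C·r^(-a). The function names `trypsin_digest`, `peptide_mass`, `rank_frequency` and `fit_zipf` are hypothetical, and the toy sequences are made up.

```python
import math
import re
from collections import Counter

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def trypsin_digest(sequence):
    """Assumed trypsin rule: cleave after K/R unless the next residue is P.

    No missed cleavages are generated (a simplifying assumption).
    """
    # Zero-width split point: preceded by K or R, not followed by P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass; None if a non-standard residue occurs."""
    if any(aa not in RESIDUE_MASS for aa in peptide):
        return None
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def rank_frequency(masses, bin_width=1.0):
    """Count masses per bin (assumed bin width) and sort counts descending.

    Position i in the returned list corresponds to rank i + 1.
    """
    counts = Counter(round(m / bin_width) for m in masses)
    return sorted(counts.values(), reverse=True)


def fit_zipf(freqs):
    """OLS fit of log f = log C - a * log r; returns (a, C).

    Requires at least two ranks so that the log-rank variance is non-zero.
    """
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, math.exp(my - slope * mx)


if __name__ == "__main__":
    # Toy sequences for illustration only; not taken from the NCBI nr database.
    proteins = ["MKRAGKPLLRSTKEEK", "GAVKRLLPRKAAK"]
    masses = [m for seq in proteins for p in trypsin_digest(seq)
              if (m := peptide_mass(p)) is not None]
    freqs = rank_frequency(masses)
    if len(freqs) >= 2:
        a, c = fit_zipf(freqs)
        print(f"Zipf exponent a = {a:.3f}, C = {c:.3f}")
```

Modelling another enzyme would mean replacing the regular expression in `trypsin_digest`. The bin width decides how finely masses are grouped before ranking, so it will affect the fitted exponent; a log-log least-squares fit is used here only because it is the simplest way to estimate a power-law slope.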
