Improved Algorithms for the Identification of Yeast Proteins and Significant Transcription Factor and Motif Analysis

  • Published : 2006.06.01

Abstract

With the rapid development of MS technologies, the demand for more sophisticated MS interpretation algorithms has grown as well. We have developed a new protein fingerprinting method using a binomial distribution (fBIND). With fBIND, we improved the accuracy of protein fingerprinting by up to 49% over MOWSE and by 2% over a previous binomial distribution approach studied by Wool et al. Moreover, we suggest a statistical approach to define the significance of transcription factors and motifs in the identified proteins based on the Gene Ontology (GO).

Abbreviations: fBIND, fingerprinting using binomial distribution; GO, Gene Ontology; MS, mass spectrometry; PMF, peptide mass fingerprinting; nr, nonredundant; SGD, Saccharomyces Genome Database
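The abstract does not give the fBIND scoring formula, so the following is only a minimal sketch of the general idea behind binomial scoring in peptide mass fingerprinting, not the authors' method. It assumes each observed peak matches a candidate protein's theoretical peptides by chance with a fixed probability. The names `binomial_tail`, `pmf_score`, and `p_random` are hypothetical and chosen for illustration.

```python
import math


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
        for i in range(k, n + 1)
    )


def pmf_score(n_peaks: int, n_matched: int, p_random: float) -> float:
    """Score a candidate protein as -log10 of the chance-match probability.

    n_peaks   : number of observed peaks in the spectrum
    n_matched : peaks that match the candidate's theoretical peptide masses
    p_random  : assumed probability that one peak matches by chance
                (hypothetical parameter; the abstract does not define it)
    """
    tail = binomial_tail(n_matched, n_peaks, p_random)
    # Guard against log10(0) if the tail underflows for extreme inputs.
    return -math.log10(max(tail, 1e-300))


if __name__ == "__main__":
    # Illustrative numbers only: 20 observed peaks, 5% chance-match rate.
    for matched in (2, 5, 10):
        print(matched, round(pmf_score(20, matched, 0.05), 2))
```

Under these assumptions a candidate with more matched peaks gets a smaller tail probability and therefore a higher score, which is how candidates would be ranked.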

References

  1. Barsnes, H., Mikalsen, S.O., and Eidhammer, I. (2006). MassSorter: a tool for administrating and analyzing data from mass spectrometry experiments on proteins with known amino acid sequences. BMC Bioinformatics 7, 42 https://doi.org/10.1186/1471-2105-7-42
  2. Camon, E., Magrane, M., Barrell, D., Binns, D., Fleischmann, W., Kersey, P., Mulder, N., Oinn, T., Maslen, J., Cox, A., and Apweiler, R. (2003). The Gene Ontology Annotation (GOA) Project: Implementation of GO in SWISS-PROT, TrEMBL, and InterPro. Genome Res. 13, 662-672 https://doi.org/10.1101/gr.461403
  3. Clauser, K.R., Baker, P., and Burlingame, A.L. (1999). Role of accurate mass measurement (±10 ppm) in protein identification strategies employing MS or MS/MS and database searching. Anal. Chem. 71, 2871-2882 https://doi.org/10.1021/ac9810516
  4. Cutler, P., Heald, G., White, I.R., and Ruan, J. (2003). A novel approach to spot detection for two-dimensional gel electrophoresis images using pixel value collection. Proteomics 3, 392-401 https://doi.org/10.1002/pmic.200390054
  5. Dwight, S.S., Harris, M.A., Dolinski, K., Ball, C.A., Binkley, G., Christie, K.R., Fisk, D.G., Issel-Tarver, L., Schroeder, M., Sherlock, G., Sethuraman, A., Weng, S., Botstein, D., and Cherry, J.M. (2002). Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO). Nucleic Acids Res. 30, 69-72 https://doi.org/10.1093/nar/30.1.69
  6. Fenyo, D. (2000). Identifying the proteome. Curr. Opin. Biotechnol. 11, 391-395 https://doi.org/10.1016/S0958-1669(00)00115-4
  7. Florea, L., Hartzell, G., Zhang, Z., Rubin, G.M., and Miller, W. (1998). A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res. 8, 967-974 https://doi.org/10.1101/gr.8.9.967
  8. Kel, A.E., Gossling, E., Reuter, I., Cheremushkin, E., Kel-Margoulis, O.V., and Wingender, E. (2003). MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res. 31, 3576-3579 https://doi.org/10.1093/nar/gkg585
  9. Mann, M., Hojrup, P., and Roepstorff, P. (1993). Use of mass spectrometric molecular weight information to identify proteins in sequence databases. Biol. Mass Spectrom. 22, 338-344 https://doi.org/10.1002/bms.1200220605
  10. Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A.E., Kel-Margoulis, O.V., Kloos, D.U., Land, S., Lewicki-Potapov, B., Michael, H., Munch, R., Reuter, I., Rotert, S., Saxel, H., Scheer, M., Thiele, S., and Wingender, E. (2003). TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 31, 374-378 https://doi.org/10.1093/nar/gkg108
  11. Mulder, N.J., Apweiler, R., Attwood, T.K., Bairoch, A., Barrell, D., Bateman, A., Binns, D., Biswas, M., Bradley, P., Bork, P., Bucher, P., Copley, R.R., Courcelle, E., Das, U., Durbin, R., Falquet, L., Fleischmann, W., Griffiths-Jones, S., Haft, D., Harte, N., Hulo, N., Kahn, D., Kanapin, A., Krestyaninova, M., Lopez, R., Letunic, I., Lonsdale, D., Silventoinen, V., Orchard, S.E., Pagni, M., Peyruc, D., Ponting, C.P., Selengut, J.D., Servant, F., Sigrist, C.J., Vaughan, R., and Zdobnov, E.M. (2003). The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res. 31, 315-318 https://doi.org/10.1093/nar/gkg046
  12. Pappin, D.J., Hojrup, P., and Bleasby, A.J. (1993). Rapid identification of proteins by peptide-mass finger printing. Curr. Biol. 3, 327-332 https://doi.org/10.1016/0960-9822(93)90195-T
  13. Perkins, D.N., Pappin, D.J., Creasy, D.M., and Cottrell, J.S. (1999). Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551-3567 https://doi.org/10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2
  14. Rogers, M., Graham, J., and Tonge, R.P. (2003). Statistical Models of Shape for the Analysis of Protein Spots in 2-D Electrophoresis Gel Images. Proteomics 3, 879-886 https://doi.org/10.1002/pmic.200300420
  15. Wilkins, M.R., Gasteiger, E., Bairoch, A., Sanchez, J.C., Williams, K.L., Appel, R.D., and Hochstrasser, D.F. (1999). Protein identification and analysis tools in the ExPASy server. Methods Mol. Biol. 112, 531-552
  16. Wilkins, M.R., Gasteiger, E., Wheeler, C.H., Lindskog, I., Sanchez, J., Bairoch, A., Appel, R.D., Dunn, M.J., and Hochstrasser, D.F. (1998). Multiple parameter cross-species protein identification using MultiIdent - a world-wide web accessible tool. Electrophoresis 19, 3199-3206 https://doi.org/10.1002/elps.1150191824
  17. Wool, A. and Smilansky, Z. (2002). Precalibration of matrix-assisted laser desorption/ionization-time of flight spectra for peptide mass fingerprinting. Proteomics 2, 1365-1373 https://doi.org/10.1002/1615-9861(200210)2:10<1365::AID-PROT1365>3.0.CO;2-9
  18. Zhang, W. and Chait, B.T. (2000). ProFound: an expert system for protein identification using mass spectrometric peptide mapping information. Anal. Chem. 72, 2482-2489 https://doi.org/10.1021/ac991363o
  19. Zhu, J. and Zhang, M.Q. (1999). SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics 15, 607-611 https://doi.org/10.1093/bioinformatics/15.7.607