
An Effective Data Analysis System for Improving Throughput of Shotgun Proteomic Data based on Machine Learning  

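Na, Seung-Jin (Department of Mechanical and Information Engineering, University of Seoul)
Paek, Eun-Ok (Department of Mechanical and Information Engineering, University of Seoul)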
Abstract
In proteomics, recent advances in mass spectrometry and in protein extraction and separation technology have made high-throughput analysis possible. A single LC-MS/MS experiment now yields thousands to hundreds of thousands of MS/MS spectra. Such a large amount of data creates significant computational challenges, so there is a great need for effective data analysis methods that use computational resources efficiently and, at the same time, provide more peptide identifications. Here, the SIFTER system is designed to avoid inefficient processing of shotgun proteomic data. SIFTER provides software tools that improve the throughput of mass spectrometry-based peptide identification by filtering out poor-quality tandem mass spectra and estimating the peptide charge state before analysis algorithms are applied. The SIFTER tools characterize and assess spectral features, and thus significantly reduce computation time and false positive rates by identifying spectra that would lead to wrong identifications prior to full-blown analysis. SIFTER enables fast and in-depth interpretation of tandem mass spectra.
Keywords
Proteomics; tandem mass spectra; peptide identification; spectral quality; charge state determination
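
The prefiltering idea described in the abstract (discard poor-quality spectra and estimate charge state before any expensive search) can be illustrated with a minimal sketch. This is not SIFTER's actual method, which the abstract does not specify. The `Spectrum` type, the function names, and every threshold below are hypothetical, and the charge rule is only the generic heuristic that a singly charged precursor leaves nearly all fragment intensity at or below the precursor m/z.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Spectrum:
    """Hypothetical container for one MS/MS spectrum."""
    precursor_mz: float
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs


def is_good_quality(spec: Spectrum,
                    min_signal_peaks: int = 10,
                    rel_threshold: float = 0.05) -> bool:
    """Crude quality check (illustrative thresholds, not from the paper):
    require enough peaks above a fraction of the base-peak intensity."""
    if not spec.peaks:
        return False
    base = max(intensity for _, intensity in spec.peaks)
    if base <= 0:
        return False
    signal = sum(1 for _, intensity in spec.peaks
                 if intensity >= rel_threshold * base)
    return signal >= min_signal_peaks


def guess_charge_class(spec: Spectrum,
                       singly_fraction: float = 0.95) -> str:
    """Generic heuristic: if almost all intensity lies at or below the
    precursor m/z, call the precursor singly charged."""
    total = sum(intensity for _, intensity in spec.peaks)
    if total <= 0:
        return "unknown"
    below = sum(intensity for mz, intensity in spec.peaks
                if mz <= spec.precursor_mz)
    return "1+" if below / total >= singly_fraction else "multiple"


def prefilter(spectra: List[Spectrum]) -> List[Tuple[Spectrum, str]]:
    """Keep only spectra that pass the quality check, each paired with
    its estimated charge class, so later analysis sees fewer inputs."""
    return [(s, guess_charge_class(s)) for s in spectra if is_good_quality(s)]


if __name__ == "__main__":
    # Synthetic spectra, made up purely for illustration.
    good = Spectrum(500.0, [(100.0 + 30 * k, 100.0) for k in range(12)])
    noisy = Spectrum(500.0, [(200.0, 1000.0)] +
                     [(300.0 + k, 1.0) for k in range(20)])
    for spec, charge in prefilter([good, noisy]):
        print(len(spec.peaks), charge)
```

In this sketch only spectra that pass the quality check reach the downstream identification step, which is where the savings in computation time would come from. A machine-learning system such as the one the title describes would presumably replace the fixed thresholds with a trained classifier over spectral features.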