Assessment of the Reliability of Protein-Protein Interactions Using Protein Localization and Gene Expression Data

  • Lee, Hyun-Ju (Department of Computer Science, University of Southern California)
  • Deng, Minghua (School of Mathematics, Beijing University)
  • Sun, Fengzhu (Molecular and Computational Biology Program, University of Southern California)
  • Chen, Ting (Molecular and Computational Biology Program, University of Southern California)
  • Published : 2005.09.22

Abstract

Estimating the reliability of protein-protein interaction data sets obtained by high-throughput technologies such as yeast two-hybrid assays and mass spectrometry is of great importance. We develop a maximum likelihood estimation method that uses both protein localization and gene expression data to estimate the reliability of protein interaction data sets. Integrating the two data sources yields more accurate reliability estimates for various interaction data sets. We apply the method to protein physical interaction data sets and protein complex data sets. The reliability of the yeast two-hybrid interactions by Ito et al. (2001) is 27%, and that by Uetz et al. (2000) is 68%. The reliability of the protein complex data sets using tandem affinity purification-mass spectrometry (TAP) by Gavin et al. (2002) is 45%, and that using high-throughput mass spectrometric protein complex identification (HMS-PCI) by Ho et al. (2002) is 20%. The method is general and can be applied to any protein interaction data set.
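
The abstract does not spell out the likelihood, so the following is only an illustrative sketch of how a maximum likelihood reliability estimate can arise from a single binary feature; it is not the authors' model. It assumes that an observed data set is a mixture of true interactions (fraction r, the reliability) and random pairs, and that the probability of two proteins being co-localized is known for each class. The function name, parameter names, and all numbers are hypothetical.

```python
def estimate_reliability(n_colocalized, n_pairs, p_true, p_random):
    """Maximum likelihood estimate of the reliability r of an interaction data set.

    Assumed (hypothetical) mixture model for one binary feature:
        P(co-localized) = r * p_true + (1 - r) * p_random
    where p_true is the co-localization rate among true interactions and
    p_random is the rate among random protein pairs.
    """
    if n_pairs <= 0:
        raise ValueError("n_pairs must be positive")
    if p_true == p_random:
        raise ValueError("feature carries no information: p_true equals p_random")

    # The binomial MLE of the co-localization rate is the observed fraction.
    q_hat = n_colocalized / n_pairs

    # Invert the mixture equation, then clip r to the valid range [0, 1].
    # Because the binomial log-likelihood is concave in q and q is linear in r,
    # the clipped value is the constrained MLE.
    r = (q_hat - p_random) / (p_true - p_random)
    return min(1.0, max(0.0, r))


# Toy numbers, made up for illustration only (not taken from the paper).
print(estimate_reliability(450, 1000, p_true=0.8, p_random=0.1))
```

With these toy inputs the estimate is approximately 0.5. A second feature, such as a discretized expression correlation, could be combined by multiplying per-class feature probabilities under an independence assumption and maximizing the joint likelihood numerically; that extension is likewise an assumption, not a description of the published method.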

Keywords