http://dx.doi.org/10.3745/KIPSTB.2006.13B.7.679

Protein-Protein Interaction Reliability Enhancement System based on Feature Selection and Classification Technique  

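Lee, Min-Su (Dept. of Computer Science, Ewha Womans University)
Park, Seung-Soo (Dept. of Computer Science, Ewha Womans University)
Lee, Sang-Ho (Dept. of Computer Science, Ewha Womans University)
Yong, Hwan-Seung (Dept. of Computer Science, Ewha Womans University)
Kang, Sung-Hee (Bangmok College of Basic Studies, Myongji University)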
Abstract
Protein-protein interaction data obtained from high-throughput experiments contain a high rate of false positives. In this paper, we introduce a new system for verifying the reliability of protein-protein interactions. The proposed system integrates various biological features related to protein-protein interactions and then uses a feature selection method to choose the most relevant and informative of them. To assess the reliability of each protein-protein interaction, the system applies a classification technique to build a classifier that, based on the selected biological evidence, distinguishes truly interacting protein pairs from noise in the protein-protein interaction data. Because the performance of feature selection methods and classification techniques depends heavily on the characteristics of the data, we performed a rigorous comparative analysis of various feature selection methods and classification techniques to obtain the best performance from our system. Experimental results show that combining feature selection with classification algorithms provides a powerful means of distinguishing truly interacting protein pairs from noisy protein-protein interaction datasets. We also investigated how the choice of feature selection method and classification technique affects the performance of the proposed verification system.
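The abstract describes a two-stage pipeline: first select the most informative biological features, then train a classifier on them to separate true interactions from false positives. The following is a minimal sketch of that pipeline under stated assumptions: each protein pair is described by binary evidence features with hypothetical names, the training data are invented toy values, information gain stands in as the feature-selection filter, and Bernoulli naive Bayes stands in as the classifier. The paper compares several feature selection methods and classifiers, so this particular pairing is not claimed to be its configuration.

```python
import math
from collections import Counter

# Hypothetical binary evidence features for a protein pair (1 = evidence
# supports an interaction). Names are illustrative, not the paper's feature set.
FEATURES = ["shared_go_annotation", "coexpressed", "same_funcat_category", "random_noise"]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, j):
    """Drop in label entropy obtained by splitting on binary feature j."""
    remainder = 0.0
    for v in (0, 1):
        subset = [y for x, y in zip(rows, labels) if x[j] == v]
        if subset:
            remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def select_top_k(rows, labels, k):
    """Filter-style selection: keep the k features with the highest information gain."""
    ranked = sorted(range(len(rows[0])),
                    key=lambda j: information_gain(rows, labels, j), reverse=True)
    return sorted(ranked[:k])


class BernoulliNaiveBayes:
    """Naive Bayes over binary features, with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.p = {}  # p[c][j] = P(feature j == 1 | class c)
        for c in self.classes:
            members = [x for x, y in zip(rows, labels) if y == c]
            self.p[c] = [(sum(x[j] for x in members) + 1) / (len(members) + 2)
                         for j in range(len(rows[0]))]
        return self

    def _log_joint(self, c, x):
        """log P(c) + sum_j log P(x_j | c)."""
        s = math.log(self.prior[c])
        for j, v in enumerate(x):
            s += math.log(self.p[c][j] if v else 1 - self.p[c][j])
        return s

    def predict_proba(self, x):
        """Posterior of each class; P(1 | x) serves as a reliability score."""
        logs = {c: self._log_joint(c, x) for c in self.classes}
        m = max(logs.values())
        z = sum(math.exp(v - m) for v in logs.values())
        return {c: math.exp(v - m) / z for c, v in logs.items()}

    def predict(self, x):
        proba = self.predict_proba(x)
        return max(proba, key=proba.get)


# Toy training data invented for illustration: 1 = true interaction, 0 = false positive.
ROWS = [
    [1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 0], [1, 1, 0, 1],  # true pairs
    [0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0],  # noisy pairs
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

selected = select_top_k(ROWS, LABELS, k=2)


def project(x):
    """Keep only the selected feature columns."""
    return [x[j] for j in selected]


model = BernoulliNaiveBayes().fit([project(x) for x in ROWS], LABELS)
print([FEATURES[j] for j in selected])  # ['shared_go_annotation', 'coexpressed']
print(round(model.predict_proba(project([1, 0, 0, 1]))[1], 3))  # 0.714
```

Because the selection stage and the classification stage are separate functions, swapping in a different filter or a different classifier changes only one stage, which is the kind of structure a comparative analysis of method combinations needs.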
Keywords
Data Mining; Classification Technique; Feature Selection; Protein-Protein Interaction