
Prediction of Implicit Protein-Protein Interaction Using Optimal Associative Feature Rule

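Eom, Jae-Hong (School of Electrical Engineering and Computer Science, Seoul National University)
Zhang, Byoung-Tak (School of Electrical Engineering and Computer Science, Seoul National University)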
Abstract
Proteins are known to perform biological functions by interacting with other proteins or compounds. Because protein interaction is intrinsic to most cellular processes, predicting protein interactions is an important problem in post-genomic biology, where many research groups have produced abundant interaction data. In this paper, we present an associative feature mining method for predicting implicit protein-protein interactions of Saccharomyces cerevisiae from public protein interaction data. We discretized continuous-valued features with a maximal interdependence-based discretization approach. We also employed the feature dimension reduction filter (FDRF) method, which is based on information theory, to select optimal informative features, to improve prediction accuracy and overall mining speed, and to overcome the dimensionality problem of conventional data mining approaches. We used an association rule discovery algorithm to mine associative features and rules for predicting protein interactions. Using the discovered associative features, we predicted implicit protein interactions that were not observed in the training data. In our experiments, the proposed method achieved about 96.5% prediction accuracy and ran about 29.4% faster than the conventional method, which applies no feature filter in association rule mining.
Keywords
Protein-protein interaction; Feature association mining; Association rule; Data mining; Bioinformatics;
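
The two main steps named in the abstract, information-theoretic feature filtering followed by association rule discovery, can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not define FDRF, so the sketch substitutes symmetrical uncertainty, a common information-theoretic relevance score, and it mines only two-item rules by direct counting instead of running a full association rule algorithm. The function names, feature names, thresholds, and toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); assumed stand-in for FDRF's score."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def select_features(rows, labels, threshold):
    """Return indices of features whose SU with the label exceeds the threshold."""
    n_features = len(rows[0])
    return [
        j for j in range(n_features)
        if symmetrical_uncertainty([r[j] for r in rows], labels) > threshold
    ]


def mine_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for two-item rules lhs -> rhs."""
    n = len(transactions)
    item_count = Counter(i for t in transactions for i in set(t))
    pair_count = Counter(
        p for t in transactions for p in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue
        # Test both directions of the frequent pair.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = c / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical toy data: already-discretized features per protein pair,
# with label 1 = interacting and 0 = not interacting.
rows = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (0, 1, 1)]
labels = [1, 1, 0, 0]
selected = select_features(rows, labels, threshold=0.5)

# Hypothetical feature "items" attached to known interacting pairs.
transactions = [
    {"kinase", "nucleus"},
    {"kinase", "nucleus", "atp_binding"},
    {"kinase", "cytoplasm"},
    {"nucleus"},
]
rules = mine_rules(transactions, min_support=0.5, min_confidence=0.6)

print("selected feature indices:", selected)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

Filtering before mining shrinks the item vocabulary, which is the intuition behind the speed-up the abstract reports: fewer candidate itemsets have to be counted.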