Prediction of Protein-Protein Interactions from Sequences using a Correlation Matrix of the Physicochemical Properties of Amino Acids

  • Received : 2021.03.05
  • Published : 2021.03.30

Abstract

Detecting protein-protein interactions (PPIs) remains essential for developing therapies against disease. Experimental methods for detecting PPIs are time-consuming and expensive. With the growing availability of PPI data, several computational models for predicting PPIs have been proposed. One major challenge in this task is feature extraction: the information captured by some extraction techniques is of limited relevance. In this work, we first propose a feature extraction method based on correlations between the physicochemical properties of amino acids. The method builds a correlation matrix from the hydrophobicity and hydrophilicity properties and integrates it into the bigram calculation. We then use a support vector machine (SVM) to predict whether two given proteins interact. Experimental results show that the proposed method outperforms approaches from the literature, achieving 94.75% accuracy, 95.12% precision, and 96% sensitivity on human protein data from HPRD.
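The abstract does not give the exact formulas, so the following is only a minimal sketch of one way a property-based correlation matrix could be folded into bigram features. Everything in it is an assumption made for illustration: the function names, the choice of correlation (the average product of standardized property values for two residues), the weighting of bigram frequencies by that correlation, and the three-residue toy scales, whose numbers are made up and are not a published hydrophobicity or hydrophilicity scale.

```python
from itertools import product

# Toy property scales over a 3-letter alphabet (made-up values, illustration only).
TOY_HYDROPHOBICITY = {"A": 0.6, "K": -1.5, "L": 1.1}
TOY_HYDROPHILICITY = {"A": -0.5, "K": 3.0, "L": -1.8}


def standardize(scale):
    """Rescale one property scale to zero mean and unit variance."""
    values = list(scale.values())
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {aa: (v - mean) / std for aa, v in scale.items()}


def correlation_matrix(scales):
    """Assumed correlation: mean over properties of z_k(a) * z_k(b)."""
    z = [standardize(s) for s in scales]
    alphabet = sorted(z[0])
    return {
        (a, b): sum(zk[a] * zk[b] for zk in z) / len(z)
        for a, b in product(alphabet, repeat=2)
    }


def weighted_bigram(sequence, corr):
    """Return one feature per ordered residue pair:
    the bigram frequency multiplied by the pair's correlation value."""
    keys = sorted(corr)
    counts = dict.fromkeys(keys, 0)
    for a, b in zip(sequence, sequence[1:]):
        if (a, b) in counts:  # ignore residues outside the alphabet
            counts[(a, b)] += 1
    n = max(len(sequence) - 1, 1)  # number of adjacent pairs
    return [corr[k] * counts[k] / n for k in keys]


def pair_features(seq_a, seq_b, corr):
    """Concatenate the two proteins' vectors into one sample for a classifier."""
    return weighted_bigram(seq_a, corr) + weighted_bigram(seq_b, corr)


corr = correlation_matrix([TOY_HYDROPHOBICITY, TOY_HYDROPHILICITY])
features = pair_features("ALKA", "KKLA", corr)
```

With the full 20-residue alphabet, the same construction would give 400 features per protein and 800 per protein pair. Those vectors, labeled as interacting or non-interacting, could then be passed to an SVM implementation for training; that step is omitted here.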
