http://dx.doi.org/10.22937/IJCSNS.2021.21.3.6

Prediction of Protein-Protein Interactions from Sequences using a Correlation Matrix of the Physicochemical Properties of Amino Acids  

Kopoin, Charlemagne N'Diffon (Institut National Polytechnique Felix Houphouet Boigny)
Atiampo, Armand Kodjo (Universite Virtuelle de Cote d'Ivoire)
N'Guessan, Behou Gerard (Universite Virtuelle de Cote d'Ivoire)
Babri, Michel (Institut National Polytechnique Felix Houphouet Boigny)
Publication Information
International Journal of Computer Science & Network Security / v.21, no.3, 2021, pp. 41-47
Abstract
Detecting protein-protein interactions (PPIs) remains essential for developing therapies against diseases. Experimental methods for detecting PPIs are time-consuming and expensive. With PPI data now widely available, several computational models for predicting PPIs have been proposed. One of the major challenges in this task is feature extraction: the information captured by some extraction techniques is of limited relevance. In this work, we first propose an extraction method based on correlations between the physicochemical properties of amino acids. The method builds a correlation matrix from the hydrophobicity and hydrophilicity properties and integrates it into the calculation of the bigram. We then use the SVM algorithm to detect whether two given proteins interact. Experimental results show that the proposed method outperforms the approaches in the literature. On human HPRD protein data, it achieves 94.75% accuracy, 95.12% precision, and 96% sensitivity.
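The abstract does not give the formulas, so the following is only a minimal sketch of how a correlation matrix of two physicochemical properties might be folded into a bigram descriptor. Everything in it is an assumption made for illustration: the four-letter toy alphabet, the placeholder property values (not a published scale), the product-of-standardized-values definition of correlation, and the helper names `standardize`, `correlation_matrix`, `weighted_bigram`, and `pair_features`. It is not the authors' implementation.

```python
from itertools import product
from statistics import mean, pstdev

# Placeholder property values for a toy 4-residue alphabet.
# These are illustrative numbers, NOT a published hydrophobicity/hydrophilicity scale.
HYDROPHOBICITY = {"A": 0.6, "L": 1.1, "K": -1.5, "D": -0.9}
HYDROPHILICITY = {"A": -0.5, "L": -1.8, "K": 3.0, "D": 3.0}
ALPHABET = sorted(HYDROPHOBICITY)


def standardize(scale):
    """Rescale a property table to zero mean and unit (population) variance."""
    mu = mean(scale.values())
    sigma = pstdev(scale.values())
    return {aa: (value - mu) / sigma for aa, value in scale.items()}


def correlation_matrix(scales):
    """Assumed definition: entry (a, b) is the product of the standardized
    property values of a and b, averaged over the given properties."""
    z = [standardize(s) for s in scales]
    return {
        (a, b): mean(s[a] * s[b] for s in z)
        for a, b in product(ALPHABET, repeat=2)
    }


def weighted_bigram(sequence, corr):
    """Sum the correlation weight of every adjacent residue pair, then
    normalize by the number of adjacent pairs in the sequence."""
    features = dict.fromkeys(corr, 0.0)
    for a, b in zip(sequence, sequence[1:]):
        if (a, b) in corr:  # skip residues outside the toy alphabet
            features[(a, b)] += corr[(a, b)]
    n_pairs = max(len(sequence) - 1, 1)
    return [features[key] / n_pairs for key in sorted(features)]


def pair_features(seq_a, seq_b, corr):
    """Describe a protein pair by concatenating both per-protein descriptors."""
    return weighted_bigram(seq_a, corr) + weighted_bigram(seq_b, corr)


corr = correlation_matrix([HYDROPHOBICITY, HYDROPHILICITY])
x = pair_features("ALKD", "DKLA", corr)
print(len(x))  # 32: two proteins x 16 bigram entries
```

With the full 20-residue alphabet each protein would yield a 400-dimensional descriptor, and the concatenated pair vector could then be passed to an SVM classifier (for example scikit-learn's `SVC`) trained on interacting and non-interacting pairs.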
Keywords
Feature extraction; bigram; NLP; protein-protein interaction