http://dx.doi.org/10.3837/tiis.2022.03.002

A Protein-Protein Interaction Extraction Approach Based on Large Pre-trained Language Model and Adversarial Training  

Tang, Zhan (College of Information and Electrical Engineering, China Agricultural University)
Guo, Xuchao (College of Information and Electrical Engineering, China Agricultural University)
Bai, Zhao (College of Information and Electrical Engineering, China Agricultural University)
Diao, Lei (College of Information and Electrical Engineering, China Agricultural University)
Lu, Shuhan (School of Information, University of Michigan)
Li, Lin (College of Information and Electrical Engineering, China Agricultural University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.16, no.3, 2022, pp. 771-791
Abstract
Protein-protein interaction (PPI) extraction from raw text is important for revealing the molecular mechanisms of biological processes. With the rapid growth of biomedical literature, manually extracting PPIs has become increasingly time-consuming and laborious. Automatic PPI extraction from the literature using natural language processing has therefore attracted wide research attention. We propose a PPI extraction model based on a large pre-trained language model and adversarial training. The model enhances the learning of semantic and syntactic features by using BioBERT pre-trained weights, which are built on large-scale domain corpora, and applies adversarial perturbations to the embedding layer to improve robustness. Experimental results show that, compared with previous methods, the proposed model achieved the highest F1 scores on the two corpora with large sample sizes, AIMed (83.93%) and BioInfer (90.31%). It also achieved comparable performance on three corpora with small sample sizes: HPRD50, IEPA, and LLL.
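The idea of perturbing the embedding layer can be illustrated with a minimal sketch. The abstract does not say which perturbation rule the model uses, so the L2-normalized gradient step below (often called the fast gradient method) is an assumption, and the mean-pooled linear classifier is a toy stand-in for BioBERT. The names `loss_and_grad`, `fgm_perturbation`, and `epsilon` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(embeddings, weights, label):
    """Binary cross-entropy of a mean-pooled linear classifier (toy stand-in
    for the real encoder) and its gradient w.r.t. the token embeddings."""
    n_tokens = embeddings.shape[0]
    pooled = embeddings.mean(axis=0)            # shape: (dim,)
    prob = sigmoid(pooled @ weights)            # scalar probability
    loss = -(label * np.log(prob) + (1 - label) * np.log(1 - prob))
    # dL/dlogit = prob - label; each token contributes 1/n_tokens to the pool.
    grad = np.tile((prob - label) * weights / n_tokens, (n_tokens, 1))
    return loss, grad

def fgm_perturbation(grad, epsilon):
    """Assumed rule: step of L2 norm epsilon along the loss gradient."""
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros_like(grad)
    return epsilon * grad / norm

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 8))   # 6 tokens, 8-dim embeddings (toy sizes)
weights = rng.normal(size=8)           # toy classifier weights
label = 1.0                            # 1.0 = "interaction", hypothetical label

clean_loss, grad = loss_and_grad(embeddings, weights, label)
delta = fgm_perturbation(grad, epsilon=0.5)
adv_loss, _ = loss_and_grad(embeddings + delta, weights, label)

print(f"clean loss: {clean_loss:.4f}, adversarial loss: {adv_loss:.4f}")
```

In a training loop of this kind, the adversarial loss is typically added to the clean loss before the parameter update, so the model learns to classify correctly under small shifts in embedding space. Whether the proposed model combines the two losses in this way is not stated in the abstract.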
Keywords
adversarial training; information extraction; natural language processing; pre-trained language model; protein-protein interaction