http://dx.doi.org/10.9708/jksci.2022.27.08.049

Small CNN-RNN Engraft Model Study for Sequence Pattern Extraction in Protein Function Prediction Problems  

Lee, Jeung Min (Bio Big Data Convergence Major, Dept. of Computer and Electronics Convergence Engineering, Sunmoon University)
Lee, Hyun (Division of Computer Science and Engineering, Sunmoon University)
Abstract
In this paper, we design PSCREM, a new enzyme function prediction model. It builds on a study that compared and evaluated CNN and LSTM/GRU models under identical conditions; as of 2020, these were the deep learning models most widely used to predict function and structure from protein sequences. Sequence evolution information is used to preserve detailed patterns that CNN convolution would otherwise miss, and relationship information between functionally significant amino acids is extracted through overlapped RNNs. This information was referenced when producing the feature maps. The RNNs used in small CNN-RNN models are LSTMs and GRUs, which are usually stacked two to three layers deep with more than 100 units; in this paper, small RNNs of 10 and 20 units are overlapped instead. The model takes as input the PSSM profile, which is derived from protein sequence data. In experiments, the model reached a performance of 86.4% in predicting the main classes of the enzyme number (EC number) and an accuracy of 84.4% down to the sub-sub classes. Thus, PSCREM better identifies unique patterns related to protein function through overlapped RNNs, and we propose the overlapped RNN as a novel methodology for feature extraction in protein function and structure prediction.
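The two reported figures refer to different depths of the four-level EC number hierarchy (main class, subclass, sub-subclass, serial number). The abstract does not state how the scores were computed, so the following is only a minimal sketch of one plausible reading: a prediction counts as correct at a given level if its first fields match the true EC number. The function names `truncate_ec` and `accuracy_at_level` and the toy labels are illustrative assumptions, not part of PSCREM.

```python
from typing import Sequence


def truncate_ec(ec: str, level: int) -> str:
    """Keep the first `level` fields of a four-field EC number, e.g. '3.4.21.4'."""
    fields = ec.split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    if not 1 <= level <= 4:
        raise ValueError("level must be between 1 and 4")
    return ".".join(fields[:level])


def accuracy_at_level(y_true: Sequence[str], y_pred: Sequence[str], level: int) -> float:
    """Fraction of predictions that match the true EC number down to `level`."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(
        truncate_ec(t, level) == truncate_ec(p, level)
        for t, p in zip(y_true, y_pred)
    )
    return hits / len(y_true)


# Toy labels, invented purely for illustration (not data from the paper).
true_ec = ["1.1.1.1", "3.4.21.4", "2.7.11.1", "4.2.1.1"]
pred_ec = ["1.1.1.2", "3.4.24.4", "2.7.11.1", "6.3.2.1"]

main_class_acc = accuracy_at_level(true_ec, pred_ec, level=1)  # main class only
sub_sub_acc = accuracy_at_level(true_ec, pred_ec, level=3)     # down to sub-sub class
print(main_class_acc, sub_sub_acc)
```

Under this reading, accuracy can only stay equal or fall as the level deepens, since a match at level 3 implies a match at level 1.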
Keywords
PSSM; Deep learning; Protein Function Prediction; Feature Engraft Model; Overlapped RNN;