http://dx.doi.org/10.3745/KTSDE.2022.11.6.245

Comparison of Deep Learning Models Using Protein Sequence Data  

Lee, Jeung Min (Department of Computer Convergence Electronics Engineering, Bio Big Data Convergence, Sun Moon University)
Lee, Hyun (Division of Computer Science and Engineering, Sun Moon University)
Publication Information
KIPS Transactions on Software and Data Engineering, Vol.11, No.6, 2022, pp.245-254
Abstract
Proteins are the basic units of all life activities, and understanding them is essential for studying life phenomena. Since the emergence of machine learning methods based on artificial neural networks, many researchers have tried to predict protein function from protein sequences alone. Many combinations of deep learning models have been reported in the literature, but their methods differ, there is no standard methodology, and each model is tailored to different data, so there has been no direct comparative analysis of which algorithms are better suited to protein data. In this paper, the same data were applied to CNN, LSTM, and GRU models, the representative algorithms most frequently used in research on predicting protein function, and the single-model performance of each algorithm was compared and evaluated in terms of accuracy and speed; the final results are reported as micro-averaged Precision, Recall, and F1-score. The combined CNN-LSTM and CNN-GRU models were evaluated in the same way. This study confirmed that, as a single model, the LSTM performs well on simple classification problems, an overlapping (stacked) CNN is more suitable as a single model for complex classification problems, and the CNN-LSTM is relatively better among the combined models.
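The abstract does not give the exact architectures or hyperparameters, so the following is only a minimal sketch of the kind of comparison described: single CNN, LSTM, and GRU models plus CNN-LSTM and CNN-GRU combinations trained on the same integer-encoded protein sequences and scored with micro-averaged Precision, Recall, and F1-score. It assumes Keras and scikit-learn; the sequence length, vocabulary size, layer widths, and kernel size are illustrative placeholders, not values from the study.

```python
# Hedged sketch: five comparable models on identical protein-sequence input,
# evaluated with micro-averaged Precision / Recall / F1 (assumed setup, not the authors' code).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.metrics import precision_recall_fscore_support

MAX_LEN, VOCAB, N_CLASSES = 500, 26, 7   # assumed padded length, amino-acid alphabet size, class count

def build_model(kind):
    inp = layers.Input(shape=(MAX_LEN,))
    x = layers.Embedding(VOCAB, 64)(inp)             # learned residue embeddings
    if kind == "cnn":
        x = layers.Conv1D(128, 7, activation="relu")(x)
        x = layers.GlobalMaxPooling1D()(x)
    elif kind == "lstm":
        x = layers.LSTM(128)(x)
    elif kind == "gru":
        x = layers.GRU(128)(x)
    elif kind == "cnn-lstm":                          # convolution first, then recurrence
        x = layers.Conv1D(128, 7, activation="relu")(x)
        x = layers.MaxPooling1D(2)(x)
        x = layers.LSTM(128)(x)
    elif kind == "cnn-gru":
        x = layers.Conv1D(128, 7, activation="relu")(x)
        x = layers.MaxPooling1D(2)(x)
        x = layers.GRU(128)(x)
    else:
        raise ValueError(f"unknown model kind: {kind}")
    out = layers.Dense(N_CLASSES, activation="softmax")(x)
    model = keras.Model(inp, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def micro_scores(model, x_test, y_test):
    """Micro-averaged precision, recall, and F1 over all classes."""
    y_pred = model.predict(x_test, verbose=0).argmax(axis=1)
    return precision_recall_fscore_support(y_test, y_pred, average="micro")[:3]

if __name__ == "__main__":
    # Toy random data stands in for the integer-encoded protein sequences.
    x = np.random.randint(1, VOCAB, size=(256, MAX_LEN))
    y = np.random.randint(0, N_CLASSES, size=(256,))
    for kind in ("cnn", "lstm", "gru", "cnn-lstm", "cnn-gru"):
        m = build_model(kind)
        m.fit(x, y, epochs=1, batch_size=32, verbose=0)
        print(kind, micro_scores(m, x, y))
```

Micro averaging aggregates true and false positives across all classes before computing the scores, which corresponds to the Micro Precision, Recall, and F1-score reported in the abstract.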
Keywords
CNN; LSTM; GRU; Combined Model; Protein Sequence;