Architectures of Convolutional Neural Networks for the Prediction of Protein Secondary Structures

  • Chi, Sang-Mun (Department of Computer Science, Kyungsung University)
  • Received : 2018.01.28
  • Accepted : 2018.04.16
  • Published : 2018.05.31

Abstract

Deep learning has been actively studied for predicting protein secondary structure from the amino acid sequence of a protein alone. In this paper, we compared the performance of convolutional neural networks with various architectures for predicting protein secondary structure. To determine the network depth best suited to this task, we investigated how performance changes with the number of layers. We also applied the architectures of GoogLeNet and ResNet, on which many image classification methods are built: GoogLeNet extracts diverse features from the input data, and ResNet eases gradient propagation during training even when the network is very deep. We modified these convolutional neural network architectures to suit the characteristics of protein data, which improved performance.
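The architectural ideas above can be illustrated with a small, dependency-free Python sketch. It is not the author's implementation. It assumes that each residue is encoded as a C-dimensional feature vector (for example, a sequence profile), that convolutions run along the sequence axis instead of over a 2D image, and that the output is one of three states (helix, strand, coil) per residue. The function names, window sizes (1, 3, 5), filter counts, and random weights are all hypothetical, and batch normalization and training are omitted.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # shape: [sequence_length][channels]


def init_kernel(k: int, c_in: int, c_out: int, rng: random.Random) -> List[Matrix]:
    """Hypothetical random kernel of shape [k][c_in][c_out] (illustration only)."""
    scale = 1.0 / math.sqrt(k * c_in)
    return [[[rng.uniform(-scale, scale) for _ in range(c_out)]
             for _ in range(c_in)] for _ in range(k)]


def conv1d(x: Matrix, kernel: List[Matrix]) -> Matrix:
    """'Same'-padded 1D convolution along the sequence axis (odd kernel width)."""
    k, c_in, c_out = len(kernel), len(kernel[0]), len(kernel[0][0])
    half = k // 2
    out = []
    for i in range(len(x)):
        acc = [0.0] * c_out
        for t in range(k):
            j = i + t - half
            if 0 <= j < len(x):  # positions outside the sequence act as zeros
                for c in range(c_in):
                    v = x[j][c]
                    for o in range(c_out):
                        acc[o] += v * kernel[t][c][o]
        out.append(acc)
    return out


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def inception_1d(x: Matrix, kernels: List[List[Matrix]]) -> Matrix:
    """GoogLeNet-style module: parallel convolutions with different window
    widths, concatenated along the channel axis for every residue."""
    branches = [relu(conv1d(x, kern)) for kern in kernels]
    return [sum((b[i] for b in branches), []) for i in range(len(x))]


def residual_block_1d(x: Matrix, k1: List[Matrix], k2: List[Matrix]) -> Matrix:
    """ResNet-style block: relu(x + F(x)) with F = conv -> relu -> conv.
    The identity shortcut requires F to keep the channel count of x."""
    f = conv1d(relu(conv1d(x, k1)), k2)
    return relu([[a + b for a, b in zip(xr, fr)] for xr, fr in zip(x, f)])


def softmax_rows(x: Matrix) -> Matrix:
    """Per-residue softmax over the output classes."""
    out = []
    for row in x:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    L, C = 12, 20  # hypothetical: 12 residues, 20 features per residue
    x = [[rng.random() for _ in range(C)] for _ in range(L)]

    # Inception module: widths 1, 3, 5 with 8 filters each -> 24 channels
    h = inception_1d(x, [init_kernel(k, C, 8, rng) for k in (1, 3, 5)])

    # Stacked residual blocks; the number of blocks sets the network depth
    for _ in range(2):
        h = residual_block_1d(h, init_kernel(3, 24, 24, rng),
                              init_kernel(3, 24, 24, rng))

    # Width-1 convolution to 3 classes (assumed helix/strand/coil) per residue
    probs = softmax_rows(conv1d(h, init_kernel(1, 24, 3, rng)))
    print(len(probs), len(probs[0]))  # prints: 12 3
```

Changing the loop count over `residual_block_1d` is one way to vary depth in the spirit of the layer-count study; the identity shortcut gives gradients a direct path around each block's convolutions during training.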

단백질을 구성하는 아미노산의 서열 정보만으로 단백질 이차 구조를 예측하기 위하여 심층 학습이 활발히 연구되고 있다. 본 논문에서는 단백질 이차 구조를 예측하기 위하여 다양한 구조의 합성곱 신경망의 성능을 비교하였다. 단백질 이차 구조의 예측에 적합한 신경망의 층의 깊이를 알아내기 위하여 층의 개수에 따른 성능을 조사하였다. 또한 이미지 분류 분야의 많은 방법들이 기반 하는 GoogLeNet과 ResNet의 구조를 적용하였는데, 이러한 방법은 입력 자료에서 다양한 특성을 추출하거나, 깊은 층을 사용하여도 학습과정에서 그래디언트 전달을 원활하게 한다. 합성곱 신경망의 여러 구조를 단백질 자료의 특성에 적합하게 변경하여 성능을 향상시켰다.
