Pairwise Neural Networks for Predicting Compound-Protein Interaction

Pair-Based Neural Networks for Drug-Target Protein Interaction Prediction Models

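  • 이문환 (Biomedical Knowledge Engineering Laboratory, Seoul National University) ;
  • 김응희 (Biomedical Knowledge Engineering Laboratory, Seoul National University) ;
  • 김홍기 (Biomedical Knowledge Engineering Laboratory, Seoul National University)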
  • Received : 2017.09.22
  • Accepted : 2017.10.24
  • Published : 2017.12.30

Abstract

Predicting compound-protein interactions in silico is important for drug discovery. In this paper, we propose a scalable machine learning model for predicting compound-protein interactions. The key ideas are a pairwise neural network architecture and a method for embedding features from raw data, especially for proteins. The method extracts features automatically, without additional knowledge about the compound or the protein. The pairwise architecture also makes the features more expressive and more compact, which prevents the biased learning that can arise from differences in the dimension and type of the input features. Five-fold cross-validation on a large-scale database shows that the pairwise neural network predicts compound-protein interactions more accurately than previous prediction models.

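In silico prediction of drug-target protein interactions is very important in the drug discovery stage. However, existing prediction models take fixed inputs and restrict target-protein features to preprocessed data, so they lack scalability and flexibility. In this paper, we introduce a scalable machine learning model for predicting drug-target protein interactions. The core idea is a pairwise neural network: features are extracted from raw compound and protein data, and each set of features is fed into its own neural network layer. This method extracts compound and protein features automatically, without additional knowledge. In addition, the pairwise layers transform the features into rich low-dimensional vectors, which prevents biased learning caused by differences between the inputs. We evaluated the proposed model with 5-fold cross-validation on the PubChem BioAssay (PCBA) dataset, and it outperformed previous models.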
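The pairwise idea described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the class name `PairwiseNet`, the input sizes (a 64-bit fingerprint and a 20-dimensional protein embedding), the hidden width, the concatenation-based merge, and the random untrained weights are all assumptions chosen for illustration. The point it shows is that each input type first passes through its own layer, so both reach the joint layer as vectors of the same size regardless of their original dimension.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer: activation(W x + b)."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z))
    return outputs


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (illustrative initialisation)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class PairwiseNet:
    """Hypothetical two-branch network: one branch per input type."""

    def __init__(self, compound_dim, protein_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        # Separate first layers, one for compounds and one for proteins.
        self.compound_layer = init_layer(rng, compound_dim, hidden_dim)
        self.protein_layer = init_layer(rng, protein_dim, hidden_dim)
        # Layers that operate on the merged representation.
        self.joint_layer = init_layer(rng, 2 * hidden_dim, hidden_dim)
        self.output_layer = init_layer(rng, hidden_dim, 1)

    def encode(self, compound, protein):
        """Map each input to a hidden vector of the same size."""
        h_c = dense(compound, *self.compound_layer, relu)
        h_p = dense(protein, *self.protein_layer, relu)
        return h_c, h_p

    def predict(self, compound, protein):
        """Return an interaction score in (0, 1)."""
        h_c, h_p = self.encode(compound, protein)
        joint = dense(h_c + h_p, *self.joint_layer, relu)  # concatenate
        return dense(joint, *self.output_layer, sigmoid)[0]


# Stand-in inputs (assumed shapes, random values, not real data).
rng = random.Random(1)
compound = [float(rng.randint(0, 1)) for _ in range(64)]  # binary fingerprint
protein = [rng.gauss(0.0, 1.0) for _ in range(20)]        # sequence embedding

model = PairwiseNet(compound_dim=64, protein_dim=20)
score = model.predict(compound, protein)
```

Because the weights are untrained, `score` carries no predictive meaning here. In practice the same structure would be built in a deep learning framework and trained on labelled compound-protein pairs.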
