Toxicity prediction of chemicals using OECD test guideline data with graph-based deep learning models

  • Daehwan Hwang (Department of Applied Statistics, Chung-Ang University) ;
  • Changwon Lim (Department of Applied Statistics, Chung-Ang University)
  • Received : 2024.01.17
  • Reviewed : 2024.01.27
  • Published : 2024.06.30

Abstract

In this paper, we compare the performance of graph-based deep learning models using OECD test guideline (TG) data. The OECD TGs are methods for testing the potential effects of chemicals on human health and the environment, and many of them rely on animal testing to determine toxicity. Animal testing is time-consuming and expensive and raises ethical issues, so methods to replace or minimize it are being studied. Deep learning is used in various fields that work with chemicals, including toxicity prediction, and research on graph-based models is particularly active. Our goal is to compare the performance of graph-based deep learning models on OECD TG data and identify the best-performing model. We collected the results of tests conducted under the OECD TGs from eChemportal.org, a website operated by the OECD, and used preprocessing to remove chemicals that could not be used for training or were inappropriate for it. Using the collected OECD TG data and MoleculeNet, a benchmark dataset for chemical property prediction, we compared the toxicity prediction performance of five graph-based models.

Funding Information

This research was supported by the Chung-Ang University Research Scholarship Grants in 2021.
