Similarity Model Analysis and Implementation for Enzyme Reaction Prediction

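  • 오주성 (School of Biological Sciences and Technology, Chonnam National University)
  • 나도균 (School of Integrative Engineering, Chung-Ang University)
  • 박춘구 (School of Biological Sciences and Technology, Chonnam National University)
  • 정희택 (Multimedia Major, Chonnam National University)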
  • Received : 2018.03.14
  • Accepted : 2018.06.15
  • Published : 2018.06.30

Abstract

With the advent of the big data era, information extraction and prediction have become an important research area. Here, we present the acquisition of a semi-automatically curated large-scale biological database and the prediction of enzyme reaction annotations for analyzing the pharmacological activities of drugs, because the xenobiotic metabolism of pharmaceutical drugs by cellular enzymes is an important aspect of pharmacology and medicine. In this study, we apply and analyze similarity models to predict bimolecular reactions between human enzymes and their corresponding substrates. Thirteen models were selected to reflect the characteristics of each cluster of similarity models. These models were compared on the basis of sensitivity and the area under the ROC curve (AUC). Among the evaluated models, the Simpson coefficient model showed the best performance in predicting the reactivity between the enzymes. All of the similarity models were implemented as a web service. The proposed model can respond dynamically to the addition of reaction information, which should help shorten new drug development time and reduce its cost.

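As interest in big data grows, extracting and predicting meaningful information from data has become an important research area. In this study, we acquire the data needed to analyze the pharmacological activity of candidate drugs during new drug development, and use it as the basis for predictive analysis. Studying the pharmacological activity of metabolized drug candidates is a necessary step for raising the success rate of new drug development. To predict whether a medicinal candidate substance undergoes an enzyme reaction in the body, we applied and analyzed similarity models. Reflecting the characteristics of each cluster of similarity models, we selected 13 models and used them for enzyme reaction prediction. These models were compared and evaluated on the basis of sensitivity and AUC. Among the evaluated models, the Simpson coefficient model showed the best performance in predicting the reactivity between the enzymes. All of the analyzed similarity models were built into a web service. The proposed model can respond dynamically to the addition of reaction information and is expected to contribute to shorter development times and lower costs for new drugs.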
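The abstract names the Simpson coefficient as the best-performing similarity model and uses sensitivity and AUC as evaluation criteria. The following is a minimal sketch of those three quantities, not the authors' implementation. It assumes that compounds are represented as sets of "on" fingerprint bit positions, and it uses the standard definition of the Simpson (overlap) coefficient, |A ∩ B| / min(|A|, |B|). The function names, the threshold, and the toy data are all hypothetical.

```python
from typing import Sequence, Set


def simpson(a: Set[int], b: Set[int]) -> float:
    """Simpson (overlap) coefficient: |A & B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0  # convention chosen here for empty fingerprints
    return len(a & b) / min(len(a), len(b))


def sensitivity(labels: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """True-positive rate when scores >= threshold are predicted reactive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy fingerprints (hypothetical): a query compound and a known substrate.
    query = {1, 2, 3}
    known_substrate = {2, 3, 4, 5}
    print("Simpson:", simpson(query, known_substrate))

    # Toy evaluation data (hypothetical): 1 = reacts, 0 = does not react.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    print("Sensitivity:", sensitivity(labels, scores, threshold=0.5))
    print("AUC:", auc(labels, scores))
```

Because the Simpson coefficient divides by the size of the smaller fingerprint, it reaches 1.0 whenever one bit set is contained in the other, which makes it more tolerant of size differences between compounds than union-based measures.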
