Convolution Neural Network for Prediction of DNA Length and Number of Species

DNA 길이와 혼합 종 개수 예측을 위한 합성곱 신경망

  • Sunghee Yang (양승희; Department of Chemical Engineering, Jeju National University) ;
  • Yeone Kim (김예원; Department of Chemical Engineering, Jeju National University) ;
  • Hyomin Lee (이효민; Department of Chemical Engineering, Jeju National University)
  • Received : 2024.05.27
  • Accepted : 2024.06.11
  • Published : 2024.08.01

Abstract

Machine learning techniques utilizing neural networks have been employed in various fields such as disease gene discovery and diagnosis, drug development, and prediction of drug-induced liver injury. Disease features can be investigated through the molecular information of DNA. In this study, we developed a neural network to predict two representative pieces of this molecular information: the length of DNA and the number of DNA species in a mixture solution. To overcome the time-consuming nature of gel electrophoresis, the conventional analysis method, we analyzed the dynamic data of a microfluidic concentrating device. The dynamic data were reconstructed into a spatiotemporal map, which reduced the computational cost required for training and prediction. We employed a convolutional neural network to improve the accuracy of the spatiotemporal map analysis. As a result, we successfully performed single DNA length prediction as single-variable regression, simultaneous prediction of multiple DNA lengths as multivariable regression, and prediction of the number of DNA species in a mixture as binary classification. Additionally, we proposed a way to resolve prediction bias through the composition of the training data. This study could enable effective medical diagnosis based on optical measurements, such as liquid biopsy of cell-free DNA and cancer diagnosis.

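Data analysis using neural network techniques from machine learning is applied in various fields such as disease gene discovery and diagnosis, new drug development, and prediction of drug-induced liver injury. Data analysis for discovering disease features can be based on DNA information. In this study, we developed a neural network that predicts, among the molecular information of DNA, the length of DNA and the number of DNA species of distinct length in a solution. To overcome the time-consuming limitation of the conventional gel electrophoresis methodology, we took the dynamic data of a microfluidic concentrating device as the subject of analysis, which resolved the time consumption of the experimental analysis. The dynamic data were reconstructed into a spatiotemporal map to lower the computational load required for training and prediction, and a convolutional neural network was used to raise the accuracy of the spatiotemporal map analysis. As a result, we successfully performed single DNA length prediction as single-variable regression, simultaneous prediction of multiple DNA lengths as multivariable regression, and prediction of the number of DNA species in a mixture as binary classification. In addition, we proposed a remedy, based on how the training data are composed, for the prediction bias that can arise during prediction. Applying this study should allow effective analysis of medical data, such as cancer diagnosis and liquid biopsy-based cell-free DNA analysis using optical measurement data.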
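
The abstract does not specify how the spatiotemporal map is built. As a rough illustration only, the sketch below assumes the dynamic data are a stack of grayscale frames of shape (time, height, width) and collapses each frame to a one-dimensional intensity profile along the channel by averaging over the height. The function name `spatiotemporal_map`, the synthetic drifting plug, and all numerical values are hypothetical and are not taken from the paper.

```python
import numpy as np


def spatiotemporal_map(frames: np.ndarray) -> np.ndarray:
    """Collapse a (time, height, width) image stack into a (time, width) map.

    Each frame is averaged over its height (axis 1), so every row of the
    result is the intensity profile along the channel at one time step.
    """
    if frames.ndim != 3:
        raise ValueError("expected an array of shape (time, height, width)")
    return frames.mean(axis=1)


# Synthetic stand-in for a fluorescence video: a bright plug drifting along x.
# All sizes and speeds here are made-up illustration values.
rng = np.random.default_rng(0)
n_frames, height, width = 50, 16, 128
x = np.arange(width)
frames = np.empty((n_frames, height, width))
for t in range(n_frames):
    center = 20 + 1.5 * t  # hypothetical plug position at time step t
    profile = np.exp(-0.5 * ((x - center) / 4.0) ** 2)
    # The 1D profile is broadcast over the height; noise differs per pixel.
    frames[t] = profile + 0.05 * rng.standard_normal((height, width))

st_map = spatiotemporal_map(frames)
print(st_map.shape)                 # (50, 128)
print(frames.size // st_map.size)   # 16
```

Under these assumptions a whole video becomes a single two-dimensional image, here 16 times smaller than the raw stack, which is the kind of input a convolutional neural network can process directly. This is consistent with the reduction in computational cost that the abstract reports, although the authors' actual reconstruction may differ.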

Acknowledgement

This research was supported by the 2024 Faculty Achievement Support Program of Jeju National University.
