http://dx.doi.org/10.3345/cep.2021.01438

Knowledge-guided artificial intelligence technologies for decoding complex multiomics interactions in cells  

Lee, Dohoon (Bioinformatics Institute, Seoul National University)
Kim, Sun (Interdisciplinary Program in Bioinformatics, Seoul National University)
Publication Information
Clinical and Experimental Pediatrics / v.65, no.5, 2022, pp. 239-249
Abstract
Cells survive and proliferate through complex interactions among diverse molecules across multiomics layers. Conventional experimental approaches for identifying these interactions have built a firm foundation for molecular biology, but their scalability is becoming inadequate relative to the rapid accumulation of multiomics data measured by high-throughput technologies. Therefore, the need for data-driven computational modeling of interactions within cells has been highlighted in recent years. The complexity of multiomics interactions is primarily due to their nonlinearity. That is, accurately modeling them requires capturing intricate conditional dependencies, synergies, or antagonisms between the genes or proteins considered, which also hinder experimental validation. Artificial intelligence (AI) technologies, including deep learning models, are well suited to handling complex nonlinear relationships between features and scale to large amounts of data. Thus, they have great potential for modeling multiomics interactions. Although many AI-driven models exist for computational biology applications, relatively few explicitly incorporate prior knowledge into their model architectures or training procedures. Guiding models with domain knowledge can greatly reduce the amount of data needed to train them and constrain their vast expressive power to the biologically relevant space. It can therefore enhance a model's interpretability, reduce spurious interactions, and help establish its validity and utility. To facilitate further development of knowledge-guided AI technologies for modeling multiomics interactions, we review representative bioinformatics applications of deep learning models for multiomics interactions developed to date, categorized by mode of guidance.
Keywords
Computational biology; Artificial intelligence; Deep learning; Molecular biology