http://dx.doi.org/10.22937/IJCSNS.2021.21.12.73

Feature Selection with Ensemble Learning for Prostate Cancer Prediction from Gene Expression  

Abass, Yusuf Aleshinloye (Department of Computer Science, Nile University of Nigeria)
Adeshina, Steve A. (Department of Computer Science, Nile University of Nigeria)
Publication Information
International Journal of Computer Science & Network Security / v.21, no.12spc, 2021, pp. 526-538
Abstract
Machine learning and deep learning models are increasingly used to address prediction problems in biomedical data analysis. DNA sequence prediction is a critical problem that has attracted a great deal of attention in the biomedical domain, and machine and deep learning-based models have been shown to provide more accurate results than conventional regression-based models. Predicting the gene sequences that lead to cancerous diseases, such as prostate cancer, is crucial, but identifying the most important features in a gene sequence is a challenging task. Extracting the components of a gene sequence that give insight into the types of mutation in the gene is of great importance, as it will lead to effective drug design and promote the new concept of personalised medicine. In this work, we extracted the exons from the prostate gene sequences used in the experiment. We built a Deep Neural Network (DNN) model and a Bi-directional Long Short-Term Memory (Bi-LSTM) model, using k-mer encoding for the DNA sequences and one-hot encoding for the class labels. The models were evaluated using different classification metrics. Our experimental results show that the DNN model achieves a training accuracy of 99 percent and a validation accuracy of 96 percent, while the Bi-LSTM model achieves a training accuracy of 95 percent and a validation accuracy of 91 percent.
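To make the encoding step named in the abstract concrete, the following is a minimal sketch of k-mer encoding for a DNA sequence and one-hot encoding for a class label. The abstract does not state the value of k, the vocabulary construction, the handling of ambiguous bases, or the class names, so `k=3`, the reserved index 0 for unknown k-mers, and the labels `"normal"` and `"cancer"` are all illustrative assumptions rather than the authors' settings.

```python
from itertools import product


def kmers(seq, k=3):
    """Split a DNA string into overlapping k-mers with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k=3, alphabet="ACGT"):
    """Map every possible k-mer to an integer index starting at 1.

    Index 0 is reserved (an assumption of this sketch) for k-mers that
    contain symbols outside the alphabet, such as the ambiguity code N.
    """
    return {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}


def encode_sequence(seq, vocab, k=3):
    """Turn a DNA string into a list of integer k-mer indices."""
    return [vocab.get(km, 0) for km in kmers(seq, k)]


def one_hot(label, classes):
    """Return a one-hot vector for `label` over the ordered list `classes`."""
    return [1 if c == label else 0 for c in classes]


vocab = build_vocab(k=3)                      # 4**3 = 64 possible 3-mers
encoded = encode_sequence("ATGCGTNA", vocab)  # last two k-mers contain N
classes = ["normal", "cancer"]                # hypothetical class names
label_vec = one_hot("cancer", classes)

print(kmers("ATGCGTNA"))
print(encoded)
print(label_vec)
```

Integer-indexed k-mers of this kind are typically fed to an embedding layer or converted to count vectors before a DNN or Bi-LSTM; which of these the paper uses is not stated in the abstract.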
Keywords
Feature Selection; Ensemble Learning; Prostate Cancer Prediction; Gene Expression