Enhancing Gene Expression Classification of Support Vector Machines with Generative Adversarial Networks

  • Received : 2018.11.06
  • Accepted : 2019.02.26
  • Published : 2019.03.31

Abstract

Microarray gene expression data are now widely used to classify cancers, which helps address questions about cancer causes and treatment regimens. However, the sample size of gene expression datasets is often limited because microarray studies in humans are expensive. We propose GAN-SVM, which enhances gene expression classification by combining support vector machines (SVMs) with a generative adversarial network (GAN). The GAN generates new data from the original training datasets and is used in conjunction with nonlinear SVMs, which classify gene expression data efficiently. Numerical test results on 20 low-sample-size, very-high-dimensional microarray gene expression datasets from the Kent Ridge Biomedical and ArrayExpress repositories indicate that the model is more accurate than state-of-the-art classification models.
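
The augmentation idea in the abstract (generate extra training samples, then train a classifier on the enlarged set) can be illustrated with a minimal sketch. Everything named here is hypothetical: `stand_in_generator` is a simple noise-based placeholder for a trained GAN generator, not the paper's network, and the per-class generation, the `noise` value, and the toy data are assumptions made only for illustration.

```python
import random

def stand_in_generator(class_samples, rng, noise=0.05):
    """Placeholder for a trained GAN generator (assumption, not the paper's model).

    Picks one real sample of the class and perturbs each gene value with
    small Gaussian noise, so the rest of the pipeline can be exercised.
    """
    base = rng.choice(class_samples)
    return [v + rng.gauss(0.0, noise) for v in base]

def augment(X, y, n_new_per_class, generate=stand_in_generator, seed=0):
    """Return (X_aug, y_aug): the original samples plus synthetic ones per class."""
    rng = random.Random(seed)
    X_aug, y_aug = [list(x) for x in X], list(y)
    for label in sorted(set(y)):
        # Real training samples that belong to this class.
        class_samples = [x for x, t in zip(X, y) if t == label]
        for _ in range(n_new_per_class):
            X_aug.append(generate(class_samples, rng))
            y_aug.append(label)
    return X_aug, y_aug

# Toy data: 4 samples x 3 "genes", two classes (made-up values).
X = [[0.1, 0.9, 0.3], [0.2, 0.8, 0.4], [0.9, 0.1, 0.7], [0.8, 0.2, 0.6]]
y = [0, 0, 1, 1]
X_aug, y_aug = augment(X, y, n_new_per_class=3)
print(len(X_aug), y_aug.count(0), y_aug.count(1))  # prints: 10 5 5
```

In a fuller pipeline the augmented set would then be passed to a nonlinear SVM, for example scikit-learn's `SVC(kernel="rbf")`. Synthetic samples would normally be generated from the training split only, so that test data does not leak into the generator.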

Fig. 1. SVM for binary classification.

Fig. 2. Architecture of a generative adversarial network.

Fig. 3. Accuracy of the compared models on the 20 datasets.

Table 1. Characteristics of the 20 datasets

Table 2. Hyper-parameters of GAN-SVM

Table 3. Classification results on 20 datasets

Table 4. Accuracy comparison between the models

