http://dx.doi.org/10.6109/jicce.2019.17.1.14

Enhancing Gene Expression Classification of Support Vector Machines with Generative Adversarial Networks  

Huynh, Phuoc-Hai (Information Technology Faculty, An Giang University)
Nguyen, Van Hoa (Information Technology Faculty, An Giang University)
Do, Thanh-Nghi (College of Information Technology, Can Tho University)
Abstract
Microarray gene expression data are now widely used to classify cancers, which helps address questions about cancer causes and treatment regimens. However, the sample size of gene expression datasets is often limited because microarray studies in humans are expensive. We propose enhancing the gene expression classification of support vector machines with generative adversarial networks (GAN-SVMs). A GAN was implemented to generate new data from the original training datasets and was used in conjunction with nonlinear SVMs, which classify gene expression data efficiently. Numerical test results on 20 low-sample-size, very-high-dimensional microarray gene expression datasets from the Kent Ridge Biomedical and ArrayExpress repositories indicate that the model is more accurate than state-of-the-art classification models.
Keywords
Classification; Support vector machines; Generative adversarial networks; Enhancing data; Gene expression data