http://dx.doi.org/10.6109/jicce.2019.17.1.14

Enhancing Gene Expression Classification of Support Vector Machines with Generative Adversarial Networks

Huynh, Phuoc-Hai (Information Technology Faculty, An Giang University)
Nguyen, Van Hoa (Information Technology Faculty, An Giang University)
Do, Thanh-Nghi (College of Information Technology, Can Tho University)

Publication Information

Journal of Information and Communication Convergence Engineering / v.17, no.1, 2019, pp. 14-20

Abstract

Microarray gene expression data currently enable accurate classification of cancers, which helps address problems related to cancer causes and treatment regimens. However, the sample size of gene expression datasets is often small because microarray studies in humans are expensive. We propose enhancing the classification of gene expression data by support vector machines with generative adversarial networks (GAN-SVMs). We implemented a GAN that generates new samples from the original training datasets and used it in conjunction with nonlinear SVMs, which classify gene expression data efficiently. Numerical test results on 20 low-sample-size, very high-dimensional microarray gene expression datasets from the Kent Ridge Biomedical and ArrayExpress repositories indicate that the proposed model is more accurate than state-of-the-art classification models.
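The abstract describes the GAN-SVM pipeline only at a high level: a GAN enlarges the training set, then a nonlinear SVM is trained on the enlarged set. The sketch below illustrates that general idea under assumptions that are not taken from the paper. One GAN is trained per class so that synthetic samples inherit a label; the generator is linear and the discriminator is logistic regression (the paper's actual architecture is not given here); and a kernelized Pegasos sub-gradient solver for an RBF-kernel SVM stands in for whatever SVM implementation the authors used. The functions `train_linear_gan`, `kernel_pegasos`, and `gan_svm`, and all parameter values, are hypothetical, and the toy data are random Gaussians, not microarray data.

```python
import numpy as np


def sigmoid(t):
    # Clip logits so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))


def train_linear_gan(X, noise_dim=4, steps=2000, lr=0.01, batch=16,
                     d_decay=0.1, seed=0):
    """Hypothetical toy GAN: generator G(z) = z @ Wg + bg, discriminator
    D(x) = sigmoid(x @ wd + bd), alternating gradient steps, non-saturating
    generator loss -log D(G(z)). L2 decay on wd damps the oscillation that
    plain adversarial gradient steps tend to produce."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Wg = 0.1 * rng.standard_normal((noise_dim, d))
    bg = np.zeros(d)
    wd = np.zeros(d)
    bd = 0.0
    for _ in range(steps):
        # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
        xr = X[rng.integers(n, size=batch)]
        xf = rng.standard_normal((batch, noise_dim)) @ Wg + bg
        pr, pf = sigmoid(xr @ wd + bd), sigmoid(xf @ wd + bd)
        wd -= lr * ((xr.T @ (pr - 1.0) + xf.T @ pf) / batch + d_decay * wd)
        bd -= lr * (np.mean(pr - 1.0) + np.mean(pf))
        # Generator step: descend on -log D(G(z)) with respect to Wg and bg.
        z = rng.standard_normal((batch, noise_dim))
        pf = sigmoid((z @ Wg + bg) @ wd + bd)
        grad_xf = np.outer(pf - 1.0, wd) / batch  # dLoss/dG(z) per sample
        Wg -= lr * (z.T @ grad_xf)
        bg -= lr * grad_xf.sum(axis=0)

    def sample(m, sample_seed=1):
        z = np.random.default_rng(sample_seed).standard_normal((m, noise_dim))
        return z @ Wg + bg

    return sample


def rbf_kernel(A, B, gamma):
    # K[i, j] = exp(-gamma * ||A_i - B_j||^2)
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_pegasos(X, y, gamma, lam=0.01, iters=3000, seed=0):
    """Kernelized Pegasos: stochastic sub-gradient training of an RBF-kernel
    SVM (hinge loss, no bias term). Labels y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(X, X, gamma)
    alpha = np.zeros(len(y))
    for t in range(1, iters + 1):
        i = rng.integers(len(y))
        # Increment alpha_i whenever sample i violates the margin.
        if y[i] * (alpha * y) @ K[:, i] / (lam * t) < 1.0:
            alpha[i] += 1.0
    coef = alpha * y / (lam * iters)
    return lambda Xq: np.where(rbf_kernel(Xq, X, gamma) @ coef >= 0.0, 1, -1)


def gan_svm(X, y, n_synth=50, gamma=None):
    # Assumption: one GAN per class; synthetic rows get that class's label.
    X_parts, y_parts = [X], [y]
    for label in (-1, 1):
        sample = train_linear_gan(X[y == label])
        X_parts.append(sample(n_synth))
        y_parts.append(np.full(n_synth, label))
    X_aug, y_aug = np.vstack(X_parts), np.concatenate(y_parts)
    gamma = 1.0 / X.shape[1] if gamma is None else gamma
    return kernel_pegasos(X_aug, y_aug, gamma), X_aug, y_aug


# Toy usage on random Gaussian data (not microarray data).
rng = np.random.default_rng(42)
d = 20
X_train = np.vstack([rng.normal(-1.5, 1.0, (15, d)),
                     rng.normal(1.5, 1.0, (15, d))])
y_train = np.array([-1] * 15 + [1] * 15)
predict, X_aug, y_aug = gan_svm(X_train, y_train)
print(X_aug.shape)  # (130, 20)
print(predict(X_train))
```

Generating per class keeps every synthetic sample labeled, which the SVM needs. A linear generator paired with a logistic discriminator can only match coarse statistics such as the class mean, so a realistic version for gene expression data would use multilayer networks for both players.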

Keywords

Classification; Support vector machines; Generative adversarial networks; Enhancing data; Gene expression data
