http://dx.doi.org/10.4218/etrij.2018-0522

An enhanced feature selection filter for classification of microarray cancer data  

Mazumder, Dilwar Hussain (Department of Computer Science and Engineering, National Institute of Technology Nagaland)
Veilumuthu, Ramachandran (Department of Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology)
Publication Information
ETRI Journal / v.41, no.3, 2019, pp. 358-370
Abstract
The main aim of this study is to select the optimal set of genes from microarray cancer datasets that contribute to the prediction of specific cancer types. This study proposes an enhanced feature selection filter based on Joe's normalized mutual information and applies it to gene selection. The proposed algorithm is implemented and evaluated on seven benchmark microarray cancer datasets, namely, central nervous system, leukemia (binary), leukemia (3 class), leukemia (4 class), lymphoma, mixed lineage leukemia, and small round blue cell tumor, using five well-known classifiers: naive Bayes, radial basis function network, instance-based classifier, decision table, and decision tree. An average increase in prediction accuracy of 5.1% is observed across all seven datasets, averaged over all five classifiers, and the average reduction in training time is 2.86 seconds. The performance of the proposed method is also compared with that of three other popular mutual-information-based feature selection filters, namely, information gain, gain ratio, and symmetric uncertainty; the proposed filter compares favorably across all five classifiers on all the datasets.
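
The sketch below illustrates the general type of filter described above: genes are ranked by a normalized mutual information score of the form I(X;Y) / min(H(X), H(Y)), a normalization commonly attributed to Joe (1989). This is a minimal illustration under stated assumptions, not the authors' enhanced filter; the function and parameter names (entropy, joe_normalized_mi, select_top_genes, bins, k) are hypothetical, and the discretization step is a simplifying choice.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a discrete 1-D label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete 1-D arrays."""
    joint = np.array([f"{a}|{b}" for a, b in zip(x, y)])
    return entropy(x) + entropy(y) - entropy(joint)

def joe_normalized_mi(x, y):
    """Normalized MI, I(X;Y) / min(H(X), H(Y)); 0 when either variable is constant.
    This normalization is one form attributed to Joe (1989), used here as an
    assumption; the paper's specific enhancement is not reproduced."""
    denom = min(entropy(x), entropy(y))
    return mutual_information(x, y) / denom if denom > 0 else 0.0

def select_top_genes(expr, labels, k=50, bins=10):
    """Rank genes (columns of expr) by normalized MI with the class labels and
    return the column indices of the k highest-scoring genes. Continuous
    expression values are discretized into equal-width bins beforehand."""
    scores = np.empty(expr.shape[1])
    for j in range(expr.shape[1]):
        edges = np.histogram_bin_edges(expr[:, j], bins=bins)
        scores[j] = joe_normalized_mi(np.digitize(expr[:, j], edges), labels)
    return np.argsort(scores)[::-1][:k]
```

With a gene expression matrix X of shape (samples, genes) and class labels y, select_top_genes(X, y, k=50) would return the indices of the 50 top-ranked genes, which could then be passed to any of the five classifiers listed in the abstract.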
Keywords
cancer classification; feature selection; gene expression; microarray; normalized mutual information;