http://dx.doi.org/10.5351/CKSS.2009.16.1.143

Variable Selection Based on Mutual Information  

Huh, Moon-Y. (Dept. of Statistics, Sungkyunkwan Univ.)
Choi, Byong-Su (Dept. of Multimedia Engineering, Hansung Univ.)
Publication Information
Communications for Statistical Applications and Methods / v.16, no.1, 2009, pp. 143-155
Abstract
A best subset selection procedure based on the mutual information (MI) between a set of explanatory variables and a dependent class variable is suggested. The multivariate MI is derived from normal mixtures, and several types of normal mixtures are proposed. A best subset selection algorithm is also proposed. Four real data sets are employed to demonstrate the efficiency of the proposals.
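
The abstract describes two ingredients: a multivariate MI estimate built from normal-mixture density models, and a search algorithm over variable subsets. As a rough sketch only, not the authors' implementation, the Python fragment below estimates I(X_S; Y) by fitting one Gaussian mixture per class (via scikit-learn's GaussianMixture) and averaging log p(x|y) - log p(x) over the sample, then grows the subset greedily while the estimate improves. The helper names, the mixture order, the forward search, and the stopping rule are all assumptions here; the paper's own mixture types and selection algorithm may differ.

import numpy as np
from sklearn.mixture import GaussianMixture

def mi_normal_mixture(X, y, n_components=2):
    """Estimate I(X; Y) = E[log p(x|y) - log p(x)] by a sample average,
    modeling each class-conditional density p(x|y) as a Gaussian mixture.
    (Illustrative choice; small classes may need n_components=1.)"""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)
    # log p(x | y = c) for every sample and every class
    log_cond = np.column_stack([
        GaussianMixture(n_components, covariance_type="full", random_state=0)
        .fit(X[y == c]).score_samples(X)
        for c in classes
    ])
    # log p(x) = log sum_c pi_c * p(x | y = c)  (marginal mixture)
    log_marg = np.logaddexp.reduce(log_cond + np.log(priors), axis=1)
    # average log-ratio over the observed (x_i, y_i) pairs
    idx = np.searchsorted(classes, y)
    return np.mean(log_cond[np.arange(len(y)), idx] - log_marg)

def forward_select(X, y, max_vars=None):
    """Greedy forward search: repeatedly add the variable that most
    increases the MI estimate; stop when no variable improves it."""
    p = X.shape[1]
    remaining, chosen, best_mi = list(range(p)), [], -np.inf
    while remaining and len(chosen) < (max_vars or p):
        scores = [(mi_normal_mixture(X[:, chosen + [j]], y), j)
                  for j in remaining]
        mi, j = max(scores)
        if mi <= best_mi:  # no improvement: stop growing the subset
            break
        best_mi = mi
        chosen.append(j)
        remaining.remove(j)
    return chosen, best_mi

With n_components=1 each class density reduces to a single multivariate normal; larger values yield normal-mixture density estimates in the spirit of the abstract.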
Keywords
Best subset selection; feature selection; mutual information; normal mixture