http://dx.doi.org/10.3745/JIPS.2006.2.1.023

A Feature Selection Technique based on Distributional Differences  

Kim, Sung-Dong (Dept. of Computer Engineering, Hansung University)
Publication Information
Journal of Information Processing Systems / v.2, no.1, 2006, pp. 23-27
Abstract
This paper presents a feature selection technique based on distributional differences for efficient machine learning. The initial training data consist of instances with many features and a target value. We classify the instances into positive and negative data according to the target value, divide the range of each feature's values into 10 intervals, and compute the distribution over those intervals separately for the positive and the negative data. We then select the features, and the intervals within them, whose distributional differences exceed a given threshold. Restricting the training data to the selected features and intervals yields a reduced training set. Experiments show that the reduced training data shorten neural network training time by about 40%, and that functions trained on them also earn more profit in simulated stock trading.
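The selection step described above compares, for each feature, the interval frequencies of the positive and negative data. The following is a minimal Python/NumPy sketch of that idea; the function name select_features, the equal-width binning, the absolute-difference measure, and the default threshold are illustrative assumptions, not the paper's exact procedure.

    import numpy as np

    def select_features(X, y, n_bins=10, threshold=0.1):
        """Return (feature, interval) pairs whose positive/negative
        interval frequencies differ by more than `threshold`.

        X : (n_samples, n_features) array of feature values
        y : (n_samples,) boolean array, True for positive data
        """
        pos, neg = X[y], X[~y]
        selected = []
        for j in range(X.shape[1]):
            # Divide the feature's value range into n_bins equal-width intervals.
            edges = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
            # Relative frequency of each interval in the positive and negative data.
            p = np.histogram(pos[:, j], bins=edges)[0] / max(len(pos), 1)
            n = np.histogram(neg[:, j], bins=edges)[0] / max(len(neg), 1)
            # Keep the intervals where the two class distributions differ enough.
            for i in np.where(np.abs(p - n) > threshold)[0]:
                selected.append((j, int(i)))
        return selected

The returned (feature, interval) pairs identify the discriminative regions; a reduced training set would keep only the selected features, restricted to instances whose values fall in the selected intervals.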
Keywords
Feature Selection; Distributional Differences
References
1 H. Liu and H. Motoda, 'Feature Selection for Knowledge Discovery and Data Mining', Kluwer Academic Publishers, 1998
2 Daphne Koller and Mehran Sahami, 'Toward Optimal Feature Selection', In Proceedings of the 13th International Conference on Machine Learning, pp. 284-292, 1996
3 Sung-Dong Kim and Jae Won Lee, 'Induction of Stock Trading Rules Using Distributional Differences', In Proceedings of the Korea Data Mining Conference, pp. 206-216, 2001
4 Yiming Yang and Jan O. Pedersen, 'A Comparative Study on Feature Selection in Text Categorization', In Proceedings of the 14th International Conference on Machine Learning, pp. 412-420, 1997
5 G.H. John, R. Kohavi, and K. Pfleger, 'Irrelevant features and the subset selection problem', In Proceedings of the 11th International Conference on Machine Learning, pp. 121-129, 1994
6 M. Dash and H. Liu, 'Feature selection for classification', Intelligent Data Analysis, Vol. 1, No. 3, pp. 131-156, 1997
7 N. Wyse, R. Dubes, and A.K. Jain, 'A critical evaluation of intrinsic dimensionality algorithms', In E.S. Gelsema and L.N. Kanal, editors, Pattern Recognition in Practice, Morgan Kaufmann Publishers, Inc., pp. 415-425, 1980
8 K. Kira and L. Rendell, 'A practical approach to feature selection', In Proceedings of the 9th International Conference on Machine Learning, pp. 249-256, 1992
9 J.G. Dy and C.E. Brodley, 'Feature subset selection and order identification for unsupervised learning', In Proceedings of the 17th International Conference on Machine Learning, pp. 247-254, 2000
10 L. Talavera, 'Feature selection as a preprocessing step for hierarchical clustering', In Proceedings of the 16th International Conference on Machine Learning, pp. 389-397, 1999
11 Sung-Dong Kim, Jae Won Lee, Jongwoo Lee, and Jinseok Chae, 'A Two-Phase Stock Trading System Using Distributional Differences', In Proceedings of the 13th DEXA, LNCS 2453, pp. 143-152, 2002
12 H. Almuallim and T.G. Dietterich, 'Learning with many irrelevant features', In Proceedings of the 9th National Conference on Artificial Intelligence, pp. 547-552, 1991
13 A.L. Blum and P. Langley, 'Selection of relevant features and examples in machine learning', Artificial Intelligence, Vol. 97, pp. 245-271, 1997