http://dx.doi.org/10.9728/dcs.2018.19.4.739

A Hybrid Efficient Feature Selection Model for High Dimensional Data Set based on KNHANES (2013~2015)

Kwon, Tae il (BigSun Systems Co., Ltd.)
Li, Dingkun (Database/Bioinformatics Lab, School of Electrical & Computer Engineering, Chungbuk National University)
Park, Hyun Woo (Database/Bioinformatics Lab, School of Electrical & Computer Engineering, Chungbuk National University)
Ryu, Kwang Sun (Database/Bioinformatics Lab, School of Electrical & Computer Engineering, Chungbuk National University)
Kim, Eui Tak (Database/Bioinformatics Lab, School of Electrical & Computer Engineering, Chungbuk National University)
Piao, Minghao (Agency of Smart Factory, Chungbuk National University)
Publication Information
Journal of Digital Contents Society / v.19, no.4, 2018, pp. 739-747
Abstract
For data with a large feature space, feature selection has become an extremely important step in the data mining process. However, traditional feature selection methods that rely on a single process may no longer be suitable for this step. In this paper, we propose a hybrid efficient feature selection model for high-dimensional data. We applied our model to the KNHANES data set, and the results show that it outperforms many existing methods by at least 5% in accuracy.
Keywords
feature selection; high dimensional data; hybrid feature selection model; data mining; parallel computing
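The abstract does not describe the internals of the proposed model, so the following is only a minimal sketch of what a generic "hybrid" feature selector often looks like: a cheap filter stage that ranks features, followed by a wrapper stage that evaluates candidate subsets with a classifier. It is not the authors' method. It assumes discrete-valued features, uses symmetric uncertainty as the filter criterion and leave-one-out accuracy of a 1-nearest-neighbour classifier (Hamming distance) as the wrapper criterion; all function names and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a normalized score in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))  # joint entropy H(x, y)
    mutual_info = hx + hy - hxy     # I(x; y) = H(x) + H(y) - H(x, y)
    return 2.0 * mutual_info / (hx + hy)


def filter_stage(X, y, threshold=0.0):
    """Filter: rank feature indices by SU with the label, keep those above threshold."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((symmetric_uncertainty(column, y), j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest SU first, ties by index
    return [j for su, j in scores if su > threshold]


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-NN using Hamming distance on `features`."""
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        best_dist, best_label = None, None
        for k, other in enumerate(X):
            if k == i:
                continue  # hold out the sample being classified
            dist = sum(row[j] != other[j] for j in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def wrapper_stage(X, y, candidates):
    """Wrapper: greedy forward selection, keeping a feature only if accuracy improves."""
    selected, best_acc = [], 0.0
    for j in candidates:
        acc = loo_accuracy(X, y, selected + [j])
        if acc > best_acc:
            selected, best_acc = selected + [j], acc
    return selected, best_acc


def hybrid_select(X, y, threshold=0.0):
    """Filter first to shrink the search space, then run the costlier wrapper."""
    return wrapper_stage(X, y, filter_stage(X, y, threshold))
```

The design point this illustrates is the usual motivation for hybrids: the filter discards uninformative features (here, any feature with zero symmetric uncertainty) in a single pass, so the classifier-driven wrapper only searches over a reduced candidate set, which matters when the feature space is large.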