http://dx.doi.org/10.7472/jksii.2012.13.4.1

Semantic-based Genetic Algorithm for Feature Selection  

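Kim, Jung-Ho (Department of Computer Engineering, Graduate School, Korea Aerospace University)
In, Joo-Ho (Department of Computer Engineering, Graduate School, Korea Aerospace University)
Chae, Soo-Hoan (School of Electronics and Information & Communication Engineering, Korea Aerospace University)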
Publication Information
Journal of Internet Computing and Services, v.13, no.4, 2012, pp. 1-10
Abstract
This paper proposes an optimal feature selection method that considers the semantics of features, as a preprocessing step for document classification. Feature selection is a very important part of classification and consists of removing redundant features and selecting essential ones. LSA (Latent Semantic Analysis) is adopted to take the meaning of the features into account. However, because basic LSA is not specialized for feature selection, a supervised LSA, which is better suited to classification problems, is used instead. GA (Genetic Algorithm) is then applied to the features obtained from supervised LSA to select a better feature subset. Finally, documents are projected onto the newly selected feature subset and classified with a specific classifier, SVM (Support Vector Machine). Selecting an optimal feature subset with the proposed hybrid of supervised LSA and GA is expected to yield high classification performance and efficiency. Its efficiency is demonstrated through experiments on internet news classification with a small number of features.
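The GA stage of such a pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the GA operators or parameters, so the tournament selection, one-point crossover, bit-flip mutation, elitism, population size, and feature-count penalty below are all assumptions. To keep the sketch dependency-free, synthetic Gaussian data stands in for the supervised-LSA features and a nearest-centroid classifier stands in for the SVM. All function names (`make_toy_data`, `nearest_centroid_accuracy`, `fitness`, `select_features`) are hypothetical.

```python
import random


def make_toy_data(seed=1, n_per_class=20, n_features=6):
    """Synthetic stand-in for LSA features: only columns 0 and 1 carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_features)]
            row[0] += 3 * label  # informative feature
            row[1] -= 3 * label  # informative feature
            X.append(row)
            y.append(label)
    return X, y


def nearest_centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier (assumed stand-in for SVM)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, label in zip(X, y):
        predicted = min(
            centroids,
            key=lambda lab: sum((x[j] - c) ** 2 for j, c in zip(cols, centroids[lab])),
        )
        correct += predicted == label
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus an assumed small penalty per selected feature."""
    return nearest_centroid_accuracy(X, y, mask) - penalty * sum(mask)


def select_features(X, y, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Evolve a 0/1 mask over feature columns and return the fittest mask found."""
    rng = random.Random(seed)
    n = len(X[0])  # requires at least two features for one-point crossover

    def score(mask):
        return fitness(X, y, mask)

    # Seed the population with the all-features mask so it acts as a baseline.
    population = [[1] * n]
    population += [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        next_gen = [max(population, key=score)[:]]  # elitism: keep the current best
        while len(next_gen) < pop_size:
            parent_a = max(rng.sample(population, 2), key=score)  # tournament of 2
            parent_b = max(rng.sample(population, 2), key=score)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=score)


if __name__ == "__main__":
    X, y = make_toy_data()
    best = select_features(X, y)
    print("selected mask:", best, "fitness:", round(fitness(X, y, best), 3))
```

Because the initial population contains the all-features mask and the best individual is always carried forward, the returned subset never scores worse than using every feature under this fitness function. In a fuller reproduction, `nearest_centroid_accuracy` would be replaced by cross-validated SVM accuracy on the supervised-LSA projection.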
Keywords
Classification; Feature Selection; Latent Semantic Analysis; Genetic Algorithm; Support Vector Machine;