http://dx.doi.org/10.3745/KIPSTB.2009.16-B.3.233

Document Classification of Small Size Documents Using Extended Relief-F Algorithm  

Park, Heum (Department of Computer Engineering, Pusan National University)
Abstract
This paper presents an approach to classifying small documents using the instance-based feature-filtering algorithm Relief-F. Classification performance is often poor for small documents that contain only a few features: the total number of features in the document set is large, but the number of features in each document is relatively very small, so the similarities between documents are very low under general similarity measures and classifiers. Performance is particularly poor when classifying web documents in a directory service and when classifying sectors that cannot be linked to their original file after hard-disk recovery. We therefore propose the Extended Relief-F (ERelief-F) algorithm, which builds on the instance-based feature-filtering algorithm Relief-F and addresses the problems of Relief-F, as a preprocessing step for classification. For the performance comparison, we used information gain, odds ratio, and Relief-F for feature filtering and for computing feature values, and we used kNN and SVM classifiers. In the experiments, ERelief-F performed best on all of the datasets compared with the other methods and removed many irrelevant features from the document sets.
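For readers unfamiliar with the baseline, the following is a minimal sketch of standard Relief-F feature weighting, not of the proposed ERelief-F, whose modifications the abstract does not specify. The function name `relieff`, the toy term-count data, the use of every instance instead of random sampling, and the range-scaled Manhattan distance are all assumptions made for illustration.

```python
from collections import Counter


def relieff(X, y, k=1):
    """Return one Relief-F weight per feature (higher = more relevant)."""
    n, d = len(X), len(X[0])
    # Range of each feature, used to scale differences into [0, 1].
    span = []
    for a in range(d):
        col = [row[a] for row in X]
        span.append((max(col) - min(col)) or 1.0)

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / span[a]

    def dist(r1, r2):
        return sum(diff(a, r1, r2) for a in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):  # assumption: visit every instance, no sampling
        r, c_r = X[i], y[i]
        for c in prior:
            # Up to k nearest neighbours of r within class c (r excluded).
            pool = [X[j] for j in range(n) if j != i and y[j] == c]
            near = sorted(pool, key=lambda row: dist(r, row))[:k]
            if not near:
                continue
            # Hits (same class) lower the weight; misses raise it,
            # weighted by the prior of the miss class.
            scale = -1.0 if c == c_r else prior[c] / (1.0 - prior[c_r])
            for a in range(d):
                avg = sum(diff(a, r, row) for row in near) / len(near)
                w[a] += scale * avg / n
    return w


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[3, 1], [2, 0], [3, 0], [0, 1], [0, 0], [1, 1]]
y = ["a", "a", "a", "b", "b", "b"]
weights = relieff(X, y, k=1)
print(weights)
```

On this toy data the class-separating feature receives a positive weight and the noise feature a negative one, so a filter that keeps only high-weight features would discard the latter before kNN or SVM classification.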
Keywords
Feature Selection; Feature Filtering; Instance-Based Filtering; Classification;