Evaluation of Attribute Selection Methods and Prior Discretization in Supervised Learning

Cha, Woon Ock;Huh, Moon Yul;

doi:10.5351/CKSS.2003.10.3.879

Communications for Statistical Applications and Methods

Volume 10, Issue 3 / Pages 879-894 / 2003 / 2287-7843 (pISSN) / 2383-4757 (eISSN)

The Korean Statistical Society

Cha, Woon Ock (Division of Computer Engineering, Hansung University);
Huh, Moon Yul (Department of Statistics, Sungkyunkwan University)

Published: 2003.12.01

https://doi.org/10.5351/CKSS.2003.10.3.879

Abstract

We evaluated the effectiveness of applying attribute selection methods and prior discretization to supervised learning with two classifiers, C4.5 and Naive Bayes. Three data sets were obtained from the UCI data archive; in each, every attribute is continuous except for a single decision attribute. Four attribute selection methods were used: MDI, ReliefF, Gain Ratio, and a consistency-based method. MDI and ReliefF can be applied to both continuous and discrete attributes, whereas the other two can be applied only to discrete attributes. Discretization was performed using the Fayyad and Irani method. To investigate the effect of noise, noise was introduced into the data sets at rates of 10% or 20%, and both the noisy and the noise-free data were then processed through attribute selection, discretization, and classification. The results indicate that classifying the data using only the selected attributes yields higher accuracy than classifying with the full attribute set, and that prior discretization does not lower accuracy.
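
As an illustration of one of the four selection criteria named above, the following is a minimal sketch of the standard gain ratio computation (information gain divided by split information) for a single discrete attribute. It is not the authors' implementation; the function names and the toy data are hypothetical, and the attributes are assumed to have been discretized already, since gain ratio as used here applies only to discrete attributes.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(attribute, labels):
    """Gain ratio of one discrete attribute with respect to the class labels."""
    n = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    # Information gain: class entropy minus the weighted entropy within each group.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy(attribute)
    # A single-valued attribute carries no information; return 0 by convention.
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: two already-discretized attributes and a binary class.
labels = ["yes", "yes", "no", "no"]
informative = ["low", "low", "high", "high"]    # perfectly separates the classes
uninformative = ["low", "high", "low", "high"]  # independent of the class

scores = {
    "informative": gain_ratio(informative, labels),
    "uninformative": gain_ratio(uninformative, labels),
}
```

In this toy case the attribute that separates the classes scores 1 and the independent attribute scores 0, so a ranking by gain ratio would keep the former and drop the latter.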

References

Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees.
Dash, M.; Liu, H. Feature selection for classification. Intelligent Data Analysis.
Devijver, P.A.; Kittler, J. Pattern Recognition: A Statistical Approach.
Fayyad, U.M.; Irani, K.B. On the Handling of Continuous-valued Attributes in Decision Tree Generation. Machine Learning, v.8.
Hall, M.A.; Holmes, G. Benchmarking Attribute Selection Techniques for Data Mining.
Ihaka, R.; Gentleman, R. A language for data analysis and graphics. Journal of Computational and Graphical Statistics, v.5, no.3. https://doi.org/10.2307/1390807
Kira, K.; Rendell, L.A. The feature selection problem: Traditional methods and a new algorithm. Proceed. of Nat'l Conf. of AI.
Kononenko, I. Estimating attributes: Analysis and extension of RELIEF. Proceed. of European Conference on Machine Learning.
Lee, S.C.; Huh, M.Y. A Measure of Association for Complex Data. Computational Statistics and Data Analysis, v.44, no.1-2.
Liu, H.; Setiono, R. A Probabilistic Approach to Feature Selection: A Filter Solution. Proceedings of the 13th International Conference on Machine Learning.
Liu, H.; Motoda, H. Feature Selection for Knowledge Discovery and Data Mining.
Merz, C.J.; Murphy, P.M. UCI Repository of Machine Learning Databases.
Quinlan, J.R. Induction of decision trees. Machine Learning, v.1.
Quinlan, J.R. C4.5: Programs for Machine Learning.
Witten, I.; Frank, E. Data Mining.
