http://dx.doi.org/10.5391/JKIIS.2010.20.2.165

Fuzzy discretization with spatial distribution of data and Its application to feature selection  

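Son, Chang-Sik (Department of Medical Informatics, School of Medicine, Keimyung University)
Shin, A-Mi (Department of Medical Informatics, School of Medicine, Keimyung University)
Lee, In-Hee (Department of Medical Informatics, School of Medicine, Keimyung University)
Park, Hee-Joon (Department of Biomedical Engineering, School of Medicine, Keimyung University)
Park, Hyoung-Seob (Department of Internal Medicine (Cardiology), School of Medicine, Keimyung University)
Kim, Yoon-Nyun (Department of Internal Medicine (Cardiology), School of Medicine, Keimyung University)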
Publication Information
Journal of the Korean Institute of Intelligent Systems / v.20, no.2, 2010, pp. 165-172
Abstract
In clinical data mining, choosing the optimal subset of features is important, not only to reduce computational complexity but also to improve the usefulness of the model constructed from the given data. Moreover, the threshold values (i.e., cut-off points) of the selected features are used by experts as clinical decision criteria for the differential diagnosis of diseases. In this paper, we propose a fuzzy discretization approach based on the spatial distribution of data with continuous attributes; it is evaluated by measuring the degree of separation of redundant attribute values in the overlapping region. The weighted average of the redundant attribute values is then used to determine the threshold value for each feature, and rough set theory is used to select a subset of relevant features from the full feature set. To verify the validity of the proposed method, we compared experimental results on a classification problem involving 668 patients with a chief complaint of dyspnea, using three existing discretization methods (equal-width, equal-frequency, and entropy-based) and the proposed discretization method. The experimental results confirm that the discretization methods with fuzzy partitions give better results on two evaluation measures, average classification accuracy and G-mean, than those with hard partitions.
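For readers unfamiliar with the baselines and measures named in the abstract, the following is a minimal sketch of equal-width and equal-frequency cut-point selection and of the G-mean measure as it is commonly defined (the geometric mean of per-class recall). It is not the authors' implementation: the function names, the sample data, and the choice of k are hypothetical, and the proposed fuzzy discretization itself is not reproduced because the abstract does not give its formulas.

```python
import math
from collections import Counter


def equal_width_cuts(values, k):
    """Return k-1 cut points that split [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Return k-1 cut points so that each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (sensitivity and specificity for two classes)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / totals[c] for c in sorted(totals)]
    return math.prod(recalls) ** (1 / len(recalls))


# Hypothetical continuous attribute with one outlier (illustrative values only).
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(equal_width_cuts(values, 3))      # cut points driven by the outlier
print(equal_frequency_cuts(values, 3))  # cut points driven by the ranks

# Hypothetical imbalanced labels: accuracy is 5/6, but G-mean penalizes the missed minority case.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0]
print(g_mean(y_true, y_pred))
```

Both baselines are hard partitions: a value falls into exactly one interval. Reporting G-mean alongside average accuracy is a common choice for imbalanced clinical data because it drops sharply when any one class is poorly recognized.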
Keywords
Continuous attribute; Separation; Discretization; Feature selection;