http://dx.doi.org/10.9718/JBER.2011.32.2.134

Extraction Method of Significant Clinical Tests Based on Data Discretization and Rough Set Approximation Techniques: Application to Differential Diagnosis of Cholecystitis and Cholelithiasis Diseases  

Son, Chang-Sik (Dept. of Medical Informatics, School of Medicine, Keimyung Univ.)
Kim, Min-Soo (Biomedical Information Technology Center, School of Medicine, Keimyung Univ.)
Seo, Suk-Tae (Biomedical Information Technology Center, School of Medicine, Keimyung Univ.)
Cho, Yun-Kyeong (Dept. of Internal Medicine, School of Medicine, Keimyung Univ.)
Kim, Yoon-Nyun (Dept. of Medical Informatics, School of Medicine, Keimyung Univ.)
Publication Information
Journal of Biomedical Engineering Research / v.32, no.2, 2011, pp. 134-143
Abstract
Selecting meaningful clinical tests and their reference values from high-dimensional clinical data with an imbalanced class distribution, in which one class is represented by a large number of examples and the other by only a few, is an important but difficult problem in the differential diagnosis of similar diseases. To address it, this study introduces methods that combine the discernibility matrix and discernibility function of rough set theory (RST) with two discretization approaches, equal-width and equal-frequency discretization. The discretization approaches define the reference values for the clinical tests, and the discernibility matrix and function extract a subset of significant clinical tests from the resulting nominal attribute values. To show the applicability of the methods to differential diagnosis, we applied them to extract the significant clinical tests and their reference values that distinguish a normal group (N = 351) from an abnormal group (N = 101) with either cholecystitis or cholelithiasis. We also examined the selected significant clinical tests, the variation in their reference values, and the average predictive performance on four evaluation criteria (accuracy, sensitivity, specificity, and geometric mean) under 10-fold cross-validation. The experimental results confirmed that the rough set approximation methods based on the two discretization approaches give better results, in terms of average geometric mean, with relative frequency than with absolute frequency. This suggests that a prediction model using relative frequency can be used effectively for classification and prediction on clinical data with an imbalanced class distribution.
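As an illustration of the two discretization approaches and the geometric-mean criterion named in the abstract, the following is a minimal sketch and not the authors' implementation. The function names, the number of bins, and the sample values are hypothetical, and the geometric mean is assumed to follow the usual definition for imbalanced data, the square root of sensitivity times specificity.

```python
import math
from typing import List, Sequence


def equal_width_cuts(values: Sequence[float], k: int) -> List[float]:
    """Return k-1 cut points that split [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cuts(values: Sequence[float], k: int) -> List[float]:
    """Return k-1 cut points so that each bin holds roughly the same count."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def discretize(value: float, cuts: Sequence[float]) -> int:
    """Map a numeric value to the index of its interval (0 .. len(cuts))."""
    return sum(value >= c for c in cuts)


def geometric_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Assumed definition: sqrt(sensitivity * specificity)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Hypothetical test results with one outlier, split into three intervals.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
width_cuts = equal_width_cuts(sample, 3)
freq_cuts = equal_frequency_cuts(sample, 3)
width_codes = [discretize(v, width_cuts) for v in sample]
freq_codes = [discretize(v, freq_cuts) for v in sample]
```

With a skewed sample like this one, the equal-width cuts leave most values in the first interval, while the equal-frequency cuts spread them evenly. The geometric mean is low whenever either sensitivity or specificity is low, which is why it is a common criterion when one class is much smaller than the other.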
Keywords
Data Discretization; Rough Set; Cholecystitis; Cholelithiasis; Differential Diagnosis