http://dx.doi.org/10.9708/jksci.2017.22.01.023

Subgroup Discovery Method with Internal Disjunctive Expression  

Kim, Seyoung (Dept. of Computer Science and Engineering, Pusan National University)
Ryu, Kwang Ryel (Dept. of Computer Science and Engineering, Pusan National University)
Abstract
Subgroup discovery algorithms can extract useful knowledge from data. Subgroup discovery is a rule learning method that finds subgroups of the data carrying specific information and expresses them as rules. A subgroup is meaningful when it covers a large share of the data and differs significantly from the data as a whole. Previously, subgroups have been expressed only as conjunctions of literals, which limits the scope of the rules that can be learned. In this paper, we propose a method that increases the expressiveness of rules through an internal disjunctive representation of attribute values. We also analyze the characteristics of existing subgroup discovery algorithms and propose an improved algorithm that compensates for their weaknesses while retaining their strengths. Experiments are conducted on traffic accident data provided by Busan Metropolitan City. The results show that the proposed method outperforms existing methods, and that the rule set it learns contains more interesting and more general rules.
Keywords
Data mining; Subgroup discovery; Rule learning; Traffic accident data;
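
The difference between a conjunction of literals and an internal disjunction of attribute values, as described in the abstract, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy records, the attribute names, and the use of weighted relative accuracy (WRAcc), a measure commonly used in subgroup discovery that the abstract does not name. This is not the authors' algorithm; it only shows how a rule that allows several values per attribute can cover more data than a rule with one value per attribute.

```python
from typing import Dict, List, Set

Record = Dict[str, str]
# A rule maps each attribute to its set of allowed values.
# One value per attribute = plain conjunction of literals.
# Several values per attribute = internal disjunction.
Rule = Dict[str, Set[str]]

# Hypothetical toy records; NOT taken from the Busan traffic accident data.
RECORDS: List[Record] = [
    {"weather": "rain",  "road": "highway", "severity": "serious"},
    {"weather": "snow",  "road": "highway", "severity": "serious"},
    {"weather": "rain",  "road": "local",   "severity": "minor"},
    {"weather": "clear", "road": "highway", "severity": "minor"},
    {"weather": "clear", "road": "local",   "severity": "minor"},
    {"weather": "snow",  "road": "highway", "severity": "serious"},
    {"weather": "clear", "road": "highway", "severity": "serious"},
    {"weather": "rain",  "road": "local",   "severity": "minor"},
]


def covers(rule: Rule, record: Record) -> bool:
    """A record is covered if, for every attribute in the rule,
    its value is one of the allowed values (AND across attributes,
    OR within an attribute)."""
    return all(record[attr] in values for attr, values in rule.items())


def evaluate(rule: Rule, records: List[Record],
             target_attr: str, target_value: str) -> Dict[str, float]:
    """Return coverage, precision, and WRAcc of a rule for one target value.
    WRAcc = coverage * (precision - base rate); assumed here as the quality
    measure, not confirmed by the source."""
    n = len(records)
    covered = [r for r in records if covers(rule, r)]
    if not covered:
        return {"coverage": 0.0, "precision": 0.0, "wracc": 0.0}
    coverage = len(covered) / n
    precision = sum(r[target_attr] == target_value for r in covered) / len(covered)
    base_rate = sum(r[target_attr] == target_value for r in records) / n
    return {"coverage": coverage, "precision": precision,
            "wracc": coverage * (precision - base_rate)}


# Conjunction of literals: weather = snow AND road = highway
conjunctive: Rule = {"weather": {"snow"}, "road": {"highway"}}
# Internal disjunction: weather in {rain, snow} AND road = highway
disjunctive: Rule = {"weather": {"rain", "snow"}, "road": {"highway"}}

conj_result = evaluate(conjunctive, RECORDS, "severity", "serious")
disj_result = evaluate(disjunctive, RECORDS, "severity", "serious")
print("conjunctive:", conj_result)
print("disjunctive:", disj_result)
```

On this toy data both rules select only serious accidents, but the internally disjunctive rule covers three of the eight records instead of two, so it is the more general rule of the pair.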