http://dx.doi.org/10.3745/JIPS.2008.4.1.027

An Empirical Study of Qualities of Association Rules from a Statistical View Point  

Dorn, Maryann (Dept. of Computer Science, Southern Illinois University)
Hou, Wen-Chi (Dept. of Computer Science, Southern Illinois University)
Che, Dunren (Dept. of Computer Science, Southern Illinois University)
Jiang, Zhewei (Dept. of Computer Science, Southern Illinois University)
Publication Information
Journal of Information Processing Systems / v.4, no.1, 2008, pp. 27-32
Abstract
Minimum support and minimum confidence have been used as the criteria for generating association rules in all association rule mining algorithms. These criteria have natural appeal, such as simplicity, and few researchers have questioned the quality of the rules they generate. In this paper, we examine the rules from a more rigorous point of view by conducting statistical tests. Specifically, we use contingency tables and the chi-square test to analyze the data. Experimental results show that one third of the association rules derived from the support and confidence criteria are not significant; that is, the antecedent and consequent of those rules are not correlated. This indicates that minimum support and minimum confidence alone do not adequately separate meaningful associations from spurious ones. The chi-square test can be considered as an enhancement or an alternative solution.
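As a rough illustration of the kind of check the abstract describes, the sketch below computes support, confidence, and the Pearson chi-square statistic for the 2x2 contingency table of a single rule A => B. It is not the authors' implementation: the function names, the transaction counts, and the 0.05 significance level (critical value 3.841 for one degree of freedom) are assumptions chosen for illustration.

```python
# Minimal sketch, not the paper's code: all names and counts are hypothetical.

# Chi-square critical value for 1 degree of freedom at the 0.05 level
# (the significance level itself is an assumption; the abstract does not state one).
CRITICAL_0_05_DF1 = 3.841


def chi_square_2x2(n, n_a, n_b, n_ab):
    """Pearson chi-square statistic for antecedent A vs. consequent B.

    n    : total number of transactions
    n_a  : transactions containing A
    n_b  : transactions containing B
    n_ab : transactions containing both A and B
    Assumes 0 < n_a < n and 0 < n_b < n so every expected count is nonzero.
    """
    # Observed counts: rows = A present / absent, columns = B present / absent.
    observed = [
        [n_ab, n_a - n_ab],
        [n_b - n_ab, n - n_a - n_b + n_ab],
    ]
    row_totals = [n_a, n - n_a]
    col_totals = [n_b, n - n_b]

    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if A and B were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def evaluate_rule(n, n_a, n_b, n_ab):
    """Return (support, confidence, chi-square statistic, significant?)."""
    support = n_ab / n
    confidence = n_ab / n_a
    stat = chi_square_2x2(n, n_a, n_b, n_ab)
    return support, confidence, stat, stat > CRITICAL_0_05_DF1


# Hypothetical rule with high support and confidence but no correlation:
# B occurs in 75% of all transactions and in 75% of those containing A.
independent_rule = evaluate_rule(n=1000, n_a=600, n_b=750, n_ab=450)

# Same marginals, but A and B co-occur more often than independence predicts.
correlated_rule = evaluate_rule(n=1000, n_a=600, n_b=750, n_ab=540)
```

With these made-up counts, the first rule would pass typical support and confidence thresholds (0.45 and 0.75) even though its observed counts match the independence expectation exactly, which is the situation the abstract warns about. The second rule has the same marginals but a large chi-square statistic.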
Keywords
Data Mining; Association Rule Mining; Rule Evaluation; Chi-square Test