Mining Quantitative Association Rules using Commercial Data Mining Tools  

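Kang, Gong-Mi (Computer Science, Kangwon National University)
Moon, Yang-Sae (Computer Science, Kangwon National University)
Choi, Hun-Young (Computer Science, Kangwon National University)
Kim, Jin-Ho (Computer Science, Kangwon National University)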
Abstract
Commercial data mining tools generally support only binary attributes when mining association rules; that is, they can mine only binary association rules. In general, however, transaction databases contain quantitative attributes as well as binary ones. In this paper we therefore propose a systematic approach to mining quantitative association rules (association rules that contain quantitative attributes) using commercial mining tools. To achieve this goal, we first propose an overall working framework that mines quantitative association rules on top of commercial mining tools. The framework consists of two steps: 1) a pre-processing step that converts quantitative attributes into binary attributes, and 2) a post-processing step that converts the resulting binary association rules back into quantitative association rules. For the pre-processing step, we present the concept of domain partition and, based on it, formally redefine the previous bipartition techniques (mean-based and median-based) and multi-partition techniques (equi-width and equi-depth). These previous partition techniques, however, do not consider the distribution characteristics of attribute values. To solve this problem, we propose an intuitive partition technique named standard deviation minimization. In standard deviation minimization, adjacent attribute values are placed in the same partition if the change in their standard deviation is small, and are divided into different partitions if the change is large. We also propose a post-processing step that integrates binary association rules and converts them back into the corresponding quantitative rules. Through extensive experiments, we show that our framework works correctly and that standard deviation minimization is superior to the other partition techniques. Based on these results, we believe that our framework is practically applicable for non-expert users who want to mine quantitative association rules using commercial data mining tools.
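The pre-processing step can be illustrated with a small Python sketch. It is not the authors' implementation: all function names are hypothetical, the bipartition and multi-partition functions follow the textbook definitions of mean-based, median-based, equi-width, and equi-depth partitioning, and `stdev_change_cuts` is only one possible greedy reading of the standard deviation rule described in the abstract (the abstract does not give the exact procedure, and the `threshold` parameter is an assumption).

```python
from statistics import mean, median, pstdev

def bipartition_cut(values, method="mean"):
    """Single cut point for a mean-based or median-based bipartition."""
    return [mean(values)] if method == "mean" else [median(values)]

def equi_width_cuts(values, k):
    """k - 1 interior cut points that split [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def equi_depth_cuts(values, k):
    """k - 1 cut points chosen so each interval holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]

def stdev_change_cuts(values, threshold):
    """ASSUMPTION: greedy reading of the abstract's rule, not the paper's algorithm.

    Walk the sorted values; if adding the next value raises the population
    standard deviation of the current partition by more than `threshold`,
    start a new partition at that value.
    """
    ordered = sorted(values)
    cuts, current = [], [ordered[0]]
    for v in ordered[1:]:
        change = pstdev(current + [v]) - pstdev(current)
        if change > threshold:
            cuts.append(v)
            current = [v]
        else:
            current.append(v)
    return cuts

def to_binary_item(name, value, cuts):
    """Map a quantitative value to the binary item naming the interval it falls in."""
    index = sum(value >= c for c in cuts)  # number of cut points at or below value
    return f"{name}_p{index}"

# Illustrative data only; not taken from the paper's experiments.
ages = [21, 23, 25, 30, 41, 45, 62, 70]
cuts = equi_depth_cuts(ages, 4)
print(cuts)
print([to_binary_item("age", a, cuts) for a in ages])
```

Each transaction's quantitative value is replaced by a single binary item such as `age_p1`, which a tool that only handles binary attributes can then mine as usual. The post-processing step would map each such item back to its interval using the same cut points.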
Keywords
Association Rules; Quantitative Association Rules; Data Mining; Commercial Data Mining Tools