DOI QR코드

DOI QR Code

Non-linear regression model considering all association thresholds for decision of association rule numbers

기본적인 연관평가기준 전부를 고려한 비선형 회귀모형에 의한 연관성 규칙 수의 결정

  • Received : 2013.02.04
  • Accepted : 2013.03.01
  • Published : 2013.03.31

Abstract

Among data mining techniques, the association rule is the most recently developed technique, and it finds the relevance between two items in a large database. And it is directly applied in the field because it clearly quantifies the relationship between two or more items. When we determine whether an association rule is meaningful, we utilize interestingness measures such as support, confidence, and lift. Interestingness measures are meaningful in that it shows the causes for pruning uninteresting rules statistically or logically. But the criteria of these measures are chosen by experiences, and the number of useful rules is hard to estimate. If too many rules are generated, we cannot effectively extract the useful rules.In this paper, we designed a variety of non-linear regression equations considering all association thresholds between the number of rules and three interestingness measures. And then we diagnosed multi-collinearity and autocorrelation problems, and used analysis of variance results and adjusted coefficients of determination for the best model through numerical experiments.

데이터 마이닝 기법들 중에서도 연관성 규칙은 가장 최근에 개발된 기법으로 대용량 데이터베이스에서 각 항목들 간의 관련성을 찾아내며, 두 항목간의 관계를 명확히 수치화함으로써 두 개 이상의 항목간의 관련성을 표시하여 주기 때문에 현장에서 직접 적용이 가능하다. 일반적으로 연관성 규칙 생성 여부를 판단할 때, 각 항목간의 연관성을 반영하는 기준인 지지도, 신뢰도, 향상도 등의 흥미도 측도를 활용하게 된다. 실제적으로 연관성 규칙의 수를 결정하기 위해서는 이들 흥미도 측도들의 평가기준을 정하기 위해 반복적으로 조정 과정을 거쳐야 한다. 본 논문에서는 이러한 문제를 해결하기 위해 연관성 평가기준 모두를 일반적으로 많이 활용되고 있는 비선형 회귀모형에 적용하여 연관성 규칙의 수를 추정하는 방안을 강구하였다. 또한 분산팽창계수를 이용하여 다중공선성 문제를 진단하는 동시에 분산분석 결과와 수정 결정계수를 이용하여 각 모형의 기여도를 비교하여 가장 바람직한 회귀 모형을 구하였다.

Keywords

References

  1. Agrawal, R., Imielinski, R. and Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference on Management of Data, 207-216.
  2. Agrawal, R. and Srikant, R. (1994). Fast algorithms for mining association rules. Proceeding of the 20th VLDB Conference, 487-499.
  3. Bayardo, R. J. (1998). Efficiently mining long patterns from databases. Proceedings of ACM SIGMOD Conference on Management of Data, 85-93.
  4. Cai, C. H., Fu, A. W. C., Cheng, C. H. and Kwong, W. W. (1998). Mining association rules with weighted items. Proceedings of International Database Engineering and Applications Symposium, 68-77.
  5. Cho, K. H. and Park, H. C. (2011a). Study on the multi intervening relation in association rules. Journal of the Korean Data Analysis Society, 13, 297-306.
  6. Cho, K. H. and Park, H. C. (2011b). Discovery of insignificant association rules using external variable. Journal of the Korean Data Analysis Society, 13, 1343-1352.
  7. Han, J. and Fu, Y. (1995). Discovery of multiple-level association rules from large databases. Proceeding of the 21st VLDB Conference, 420-431.
  8. Han, J. and Fu, Y. (1999). Mining multiple-level association rules in large databases. IEEE Transactions on Knowledge and Data Engineering, 11, 68-77.
  9. Han J. and Kamber, M. (2006). Data mining ; Concepts and techniques, Morgam Kaufmann Publishers, San Francisco.
  10. Han, J., Pei, J. and Yin, Y. (2000). Mining frequent patterns without candidate generation. Proceedings of ACM SIGMOD Conference on Management of Data, 1-12.
  11. Lim, J., Lee, K. and Cho, Y. (2010). A study of association rule by considering the frequency. Journal of the Korean Data & Information Science Society, 21, 1061-1069.
  12. Liu, B., Hsu, W. and Ma, Y. (1999). Mining association rules with multiple minimum supports. Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining, 337-241.
  13. Park, H. C. (2010a). Weighted association rules considering item RFM scores. Journal of the Korean Data & Information Science Society, 21, 1147-1154.
  14. Park, H. C. (2010b). Standardization for basic association measures in association rule mining. Journal of the Korean Data & Information Science Society, 21, 891-899.
  15. Park, H. C. (2011a). Proposition of negatively pure association rule threshold. Journal of the Korean Data & Information Science Society, 22, 179-188.
  16. Park, H. C. (2011b). The proposition of attributably pure confidence in association rule mining. Journal of the Korean Data & Information Science Society, 22, 235-243.
  17. Park, H. C. (2011c). The application of some similarity measures to association rule thresholds. Journal of the Korean Data Analysis Society, 13, 1331-1342.
  18. Park, H. C. (2012). Exploration of symmetric similarity measures by conditional probabilities as association rule thresholds. Journal of the Korean Data Analysis Society, 14, 707-716.
  19. Park, H. C. (2013). A study on comparison of non-linear regression model for decision of association rule numbers. Journal of the Korean Data Analysis Society, 15, To be published.
  20. Park J. S., Chen M. S. and Philip, S. Y. (1995). An effective hash-based algorithms for mining association rules. Proceedings of ACM SIGMOD Conference on Management of Data, 175-186.
  21. Pasquier, N., Bastide, Y., Taouil, R. and Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. Proceedings of the 7th International Conference on Database Theory, 398-416.
  22. Pei, J., Han, J. and Mao, R. (2000). CLOSET: an efficient algorithm for mining frequent closed itemsets. Proceedings of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 21-30.
  23. Srinkant, R., Vu, Q. and Agrawal, R. (1997). Mining association rules with item constraints. Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, 67-73.
  24. Toivonen, H. (1996). Sampling large database for association rules. Proceedings of the 22nd VLDB Conference, 134-145.
  25. Yi, W, Lu, M. and Liu, Z. (2011). Regression analysis in the number of association rules. International Journal of Automation and Computing, 8, 78-82. https://doi.org/10.1007/s11633-010-0557-x

Cited by

  1. A linearity test statistic in a simple linear regression vol.25, pp.2, 2014, https://doi.org/10.7465/jkdi.2014.25.2.305
  2. Generally non-linear regression model containing standardized lift for association number estimation vol.27, pp.3, 2016, https://doi.org/10.7465/jkdi.2016.27.3.629