• Title/Summary/Keyword: 연관성 평가 기준 (association evaluation criteria)


Association rule thresholds considering the number of possible rules of interest items (관심 항목의 발생 가능한 규칙의 수를 고려한 연관성 평가기준)

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.23 no.4 / pp.717-725 / 2012
  • Data mining is a method for finding useful information in the large amounts of data held in a database. One of the well-studied problems in data mining is the exploration of association rules. Association rule mining searches for interesting relationships among items in a given database using support, confidence, and lift. If we use the existing association rule thresholds, we can commit errors through information loss, because they do not consider the size of the occurrence frequency. In this paper, we propose new association rule thresholds that consider the number of possible rules for the items of interest, and compare them with existing association rule thresholds using an example and real data. The results show that the new association rule thresholds were more useful than the existing thresholds.
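
The three basic thresholds named in this abstract (support, confidence, and lift) can be illustrated with a minimal sketch using their standard textbook definitions. The function names and the toy transaction list below are hypothetical and are not taken from the paper; the paper's proposed thresholds are not reproduced here.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Estimate of P(consequent | antecedent); assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions: List[Set[str]],
         antecedent: Iterable[str],
         consequent: Iterable[str]) -> float:
    """Confidence divided by the consequent's support; 1.0 indicates independence."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))


# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

if __name__ == "__main__":
    print("support(bread, milk)    =", support(transactions, {"bread", "milk"}))
    print("confidence(bread->milk) =", confidence(transactions, {"bread"}, {"milk"}))
    print("lift(bread->milk)       =", lift(transactions, {"bread"}, {"milk"}))
```

Support and confidence lie in [0, 1], while lift ranges over [0, ∞) with 1 as its neutral value, so the three measures are not on a common scale.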

Proposition of causal association rule thresholds (인과적 연관성 규칙 평가 기준의 제안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1189-1197 / 2013
  • Data mining is the process of analyzing a huge database from different perspectives and summarizing it into useful information. One of the well-studied problems in data mining is association rule generation. Association rule mining finds the relationships among items in a massive database using interestingness measures such as support, confidence, and lift. Typical applications of this technique include retail market basket analysis, item recommendation systems, cross-selling, and customer relationship management. However, these interestingness measures cannot be used to establish a causal relationship between antecedent and consequent item sets. This paper proposes causal association thresholds to compensate for this problem, and then checks the three conditions for interestingness measures. Comparative studies of the basic and causal association thresholds are shown by numerical example. The results show that the causal association thresholds are better than the basic association thresholds.

Proposition of causally confirmed measures in association rule mining (인과적 확인 측도에 의한 연관성 규칙 탐색)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.25 no.4 / pp.857-868 / 2014
  • Data mining is the representative analysis methodology of the big data era, and is the process of analyzing a massive database and summarizing it into meaningful information. The association rule technique finds the relationships among items in a huge database using interestingness measures such as support, confidence, and lift. However, these interestingness measures cannot be used to establish a causal relationship between antecedent and consequent item sets. Moreover, they do not reveal the direction of the association. This paper proposes causally confirmed association thresholds to compensate for these problems, and then checks the three conditions for interestingness measures. Comparative studies of the basic association thresholds, causal association thresholds, and causally confirmed association thresholds are shown by simulation. The results show that the causally confirmed association thresholds are better than the basic and causal association thresholds.

Exploration of relationship between confirmation measures and association thresholds (기준 확인 측도와 연관성 평가기준과의 관계 탐색)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.4 / pp.835-845 / 2013
  • Association rule mining, one of the data mining techniques, is a method for quantifying the relevance between sets of items in a big database, and has been applied in various fields such as manufacturing, shopping malls, healthcare, insurance, and education. Philosophers of science have proposed interestingness measures for various kinds of patterns, analyzed their theoretical properties, evaluated them empirically, and suggested strategies for selecting appropriate measures for particular domains and requirements. Such interestingness measures are divided into objective, subjective, and semantic measures. Objective measures are based on the data used in the discovery process and are typically motivated by statistical considerations. Subjective measures take into account not only the data but also the knowledge and interests of the users who examine the pattern, while semantic measures additionally take into account utility and actionability. In a very different context, researchers have devoted a lot of attention to measures of confirmation or evidential support. This paper focuses on asymmetric confirmation measures, and compares confirmation measures with the basic association thresholds using simulation data. As a result, we could distinguish the direction of an association rule by the confirmation measures, and interpret the degree of association operationally with them. Furthermore, the results showed that the measure by Rips and the measure by Kemeny and Oppenheim were better than the other confirmation measures.
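
As a rough illustration of how the two confirmation measures named in this abstract behave, the sketch below computes them from a 2x2 contingency table using their commonly cited forms: Kemeny and Oppenheim's F(H, E) = [P(E|H) - P(E|¬H)] / [P(E|H) + P(E|¬H)] and Rips's R(H, E) = 1 - P(¬H|E) / P(¬H). Treating the antecedent A as the evidence E and the consequent B as the hypothesis H is an assumption made here, not something the abstract states, and the function names, argument names, and counts are hypothetical.

```python
def kemeny_oppenheim(n11: int, n10: int, n01: int, n00: int) -> float:
    """Kemeny-Oppenheim measure for the rule A -> B (assumed mapping: E = A, H = B).

    n11: transactions with A and B      n10: with A but not B
    n01: with B but not A               n00: with neither
    Assumes both B and not-B occur at least once.
    """
    p_a_given_b = n11 / (n11 + n01)         # P(E | H)
    p_a_given_not_b = n10 / (n10 + n00)     # P(E | not H)
    return (p_a_given_b - p_a_given_not_b) / (p_a_given_b + p_a_given_not_b)


def rips(n11: int, n10: int, n01: int, n00: int) -> float:
    """Rips measure for the rule A -> B: 1 - P(not B | A) / P(not B).

    Assumes A occurs at least once and not-B occurs at least once.
    """
    n = n11 + n10 + n01 + n00
    p_not_b_given_a = n10 / (n11 + n10)
    p_not_b = (n10 + n00) / n
    return 1.0 - p_not_b_given_a / p_not_b


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    forward = (30, 10, 20, 40)   # rule A -> B
    backward = (30, 20, 10, 40)  # rule B -> A: swap the two off-diagonal cells
    print("A -> B:", kemeny_oppenheim(*forward), rips(*forward))
    print("B -> A:", kemeny_oppenheim(*backward), rips(*backward))
```

Swapping the two off-diagonal counts gives the measure for the reverse rule; the two directions generally yield different values, which is the asymmetry the abstract relies on to distinguish rule direction. Both measures are positive under positive association and negative under negative association.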

Standardization for basic association measures in association rule mining (연관 규칙 마이닝에서의 평가기준 표준화 방안)

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.21 no.5 / pp.891-899 / 2010
  • The association rule is a technique that represents the relationship between two or more items by numerically expressing the relevance of each item in vast databases, and it is the most widely used technique in data mining. The basic thresholds for association rules are support, confidence, and lift, and these are used to generate the association rules. Lift needs to be standardized because its range differs from that of support and confidence. Support and confidence also need to be standardized in order to compare objectively the association levels of several antecedent variables for one descendant variable. In this paper we propose a method for standardizing association thresholds that considers the marginal probability of each item, so that the association level can be grasped objectively and exactly. We check the conditions for association criteria and then compare the association thresholds with the standardized association thresholds using concrete examples.

The application for predictive similarity measures of binary data in association rule mining (이분형 예측 유사성 측도의 연관성 평가 기준 적용 방안)

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.22 no.3 / pp.495-503 / 2011
  • The most widely used data mining technique is finding association rules. Association rule mining is a method for quantifying the relationship between sets of items in a very large database based on association thresholds. There are several basic association thresholds for exploring meaningful association rules: support, confidence, lift, and so on. Among them, confidence is the most frequently used, but it has the drawback that it cannot determine the direction of the association. The net confidence and the attributably pure confidence were developed to compensate for this drawback, but they have other drawbacks. In this paper we consider some predictive similarity measures for binary data, used in cluster analysis and multi-dimensional analysis, as association thresholds to compensate for these drawbacks. Comparative studies of net confidence, attributably pure confidence, and some predictive similarity measures are shown by numerical example.

Negative Relative Feedback Using Reinforcement Learning (강화학습을 이용한 부정적 연관성 피드백)

  • Son, Ki-Jun;Lee, Jae-An;Lee, Sang-Jo
    • Proceedings of the Korean Information Science Society Conference / 2007.06c / pp.351-355 / 2007
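  • A document filtering system selects and presents documents based on the user's information need. The information need is represented as a profile consisting of one or more words, and it can become more specific through the relevance judgments the user makes during the filtering process. In previous work, users take part in relevance assessment directly by entering their judgments, and the user profile is learned from the positive feedback they provide. This study proposes a user profile learning method that uses negative relevance feedback together with the positive relevance feedback given by the user. To compare the performance of the proposed method with that of the Rocchio method, a representative relevance feedback method, we performed filtering on four topics. The experiments showed that using negative relevance feedback gave performance 6% higher than the Rocchio method. The results also showed that the performance of the filtering system could be improved by using negatively judged documents to remove documents the user does not prefer.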


Generally non-linear regression model containing standardized lift for association number estimation (연관성 규칙 수의 추정을 위한 일반적인 비선형 회귀모형에서의 표준화 향상도 활용 방안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.27 no.3 / pp.629-638 / 2016
  • Among data mining techniques, the association rule is one of the most used in practice because it clearly displays the relationship between two or more items in large databases by quantifying that relationship. There are three primary quality measures for association rules: support, confidence, and lift. We evaluate association rules using these measures. The approaches taken in the previous literature to estimating the number of association rules have been either a determination function method or a regression modeling approach. In this paper, we propose several non-linear regression equations useful for estimating the number of rules, and also evaluate the estimated association rules using the quality measures. Furthermore, we assess their usefulness compared to conventional regression models using the values of the regression coefficients, F statistics, adjusted coefficients of determination, and variance inflation factors.

Non-linear regression model considering all association thresholds for decision of association rule numbers (기본적인 연관평가기준 전부를 고려한 비선형 회귀모형에 의한 연관성 규칙 수의 결정)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.2 / pp.267-275 / 2013
  • Among data mining techniques, the association rule is the most recently developed technique, and it finds the relevance between two items in a large database. It is applied directly in the field because it clearly quantifies the relationship between two or more items. To determine whether an association rule is meaningful, we use interestingness measures such as support, confidence, and lift. Interestingness measures are meaningful in that they show, statistically or logically, the grounds for pruning uninteresting rules. However, the criteria for these measures are chosen by experience, and the number of useful rules is hard to estimate. If too many rules are generated, we cannot effectively extract the useful ones. In this paper, we design a variety of non-linear regression equations, considering all association thresholds, that relate the number of rules to the three interestingness measures. We then diagnose multicollinearity and autocorrelation problems, and use analysis of variance results and adjusted coefficients of determination to select the best model through numerical experiments.

Utilization of similarity measures by PIM with AMP as association rule thresholds (모든 주변 비율을 고려한 확률적 흥미도 측도 기반 유사성 측도의 연관성 평가 기준 활용 방안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.1 / pp.117-124 / 2013
  • Association rule mining, one of the data mining techniques, is a method for quantifying the relationship between sets of items in a huge database, and has been applied in various fields such as internet shopping malls, healthcare, insurance, and education. There are three primary interestingness measures for association rules: support, confidence, and lift. Confidence is the most important of these measures, and association rules are generated using confidence. However, it is an asymmetric measure and takes only positive values, which can cause difficulties when generating association rules. In this paper we apply the similarity measures by probabilistic interestingness measure (PIM) with all marginal proportions (AMP) to solve this problem. Comparative studies of support, confidence, lift, chi-square statistics, and some similarity measures by PIM with AMP are shown by numerical example. As a result, we found that the similarity measures by PIM with AMP indicate the degree of association in the same way as confidence. We could also confirm the direction of association because their values carry a sign, and select the best similarity measure by PIM with AMP.