• Title/Summary/Keyword: association measure (연관성 측도)

Mining Generalized Association Rules Using Fuzzy Concept Hierarchy (퍼지 개념 계층을 도입한 일반화된 연관 규칙 마이닝)

  • 손봉기;김동호;이건명
    • Proceedings of the Korean Information Science Society Conference / 2000.10b / pp.84-86 / 2000
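  • The general concept hierarchies referenced during association rule mining express only crisp relationships between concepts. In practice, the relationships between concepts are often vague. This paper proposes a method for mining generalized association rules using a fuzzy concept hierarchy, which can also reflect vague relationships between concepts. We introduce a method for appropriately reflecting lower-level concepts in higher-level concepts within the fuzzy concept hierarchy, and a measure used to prune redundant rules among the mined association rules. We also present the experimental procedure and results to show the applicability of generalized association rule mining with a fuzzy concept hierarchy.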

The development of symmetrically and attributably pure confidence in association rule mining (연관성 규칙에서 활용 가능한 대칭적 기여 순수 신뢰도의 개발)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.25 no.3 / pp.601-609 / 2014
  • The most widely used data mining technique for big data analysis is the generation of meaningful association rules. This method finds relationships between sets of items based on association criteria such as support, confidence, and lift. Confidence is the most frequently used of these, but it has the drawback that it does not reveal the direction of the association. The attributably pure confidence was developed to compensate for this drawback, but its value changes with the position of the two item sets. In this paper, we propose four symmetrically and attributably pure confidence measures to compensate for the shortcomings of confidence and the attributably pure confidence. We then verify the three conditions for an interestingness measure given by Piatetsky-Shapiro, and compare confidence, attributably pure confidence, and the four symmetrically and attributably pure confidence measures through numerical examples. The results show that the symmetrically and attributably pure confidence measures are better than confidence and the attributably pure confidence. The measure NSAP is also found to be the best among the four symmetrically and attributably pure confidence measures.
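
For orientation, the three basic criteria named in this abstract (support, confidence, lift) can be sketched as follows. This is a minimal illustration using the standard textbook definitions; the function names and the toy transactions are assumptions made here, and the pure confidence measures proposed in the paper are not reproduced.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent).

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's support; 1.0 under independence."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Hypothetical toy data, not taken from the paper.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
rule_lift = lift(transactions, {"bread"}, {"milk"})
```

On this toy data the rule bread ⇒ milk has a lift below 1, which indicates a slightly negative association even though its confidence is well above 0.5.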

Weighted association rules considering item RFM scores (항목 알에프엠 점수를 고려한 가중 연관성 규칙)

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.21 no.6 / pp.1147-1154 / 2010
  • One of the important goals in data mining is to discover and determine the relationships between different variables. Association rule mining serves this goal: it finds meaningful rules by quantifying the relationship between two items based on association measures such as support, confidence, and lift. In this paper, we present evaluation criteria for weighted association rules that use item RFM (recency, frequency, monetary) scores as the importance of items. The original RFM technique is one of the most widely applied methods that use customer information to find the most profitable customers. We then compare the general association rule technique with the weighted association rule technique using simulated data.

Categorical Data Analysis System in the internet (인터넷상에서의 범주형 자료분석 시스템 개발)

  • 홍종선;김동욱;오민권
    • The Korean Journal of Applied Statistics / v.12 no.1 / pp.83-95 / 1999
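  • The purpose of this paper is to provide, over the internet, a categorical data analysis system that general analysts without specialized knowledge of categorical data analysis can use easily and conveniently. The system was designed and implemented around three aspects. First, it provides three kinds of histograms for exploratory analysis of categorical data. Second, it provides several association measures for measuring the association among categorical variables. In particular, mosaic plots and association plots, which the widely used statistical packages do not offer, are implemented with dynamic graphics so that users can obtain useful information for measuring association or specifying a model. Third, through the analysis of log-linear models, users can select the best-fitting log-linear model.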

Standardization for basic association measures in association rule mining (연관 규칙 마이닝에서의 평가기준 표준화 방안)

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.21 no.5 / pp.891-899 / 2010
  • Association rule mining is a technique that represents the relationship between two or more items by numerically quantifying the relevance of each item in vast databases, and it is the most widely used technique in data mining. The basic thresholds for association rules are support, confidence, and lift, which are used to generate the rules. Lift needs to be standardized because its range differs from that of support and confidence. Support and confidence also need to be standardized so that the association levels of several antecedent variables for one consequent variable can be compared objectively. In this paper, we propose a method for standardizing the association thresholds that considers the marginal probability of each item, so that the association level can be grasped objectively and exactly. We check the conditions for association criteria and then compare the original thresholds with the standardized thresholds using concrete examples.
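
The difference in range that motivates the standardization can be seen from the standard definitions of the three measures for a rule $A \Rightarrow B$, written here with $P(A \cap B)$ for the probability that a transaction contains both item sets. The notation is an assumption of this note, and the standardized versions proposed in the paper are not reproduced.

```latex
\begin{align*}
\operatorname{supp}(A \Rightarrow B) &= P(A \cap B) \in [0, 1], \\
\operatorname{conf}(A \Rightarrow B) &= P(B \mid A) = \frac{P(A \cap B)}{P(A)} \in [0, 1], \\
\operatorname{lift}(A \Rightarrow B) &= \frac{P(B \mid A)}{P(B)} = \frac{P(A \cap B)}{P(A)\,P(B)} \in [0, \infty).
\end{align*}
```

Lift equals 1 when $A$ and $B$ are independent and has no finite upper bound, whereas support and confidence are bounded by 1.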

On the Privacy Preserving Mining Association Rules by using Randomization (연관규칙 마이닝에서 랜덤화를 이용한 프라이버시 보호 기법에 관한 연구)

  • Kang, Ju-Sung;Cho, Sung-Hoon;Yi, Ok-Yeon;Hong, Do-Won
    • The KIPS Transactions:PartC / v.14C no.5 / pp.439-452 / 2007
  • We study privacy-preserving data mining (PPDM) using randomization. Theoretical PPDM based on secure multi-party computation techniques is not practical because of its computational inefficiency, so we concentrate on practical PPDM, in particular the randomization technique. We survey various privacy measures and study privacy-preserving mining of association rules using randomization. We propose a new randomization operator, the binomial selector, for privacy-preserving association rule mining. A binomial selector is a special case of the select-a-size operator of Evfimievski et al. [3]. We also present simulation results for finding an appropriate parameter for a binomial selector. The randomization by the so-called cut-and-paste method in [3] is not efficient and has high variance in the recovered support values for large item sets. Our randomization by a binomial selector makes up for these defects of the cut-and-paste method.

Generating Multidimensional Random Tables (다차원 임의 분할표 생성)

  • Choi, Hyun-Jip
    • The Korean Journal of Applied Statistics / v.19 no.3 / pp.545-554 / 2006
  • We suggest a method for generating multidimensional random tables based on log-linear models. The linear combination approach of Lee (1997) is applied to obtain the joint distribution with the well-known Pearson chi-squared statistic. Using the suggested method, we can generate completely associated joint distributions that have a fixed association among three variables. The method can therefore be extended to tables of more than three dimensions.

Analysis of Large Tables (대규모 분할표 분석)

  • Choi, Hyun-Jip
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.395-410 / 2005
  • For the analysis of large tables formed by many categorical variables, we suggest a method to group the variables into several disjoint groups in which the variables are completely associated within the groups. We use a simple function of Kullback-Leibler divergence as a similarity measure to find the groups. Since the groups are complete hierarchical sets, we can identify the association structure of the large tables by the marginal log-linear models. Examples are introduced to illustrate the suggested method.
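
The abstract does not spell out which function of the Kullback-Leibler divergence is used. As a rough illustration only, the sketch below computes the divergence between the observed joint distribution of a two-way table and the product of its margins (the mutual information), which is one common way to score association between two categorical variables. The function names and the example tables are assumptions made here, not the paper's method.

```python
import math


def kl_divergence(p, q):
    """D(p || q) in nats for two discrete distributions on the same cells."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total


def mutual_information(table):
    """KL divergence between the joint cell probabilities of a two-way
    count table and the product of its row and column margins."""
    n = sum(sum(row) for row in table)
    row_margin = [sum(row) / n for row in table]
    col_margin = [sum(col) / n for col in zip(*table)]
    joint = [cell / n for row in table for cell in row]
    independent = [r * c for r in row_margin for c in col_margin]
    return kl_divergence(joint, independent)


# Hypothetical count tables.
independent_table = [[10, 20], [20, 40]]  # rows are proportional
associated_table = [[50, 0], [0, 50]]     # one variable determines the other

mi_independent = mutual_information(independent_table)
mi_associated = mutual_information(associated_table)
```

The score is zero for the independent table and reaches log 2 for the completely associated 2x2 table, so larger values indicate variables that are better candidates for the same group.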

Improvements of K-modes Algorithm and ROCK Algorithm (K-모드 알고리즘과 ROCK 알고리즘의 개선)

  • 김보화;김규성
    • The Korean Journal of Applied Statistics / v.15 no.2 / pp.381-393 / 2002
  • The K-modes algorithm and the ROCK (RObust Clustering using linKs) algorithm are useful clustering methods for large categorical data. In this paper, we investigate these algorithms and propose improved versions that correct their weaknesses. A simulation study shows that the proposed algorithms can improve clustering performance.
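
As background, the two building blocks of the baseline K-modes algorithm, the simple matching dissimilarity and the attribute-wise mode, can be sketched as below. This covers only the standard algorithm's ingredients; the improvements proposed in the paper are not described in the abstract and are not reproduced. The function names and the toy records are assumptions made here.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def cluster_mode(records):
    """Attribute-wise most frequent category, the K-modes analogue of a mean."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def assign(records, modes):
    """Index of the nearest mode for each record (first mode wins ties)."""
    return [
        min(range(len(modes)), key=lambda k: matching_dissimilarity(r, modes[k]))
        for r in records
    ]


# Hypothetical records with two categorical attributes.
cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
mode = cluster_mode(cluster)

modes = [("red", "small"), ("blue", "large")]
labels = assign([("red", "small"), ("blue", "large")], modes)
```

A full K-modes run alternates `assign` and `cluster_mode` until the assignments stop changing, in the same way K-means alternates assignment and mean updates.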

Non-parametric approach for the grouped dissimilarities using the multidimensional scaling and analysis of distance (다차원척도법과 거리분석을 활용한 그룹화된 비유사성에 대한 비모수적 접근법)

  • Nam, Seungchan;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.30 no.4 / pp.567-578 / 2017
  • Grouped multivariate data can be tested for differences between two or more groups using multivariate analysis of variance (MANOVA). However, this method cannot be used if several assumptions of MANOVA are violated. In this case, multidimensional scaling (MDS) and analysis of distance (AOD) can be applied to grouped dissimilarities based on the various distances. A permutation test is a non-parametric method that can also be used to test differences between groups. MDS is used to calculate the coordinates of observations from dissimilarities and AOD is useful for finding group structure using the coordinates. In particular, AOD is mathematically associated with MANOVA if using the Euclidean distance when computing dissimilarities. In this paper, we study the between and within group structure by applying MDS and AOD to the grouped dissimilarities. In addition, we propose a new test statistic using the group structure for the permutation test. Finally, we investigate the relationship between AOD and MANOVA from dissimilarities based on the Euclidean distance.
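
The general shape of a permutation test on a dissimilarity matrix can be sketched as follows. The test statistic used here (mean between-group dissimilarity minus mean within-group dissimilarity) is a generic stand-in chosen for illustration; it is not the statistic proposed in the paper, and the function names and data are assumptions.

```python
import random
from itertools import combinations


def group_separation(dist, labels):
    """Mean between-group dissimilarity minus mean within-group dissimilarity."""
    between, within = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_p_value(dist, labels, n_perm=999, seed=0):
    """One-sided p-value: share of label shufflings whose statistic is at
    least as large as the observed one (the observed labeling is counted)."""
    rng = random.Random(seed)
    observed = group_separation(dist, labels)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if group_separation(dist, shuffled) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical one-dimensional observations in two well separated groups.
points = [0, 1, 2, 10, 11, 12]
labels = ["a", "a", "a", "b", "b", "b"]
dist = [[abs(x - y) for y in points] for x in points]

observed = group_separation(dist, labels)
p_value = permutation_p_value(dist, labels)
```

With only six observations there are few distinct labelings, so the attainable p-values are coarse; larger groups give a finer reference distribution.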