• Title/Summary/Keyword: Hellinger measure

Signed Hellinger measure for directional association (연관성 방향을 고려한 부호 헬링거 측도의 제안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.27 no.2 / pp.353-362 / 2016
  • According to Wikipedia, data mining is the process of discovering patterns in large data sets using methods at the intersection of association rules, decision trees, clustering, artificial intelligence, machine learning, and database systems. Association rule mining discovers interesting relations between items in large transaction databases by means of interestingness measures. These measures play a major role in the knowledge discovery process in databases, and many have been developed by researchers. Among them, the Hellinger measure is a good association threshold that considers both the information content and the generality of a rule, but it has the drawback that it cannot determine the direction of the association. In this paper we proposed a signed Hellinger measure that can be interpreted operationally, and we checked the three conditions for an association threshold. Furthermore, we investigated some of its properties through a few examples. The results showed that the signed Hellinger measure was better than the Hellinger measure because the signed one was able to identify the correct direction of association.
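
The idea of attaching a direction to a Hellinger-type measure can be illustrated with a minimal Python sketch. It is not the definition used in the paper, which the abstract does not give. It assumes a rule A → B is scored by the Hellinger distance between the prior distribution (P(B), 1 − P(B)) and the conditional distribution (P(B|A), 1 − P(B|A)), with the sign taken from P(B|A) − P(B). The function names are hypothetical.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions (values in [0, 1])."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def signed_hellinger(p_b, p_b_given_a):
    """Illustrative signed score for a rule A -> B (an assumption, not the paper's formula).

    Positive when A raises the probability of B, negative when it lowers it,
    and zero when A and B are independent.
    """
    d = hellinger_distance([p_b_given_a, 1.0 - p_b_given_a], [p_b, 1.0 - p_b])
    if p_b_given_a > p_b:
        return d
    if p_b_given_a < p_b:
        return -d
    return 0.0


if __name__ == "__main__":
    # P(B) = 0.5; the rule raises it to 0.8 (positive) or lowers it to 0.2 (negative).
    print(signed_hellinger(0.5, 0.8))
    print(signed_hellinger(0.5, 0.2))
```

The unsigned distance alone gives the same magnitude for both calls, which is the drawback the abstract describes; the sign is what distinguishes positive from negative association.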

Learning Multidimensional Sequential Patterns Using Hellinger Entropy Function (Hellinger 엔트로피를 이용한 다차원 연속패턴의 생성방법)

  • Lee, Chang-Hwan
    • The KIPS Transactions:PartB / v.11B no.4 / pp.477-484 / 2004
  • Sequential pattern mining generates a set of inter-transaction patterns residing in time-dependent data. This paper proposes a new method for generating sequential patterns using the Hellinger measure. While current methods generate single-dimensional sequential patterns within a single attribute, the proposed method is able to detect multi-dimensional patterns among different attributes. A number of heuristics, based on the characteristics of the Hellinger measure, are proposed to reduce the computational complexity of sequential pattern systems. Some experimental results are presented.

A Combined Method of Rule Induction Learning and Instance-Based Learning (귀납법칙 학습과 개체위주 학습의 결합방법)

  • Lee, Chang-Hwan
    • The Transactions of the Korea Information Processing Society / v.4 no.9 / pp.2299-2308 / 1997
  • While most machine learning research has been primarily concerned with the development of systems that implement one type of learning strategy, we use a multistrategy approach that integrates rule induction learning and instance-based learning, and show how this combination yields better overall performance. In the rule induction learning phase, we derive an entropy function, based on Hellinger divergence, which can measure the amount of information each inductive rule contains, and show how well the Hellinger divergence measures the importance of each rule. We also propose some heuristics to reduce the computational complexity by analyzing the characteristics of the Hellinger measure. In the instance-based learning phase, we improve the current instance-based learning method in a number of ways. The system has been implemented and tested on a number of well-known machine learning data sets. The performance of the system has been compared with that of other classification learning techniques.

A New Importance Measure of Association Rules Using Information Theory (정보이론에 기반한 연관 규칙들의 새로운 중요도 측정 방법)

  • Lee, Chang-Hwan;Bae, Joohyun
    • KIPS Transactions on Software and Data Engineering / v.3 no.1 / pp.37-42 / 2014

An Efficient Mining Algorithm for Generating Probabilistic Multidimensional Sequential Patterns (확률적 다차원 연속패턴의 생성을 위한 효율적인 마이닝 알고리즘)

  • Lee, Chang-Hwan
    • Journal of KIISE:Software and Applications / v.32 no.2 / pp.75-84 / 2005
  • Sequential pattern mining is an important data mining problem with broad applications. While current methods generate sequential patterns within a single attribute, the proposed method is able to detect them among different attributes. By incorporating these additional attributes, the sequential patterns found are richer and more informative to the user. This paper proposes a new method for generating multi-dimensional sequential patterns using the Hellinger entropy measure. Unlike previously used methods, the proposed method can calculate the significance of each sequential pattern. Two theorems are proposed to reduce the computational complexity of the proposed system. The proposed method is tested on some synthesized purchase transaction databases.

Empirical Comparisons of Disparity Measures for Partial Association Models in Three Dimensional Contingency Tables

  • Jeong, D.B.;Hong, C.S.;Yoon, S.H.
    • Communications for Statistical Applications and Methods / v.10 no.1 / pp.135-144 / 2003
  • This work compares recently developed disparity measures for the partial association model in three dimensional categorical data. Data are generated by simulating each term in the log-linear model equation based on the partial association model, which is the method proposed in this paper. These alternative Monte Carlo methods are explored to study the behavior of disparity measures such as the power divergence statistic $I(\lambda)$, the Pearson chi-square statistic $X^2$, the likelihood ratio statistic $G^2$, the blended weight chi-square statistic BWCS($\lambda$), the blended weight Hellinger distance statistic BWHD($\lambda$), and the negative exponential disparity statistic NED($\lambda$) for moderate sample sizes. We find that the power divergence statistic $I(2/3)$ and the blended weight Hellinger distance statistic BWHD(1/9) are the best tests with respect to size and power.
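
For readers unfamiliar with the power divergence family named in this abstract, the following Python sketch computes the Cressie-Read statistic in its standard form, $\frac{2}{\lambda(\lambda+1)} \sum_i O_i \left[ (O_i/E_i)^{\lambda} - 1 \right]$, for observed counts $O_i$ and expected counts $E_i$. It is an illustration rather than the paper's simulation code: it assumes strictly positive counts with equal totals, the function name is hypothetical, and the BWCS, BWHD, and NED statistics are not covered.

```python
import math


def power_divergence(observed, expected, lam):
    """Cressie-Read power divergence statistic for positive counts.

    lam = 1 gives the Pearson chi-square X^2, lam -> 0 gives the likelihood
    ratio G^2, and lam = -1/2 gives the Freeman-Tukey (Hellinger-type) statistic.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(o <= 0 or e <= 0 for o, e in zip(observed, expected)):
        raise ValueError("this sketch assumes strictly positive counts")
    if lam == 0:
        # Limiting case: likelihood ratio statistic G^2.
        return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
    if lam == -1:
        # Limiting case: modified likelihood ratio statistic.
        return 2.0 * sum(e * math.log(e / o) for o, e in zip(observed, expected))
    total = sum(o * ((o / e) ** lam - 1.0) for o, e in zip(observed, expected))
    return 2.0 * total / (lam * (lam + 1.0))


if __name__ == "__main__":
    obs = [10, 20, 30]
    exp = [20, 20, 20]
    for lam in (1.0, 2.0 / 3.0, 0.0, -0.5):
        print(lam, power_divergence(obs, exp, lam))
```

With `lam = 1.0` the function reproduces the Pearson sum $\sum_i (O_i - E_i)^2 / E_i$ whenever the observed and expected totals agree, and `lam = 2/3` corresponds to the $I(2/3)$ member singled out in the abstract.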

Empirical Comparisons of Disparity Measures for Three Dimensional Log-Linear Models

  • Park, Y.S.;Hong, C.S.;Jeong, D.B.
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.543-557 / 2006
  • This paper is concerned with the applicability of the chi-square approximation to six disparity statistics: the Pearson chi-square, the generalized likelihood ratio, the power divergence, the blended weight chi-square, the blended weight Hellinger distance, and the negative exponential disparity statistic. Three dimensional contingency tables of small and moderate sample sizes are generated and fitted to all possible hierarchical log-linear models: the completely independent model, the conditionally independent model, the partial association models, and the model with one variable independent of the other two. For models with direct solutions for the expected cell counts, point estimates and confidence intervals of the 90 and 95 percentage points of the six statistics are explored. For models without direct solutions, the empirical significance levels and the empirical powers of the six statistics for testing the significance of the three factor interaction are computed and compared.

Bayesian Model Selection in the Unbalanced Random Effect Model

  • Kim, Dal-Ho;Kang, Sang-Gil;Lee, Woo-Dong
    • Journal of the Korean Data and Information Science Society / v.15 no.4 / pp.743-752 / 2004
  • In this paper, we develop a Bayesian model selection procedure using the reference prior for comparing two nested models, such as the independent and intraclass models, using the distance or divergence between the two as the basis of comparison. A suitable criterion for this is the power divergence measure introduced by Cressie and Read (1984). This measure includes the Kullback-Leibler divergence measures and the Hellinger divergence measure as special cases. For this problem, the power divergence measure turns out to be a function solely of $\rho$, the intraclass correlation coefficient. This function is also convex, and its minimum is attained at $\rho=0$. We use the reference prior for $\rho$. Due to the duality between hypothesis tests and set estimation, the hypothesis testing problem can also be solved by solving a corresponding set estimation problem. The present paper develops a Bayesian method based on the Kullback-Leibler and Hellinger divergence measures, rejecting $H_0:\rho=0$ when the specified divergence measure exceeds some number $d$. This number $d$ is chosen so that the resulting credible interval for the divergence measure has the specified coverage probability $1-\alpha$. The length of such an interval is compared with the equal two-tailed credible interval and the HPD credible interval for $\rho$ with the same coverage probability, which can also be inverted into acceptance regions of $H_0:\rho=0$. An example is considered in which the HPD interval based on the one-at-a-time reference prior turns out to be the shortest credible interval having the same coverage probability.

Minimum Disparity Estimation for Normal Models: Small Sample Efficiency

  • Cho, M.J.;Hong, C.S.;Jeong, D.B.
    • Communications for Statistical Applications and Methods / v.12 no.1 / pp.149-167 / 2005
  • The minimum disparity estimators introduced by Lindsay and Basu (1994) are studied empirically. An extensive simulation in this paper provides a location estimate of the small sample and supplies empirical evidence of the estimator performance for the univariate contaminated normal model. Empirical results show that the minimum generalized negative exponential disparity estimator (MGNEDE) obtains high efficiency for small sample sizes and dominates the maximum likelihood estimator (MLE) and the minimum blended weight Hellinger distance estimator (MBWHDE) with respect to efficiency at the contaminated model.

Object Detection Based on Hellinger Distance IoU and Objectron Application (Hellinger 거리 IoU와 Objectron 적용을 기반으로 하는 객체 감지)

  • Kim, Yong-Gil;Moon, Kyung-Il
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.2 / pp.63-70 / 2022
  • Although 2D object detection has improved greatly in recent years with the advance of deep learning methods and the use of large labeled image datasets, 3D object detection from 2D imagery remains a challenging problem in a variety of applications such as robotics, due to the lack of data and the diversity of appearances and shapes of objects within a category. Google has just announced the launch of Objectron, which has a novel data pipeline using mobile augmented reality session data. However, it is still a 2D-driven 3D object detection technique. This study explores a more mature 2D object detection method and applies its 2D projection to the Objectron 3D lifting system. Most object detection methods use bounding boxes to encode and represent the object shape and location. In this work, we explore a stochastic representation of object regions using Gaussian distributions. We also present a similarity measure for the Gaussian distributions based on the Hellinger distance, which can be viewed as a stochastic Intersection-over-Union. Our experimental results show that the proposed Gaussian representations are closer to annotated segmentation masks in available datasets. Thus, the low-accuracy problem, which is one of several limitations of Objectron, can be mitigated.
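
The Hellinger distance between two Gaussians has a closed form, which is what makes it usable as a box-similarity score. The Python sketch below evaluates that closed form for 2D Gaussians. It is an illustration only: the abstract does not say how boxes are mapped to Gaussians, so the hypothetical helper `box_to_gaussian` assumes a box (center x, center y, width, height) is represented by the mean and covariance of a uniform distribution over the box.

```python
import math


def hellinger_gaussian_2d(mu1, cov1, mu2, cov2):
    """Hellinger distance between two 2D Gaussians.

    Means are (x, y) pairs; covariances are 2x2 nested sequences.
    Uses H^2 = 1 - |S1|^(1/4) |S2|^(1/4) / |S|^(1/2) * exp(-d^T S^-1 d / 8),
    where S = (S1 + S2) / 2 and d = mu1 - mu2.
    """
    def det(m):
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]

    avg = [[(cov1[i][j] + cov2[i][j]) / 2.0 for j in range(2)] for i in range(2)]
    d1, d2, da = det(cov1), det(cov2), det(avg)
    if min(d1, d2, da) <= 0:
        raise ValueError("covariance matrices must be positive definite")
    dx, dy = mu1[0] - mu2[0], mu1[1] - mu2[1]
    # Quadratic form d^T S^-1 d, using the explicit inverse of a 2x2 matrix.
    quad = (avg[1][1] * dx * dx
            - (avg[0][1] + avg[1][0]) * dx * dy
            + avg[0][0] * dy * dy) / da
    coefficient = (d1 ** 0.25) * (d2 ** 0.25) / math.sqrt(da) * math.exp(-quad / 8.0)
    # Clamp to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 1.0 - coefficient))


def box_to_gaussian(cx, cy, w, h):
    """Assumed mapping (not from the paper): moments of a uniform density on the box."""
    return (cx, cy), ((w * w / 12.0, 0.0), (0.0, h * h / 12.0))


if __name__ == "__main__":
    box_a = box_to_gaussian(10.0, 10.0, 4.0, 6.0)
    box_b = box_to_gaussian(11.0, 10.0, 4.0, 6.0)
    distance = hellinger_gaussian_2d(*box_a, *box_b)
    # One possible similarity score in [0, 1]; the paper's exact definition may differ.
    print(1.0 - distance)
```

Unlike a plain Intersection-over-Union, this score stays informative for boxes that do not overlap, because the distance only saturates at 1 as the Gaussians move far apart.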