• Title/Summary/Keyword: Kullback Information Distance

Direct Divergence Approximation between Probability Distributions and Its Applications in Machine Learning

  • Sugiyama, Masashi;Liu, Song;du Plessis, Marthinus Christoffel;Yamanaka, Masao;Yamada, Makoto;Suzuki, Taiji;Kanamori, Takafumi
    • Journal of Computing Science and Engineering / v.7 no.2 / pp.99-111 / 2013
  • Approximating a divergence between two probability distributions from their samples is a fundamental challenge in statistics, information theory, and machine learning. A divergence approximator can be used for various purposes, such as two-sample homogeneity testing, change-point detection, and class-balance estimation. Furthermore, an approximator of a divergence between the joint distribution and the product of marginals can be used for independence testing, which has a wide range of applications, including feature selection and extraction, clustering, object matching, independent component analysis, and causal direction estimation. In this paper, we review recent advances in divergence approximation. Our emphasis is that directly approximating the divergence without estimating probability distributions is more sensible than a naive two-step approach of first estimating probability distributions and then approximating the divergence. Furthermore, despite the overwhelming popularity of the Kullback-Leibler divergence as a divergence measure, we argue that alternatives such as the Pearson divergence, the relative Pearson divergence, and the $L^2$-distance are more useful in practice because of their computationally efficient approximability, high numerical stability, and superior robustness against outliers.
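For orientation, the two divergences contrasted in this abstract can be written down for known discrete distributions. The sketch below is only an illustration of the definitions, not the direct sample-based approximation the paper reviews; the function names, the example vectors, and the 1/2 factor in the Pearson divergence are assumptions made here, and every `q_i` is assumed to be strictly positive.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def pearson_divergence(p, q):
    """PE(p || q) = 0.5 * sum_i q_i * (p_i / q_i - 1)^2 (the 0.5 is a convention assumed here)."""
    return 0.5 * sum(qi * (pi / qi - 1.0) ** 2 for pi, qi in zip(p, q))


# Hypothetical example distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q), pearson_divergence(p, q))
```

The Pearson divergence involves only the ratio `p_i / q_i` and no logarithm, which is one way to see why the abstract describes it as numerically more stable than the Kullback-Leibler divergence.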

An Analysis of Fuzzy Survey Data Based on the Maximum Entropy Principle (최대 엔트로피 분포를 이용한 퍼지 관측데이터의 분석법에 관한 연구)

  • 유재휘;유동일
    • Journal of the Korea Society of Computer and Information / v.3 no.2 / pp.131-138 / 1998
  • In conventional statistical data analysis, statistical data are described by exact values. However, in modern complex and large-scale systems, it is difficult to treat the systems using only exact data. In this paper, we define such data as fuzzy data (i.e., linguistic variables used to construct the membership function) and propose a new method for analyzing fuzzy survey data based on the maximum entropy principle. We also propose a new method of discrimination that measures the distance between the distribution of the stable state and the estimated distribution of the present state using the Kullback-Leibler information. Furthermore, we investigate the validity of our method by computer simulations under realistic situations.

Bayesian Model Selection in the Unbalanced Random Effect Model

  • Kim, Dal-Ho;Kang, Sang-Gil;Lee, Woo-Dong
    • Journal of the Korean Data and Information Science Society / v.15 no.4 / pp.743-752 / 2004
  • In this paper, we develop a Bayesian model selection procedure using the reference prior for comparing two nested models, such as the independent and intraclass models, using the distance or divergence between the two as the basis of comparison. A suitable criterion for this is the power divergence measure introduced by Cressie and Read (1984). This measure includes the Kullback-Leibler divergence measures and the Hellinger divergence measure as special cases. For this problem, the power divergence measure turns out to be a function solely of $\rho$, the intraclass correlation coefficient. This function is also convex, and its minimum is attained at $\rho=0$. We use the reference prior for $\rho$. Due to the duality between hypothesis tests and set estimation, the hypothesis testing problem can also be solved by solving a corresponding set estimation problem. The present paper develops a Bayesian method based on the Kullback-Leibler and Hellinger divergence measures, rejecting $H_0:\rho=0$ when the specified divergence measure exceeds some number d. This number d is chosen so that the resulting credible interval for the divergence measure has the specified coverage probability $1-\alpha$. The length of such an interval is compared with the equal two-tailed credible interval and the HPD credible interval for $\rho$ with the same coverage probability, which can also be inverted into acceptance regions of $H_0:\rho=0$. An example is considered in which the HPD interval based on the one-at-a-time reference prior turns out to be the shortest credible interval having the same coverage probability.
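The claim that the power divergence family contains the Kullback-Leibler and Hellinger measures as special cases can be checked numerically. The sketch below is a discrete stand-in for the continuous model comparison in the paper: the function name, the parameter `lam`, and the example vectors are assumptions made here, all probabilities are assumed strictly positive, and the normalization 1 / (lam * (lam + 1)) follows the usual Cressie-Read convention rather than anything stated in the abstract.

```python
import math


def power_divergence(p, q, lam):
    """Cressie-Read power divergence between strictly positive discrete distributions.

    I_lam(p : q) = 1 / (lam * (lam + 1)) * sum_i p_i * ((p_i / q_i)**lam - 1).
    The limit lam -> 0 is KL(p || q); lam = -1 is excluded in this sketch.
    """
    if lam == -1:
        raise ValueError("lam = -1 is a limiting case not handled in this sketch")
    if abs(lam) < 1e-12:
        # Limiting case: Kullback-Leibler divergence.
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    total = sum(pi * ((pi / qi) ** lam - 1.0) for pi, qi in zip(p, q))
    return total / (lam * (lam + 1.0))


# Hypothetical example distributions.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

kl = power_divergence(p, q, 0.0)          # Kullback-Leibler case
hellinger = power_divergence(p, q, -0.5)  # proportional to the squared Hellinger distance
print(kl, hellinger)
```

At `lam = -0.5` the expression reduces to `4 * (1 - sum_i sqrt(p_i * q_i))`, which is a constant multiple of the squared Hellinger distance.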

On the comparison of cumulative hazard functions

  • Park, Sangun;Ha, Seung Ah
    • Communications for Statistical Applications and Methods / v.26 no.6 / pp.623-633 / 2019
  • This paper proposes two distance measures between two cumulative hazard functions, obtained by comparing their difference and their ratio, respectively. We then estimate the measures and present goodness-of-fit test statistics. Since the proposed test statistics are expressed in terms of the cumulative hazard functions, we can easily give more weight to earlier (or later) departures in cumulative hazards if we wish to emphasize earlier (or later) departures. We also show that these test statistics perform comparably to other well-known test statistics based on the empirical distribution function for an exponential null distribution. The proposed test statistic is an omnibus test that is applicable to many distributions other than the exponential distribution.

Secure and Robust Clustering for Quantized Target Tracking in Wireless Sensor Networks

  • Mansouri, Majdi;Khoukhi, Lyes;Nounou, Hazem;Nounou, Mohamed
    • Journal of Communications and Networks / v.15 no.2 / pp.164-172 / 2013
  • We consider the problem of secure and robust clustering for quantized target tracking in wireless sensor networks (WSN), where the observed system is assumed to evolve according to a probabilistic state space model. We propose a new method for jointly activating the best group of candidate sensors that participate in data aggregation, detecting the malicious sensors, and estimating the target position. First, we select the appropriate group in order to balance the energy dissipation and to provide the required data of the target in the WSN. This selection is also based on the transmission power between a sensor node and a cluster head. Second, we detect the malicious sensor nodes based on the information relevance of their measurements. Then, we estimate the target position using the quantized variational filtering (QVF) algorithm. The selection of the candidate sensor group is based on a multi-criteria function, which is computed using the predicted target position provided by the QVF algorithm, while the detection of malicious sensor nodes is based on the Kullback-Leibler distance between the current target position distribution and the predicted sensor observation. The performance of the proposed method is validated by simulation results in target tracking for WSN.

Region-based Multi-level Thresholding for Color Image Segmentation (영역 기반의 Multi-level Thresholding에 의한 컬러 영상 분할)

  • Oh, Jun-Taek;Kim, Wook-Hyun
    • Journal of the Institute of Electronics Engineers of Korea SP / v.43 no.6 s.312 / pp.20-27 / 2006
  • Multi-level thresholding is a method that is widely used in image segmentation. However, most existing methods are not suited for direct use in applied fields and, moreover, have not been extended to the image segmentation step. This paper proposes region-based multi-level thresholding as an image segmentation method. First, we classify the pixels of each color channel into two clusters using the EWFCM (Entropy-based Weighted Fuzzy C-Means) algorithm, an improved FCM algorithm that uses spatial information between pixels. To obtain better segmentation results, the number of clusters is then reduced by a region-based reclassification step based on the similarity between the regions in a cluster and the other clusters. The clusters are created using the classification information of the pixels in each color channel. Finally, because many regions still remain in the image, we perform region merging as a post-processing step, using a Bayesian algorithm based on the Kullback-Leibler distance between a region and its neighboring regions. Experiments show that region-based multi-level thresholding is superior to cluster-based and pixel-based multi-level thresholding and to the existing methods, and that the post-processing method yields much better segmentation results.

A Study on Particle Filter based on KLD-Resampling for Wireless Patient Tracking

  • Ly-Tu, Nga;Le-Tien, Thuong;Mai, Linh
    • Industrial Engineering and Management Systems / v.16 no.1 / pp.92-102 / 2017
  • In this paper, we consider a typical health care system that uses a Wireless Sensor Network (WSN) for wireless patient tracking. The wireless patient tracking module of this system performs localization from samples of Received Signal Strength (RSS) variations and tracking through a Particle Filter (PF) for a WSN assisted by multiple transmit-power information. We propose a modified PF, the Kullback-Leibler Distance (KLD)-resampling PF, to mitigate the effect of RSS variations by generating a sample set near the high-likelihood region, thereby improving wireless patient tracking. The key idea of this method is to approximate a discrete distribution with an upper bound on the KLD error, reducing both the location error and the number of particles used. To determine this error bound, an optimal algorithm is proposed based on the maximum gap error between the proposed and Sampling Importance Resampling (SIR) algorithms. With these values set, we run a number of simulations using the health care system's data sets, which contain real RSSI measurements, to evaluate the location error in terms of various power levels and node densities for all methods. Finally, we point out the effect of different power levels versus different node densities on wireless patient tracking.
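The abstract does not state how the KLD error bound translates into a particle count. As a rough illustration only, the sketch below uses the well-known KLD-sampling bound (a Wilson-Hilferty approximation of a chi-square quantile), which may differ from the rule used in the paper; the function name and the parameters `k` (number of occupied histogram bins), `epsilon` (KLD error bound), and `delta` (allowed failure probability) are assumptions made here.

```python
import math
from statistics import NormalDist


def kld_sample_size(k, epsilon, delta):
    """Number of particles so that the KLD error stays below epsilon
    with probability about 1 - delta, given k occupied bins.

    Uses n = (k - 1) / (2 * epsilon) * (1 - a + sqrt(a) * z)^3 with
    a = 2 / (9 * (k - 1)) and z the upper (1 - delta) standard normal quantile.
    """
    if k < 2:
        # With a single occupied bin the bound is undefined; keep one particle.
        return 1
    z = NormalDist().inv_cdf(1.0 - delta)
    a = 2.0 / (9.0 * (k - 1))
    n = (k - 1) / (2.0 * epsilon) * (1.0 - a + math.sqrt(a) * z) ** 3
    return math.ceil(n)


# Hypothetical settings: more occupied bins or a tighter bound need more particles.
print(kld_sample_size(5, 0.05, 0.01), kld_sample_size(10, 0.05, 0.01))
```

Under this bound the particle count grows with the spread of the particles (through `k`) and shrinks as the tolerated error `epsilon` is relaxed, which matches the abstract's stated aim of reducing the number of particles used.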