• Title/Summary/Keyword: Divergence

Improving the Performance of Document Clustering with Distributional Similarities (분포유사도를 이용한 문헌클러스터링의 성능향상에 대한 연구)

  • Lee, Jae-Yun
    • Journal of the Korean Society for Information Management
    • /
    • v.24 no.4
    • /
    • pp.267-283
    • /
    • 2007
  • In this study, measures of distributional similarity such as KL-divergence are applied to document clustering in place of the cosine measure, the most prevalent vector similarity measure for document clustering. Three variations of KL-divergence are investigated: Jensen-Shannon divergence, symmetric skew divergence, and minimum skew divergence. To verify the contribution of distributional similarities to document clustering, two experiments were designed and carried out on three test collections. In the first experiment, the clustering performance of the three divergence measures is compared to that of the cosine measure. The results show that minimum skew divergence outperforms the other divergence measures as well as the cosine measure. In the second experiment, second-order distributional similarities are calculated with the Pearson correlation coefficient from the first-order similarity matrices. The second experiment shows that second-order distributional similarities improve the overall performance of document clustering. These results suggest that minimum skew divergence should be selected as the document vector similarity measure when both time and accuracy matter, and that second-order similarity is a good choice when only clustering accuracy is of concern.
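
The divergence measures compared above can be sketched for discrete term distributions as follows; this is an illustrative reading, not the paper's exact implementation (the skew parameter value and taking the "minimum" over the two skew directions are assumptions):

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    p = np.asarray(p, float); q = np.asarray(q, float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def js(p, q):
    """Jensen-Shannon divergence: symmetrized KL against the mixture."""
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def skew(p, q, alpha=0.99):
    """Skew divergence: KL of p against a mixture, avoiding zero denominators."""
    q = np.asarray(q, float)
    return kl(p, alpha * q + (1 - alpha) * np.asarray(p, float))

def min_skew(p, q, alpha=0.99):
    """Minimum of the two skew directions (an assumed reading of the paper)."""
    return min(skew(p, q, alpha), skew(q, p, alpha))

# Normalized term distributions of two toy documents.
d1 = np.array([0.5, 0.3, 0.2, 0.0])
d2 = np.array([0.4, 0.3, 0.2, 0.1])
```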

The Bandwidth from the Density Power Divergence

  • Pak, Ro Jin
    • Communications for Statistical Applications and Methods
    • /
    • v.21 no.5
    • /
    • pp.435-444
    • /
    • 2014
  • The most widely used optimal bandwidth is known to minimize the mean integrated squared error (MISE) of a kernel density estimator from a true density. In this article, we propose a bandwidth which asymptotically minimizes the mean integrated density power divergence (MIDPD) between a true density and the corresponding kernel density estimator. An approximated form of the mean integrated density power divergence is derived, and a bandwidth is obtained by minimizing this approximated form. The resulting bandwidth resembles the optimal bandwidth of Parzen (1962), but it reflects the nature of the model density more than the existing optimal bandwidths do. We thus have one more choice of an optimal bandwidth with a firm theoretical background; in addition, an empirical study shows that the bandwidth from the mean integrated density power divergence can produce a density estimator fitting a sample better than the bandwidth from the mean integrated squared error.
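
Lacking the paper's closed-form bandwidth, the idea can be illustrated by brute force: numerically integrate the density power divergence of Basu et al. (1998) between a known true density and a kernel estimate, and search a bandwidth grid. All settings here (alpha = 0.5, the standard-normal truth, the grid) are illustrative assumptions, not the paper's derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)                       # sample from the "true" N(0, 1)
grid = np.linspace(-4.0, 4.0, 801)
dx = grid[1] - grid[0]
true = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)

def kde(h):
    """Gaussian kernel density estimate on the grid with bandwidth h."""
    u = (grid[:, None] - x[None, :]) / h
    return np.exp(-u**2 / 2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))

def dpd(f, g, a=0.5):
    """Density power divergence d_a(g, f) of Basu et al. (1998),
    integrated numerically on the grid (g: true density, f: estimate)."""
    integrand = f**(1 + a) - (1 + 1 / a) * g * f**a + (1 / a) * g**(1 + a)
    return float(integrand.sum() * dx)

# Brute-force bandwidth search minimizing the integrated divergence.
hs = np.linspace(0.1, 1.5, 57)
h_dpd = float(hs[int(np.argmin([dpd(kde(h), true) for h in hs]))])
```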

Reliability based analysis of torsional divergence of long span suspension bridges

  • Cheng, Jin;Li, Q.S.
    • Wind and Structures
    • /
    • v.12 no.2
    • /
    • pp.121-132
    • /
    • 2009
  • A systematic reliability evaluation approach for torsional divergence analysis of long-span suspension bridges is proposed, combining the first-order reliability method with a simplified torsional divergence analysis method. The proposed method was implemented in the deterministic torsional divergence analysis program SIMTDB through a new strategy that interfaces the proposed method with SIMTDB via a freely available MATLAB software tool (FERUM). A numerical example involving a detailed computational model of a long-span suspension bridge with a main span of 888 m is presented to demonstrate the applicability and merits of the proposed method and the associated software strategy. Finally, the random variables most influential on the reliability of long-span suspension bridges against torsional divergence failure are identified by a sensitivity analysis.
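
For a sense of what the first-order reliability method computes, consider the textbook linear case with independent normal resistance and load. The numbers below are arbitrary illustrative values; the bridge problem above is, of course, nonlinear and far higher-dimensional:

```python
import math

# Linear limit state g = R - S with independent normal R (resistance)
# and S (load); failure means g < 0.  Illustrative values only.
muR, sR = 10.0, 1.0
muS, sS = 6.0, 1.5

# Hasofer-Lind reliability index and first-order failure probability.
beta = (muR - muS) / math.sqrt(sR**2 + sS**2)
pf = 0.5 * math.erfc(beta / math.sqrt(2.0))    # Phi(-beta)
```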

Bayesian Model Selection in the Unbalanced Random Effect Model

  • Kim, Dal-Ho;Kang, Sang-Gil;Lee, Woo-Dong
    • Journal of the Korean Data and Information Science Society
    • /
    • v.15 no.4
    • /
    • pp.743-752
    • /
    • 2004
  • In this paper, we develop a Bayesian model selection procedure using the reference prior for comparing two nested models, the independent and intraclass models, using the distance or divergence between the two as the basis of comparison. A suitable criterion for this is the power divergence measure introduced by Cressie and Read (1984). Such a measure includes the Kullback-Leibler divergence measure and the Hellinger divergence measure as special cases. For this problem, the power divergence measure turns out to be a function solely of $\rho$, the intraclass correlation coefficient. Moreover, this function is convex, and the minimum is attained at $\rho=0$. We use the reference prior for $\rho$. Due to the duality between hypothesis tests and set estimation, the hypothesis testing problem can also be solved by solving a corresponding set estimation problem. The present paper develops a Bayesian method based on the Kullback-Leibler and Hellinger divergence measures, rejecting $H_0:\rho=0$ when the specified divergence measure exceeds some number d. This number d is chosen so that the resulting credible interval for the divergence measure has specified coverage probability $1-{\alpha}$. The length of such an interval is compared with the equal two-tailed credible interval and the HPD credible interval for $\rho$ with the same coverage probability, which can also be inverted into acceptance regions of $H_0:\rho=0$. An example is considered in which the HPD interval based on the one-at-a-time reference prior turns out to be the shortest credible interval having the same coverage probability.
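
The Cressie-Read power divergence family referred to above can be written down directly for discrete distributions. The convention below, with its factor 2/(lam*(lam+1)), is one common normalization, under which lam -> 0 gives twice the Kullback-Leibler divergence and lam = -1/2 gives four times the squared Hellinger distance:

```python
import numpy as np

def power_divergence(p, q, lam):
    """Cressie-Read power divergence of order lam between discrete
    distributions p and q (assumed strictly positive)."""
    p = np.asarray(p, float); q = np.asarray(q, float)
    if abs(lam) < 1e-12:                       # continuous limit at lam = 0
        return float(2 * np.sum(p * np.log(p / q)))
    return float(2 / (lam * (lam + 1)) * np.sum(p * ((p / q)**lam - 1)))

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])
hellinger_sq = float(np.sum((np.sqrt(p) - np.sqrt(q))**2))
```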

Centroid-model based music similarity with alpha divergence (알파 다이버전스를 이용한 무게중심 모델 기반 음악 유사도)

  • Seo, Jin Soo;Kim, Jeonghyun;Park, Jihyun
    • The Journal of the Acoustical Society of Korea
    • /
    • v.35 no.2
    • /
    • pp.83-91
    • /
    • 2016
  • Music-similarity computation is crucial in developing music information retrieval systems for browsing and classification. This paper reviews the recently proposed centroid-model based music retrieval method and applies distributional similarity measures to the model for retrieval-performance evaluation. Probabilistic distance measures (also called divergences) compute the distance between two probability distributions in a certain sense. In this paper, we consider the alpha divergence in computing the distance between two centroid models for music retrieval. The alpha divergence includes the widely used Kullback-Leibler divergence and the Bhattacharyya distance, depending on the value of alpha. Experiments were conducted on both genre and singer datasets. We compare the music-retrieval performance of the distributional similarity measures with that of vector distances. The experimental results show that the alpha divergence improves the performance of centroid-model based music retrieval.
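
One common parametrization of the alpha divergence for discrete distributions (the convention and the toy vectors below are assumptions, not the paper's feature models) recovers KL(p||q) as alpha -> 1, KL(q||p) as alpha -> 0, and at alpha = 1/2 a monotone function of the Bhattacharyya coefficient:

```python
import numpy as np

def alpha_div(p, q, a):
    """Alpha divergence between strictly positive discrete distributions,
    in the convention D_a = (1 - sum(p^a * q^(1-a))) / (a * (1 - a))."""
    p = np.asarray(p, float); q = np.asarray(q, float)
    if abs(a - 1) < 1e-12:
        return float(np.sum(p * np.log(p / q)))     # limit: KL(p || q)
    if abs(a) < 1e-12:
        return float(np.sum(q * np.log(q / p)))     # limit: KL(q || p)
    return float((1 - np.sum(p**a * q**(1 - a))) / (a * (1 - a)))

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])
bhattacharyya = float(np.sum(np.sqrt(p * q)))       # in (0, 1]; 1 iff p == q
```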

Local Sensitivity Analysis using Divergence Measures under Weighted Distribution

  • Chung, Younshik;Dey, Dipak K.
    • Journal of the Korean Statistical Society
    • /
    • v.30 no.3
    • /
    • pp.467-480
    • /
    • 2001
  • This paper considers the use of local $\phi$-divergence measures between posterior distributions under classes of perturbations in order to investigate the inherent robustness of certain classes. A smaller value of the limiting local $\phi$-divergence implies more robustness for the prior or the likelihood. We consider the case when the likelihood comes from the class of weighted distributions. Two kinds of perturbations are considered for the local sensitivity analysis. In addition, some numerical examples are considered which provide measures of robustness.

ON THE GOODNESS OF FIT TEST FOR DISCRETELY OBSERVED SAMPLE FROM DIFFUSION PROCESSES: DIVERGENCE MEASURE APPROACH

  • Lee, Sang-Yeol
    • Journal of the Korean Mathematical Society
    • /
    • v.47 no.6
    • /
    • pp.1137-1146
    • /
    • 2010
  • In this paper, we study the divergence-based goodness-of-fit test for a partially observed sample from diffusion processes. In order to derive the limiting distribution of the test, we study the asymptotic behavior of the residual empirical process based on the observed sample. It is shown that the residual empirical process converges weakly to a Brownian bridge and that the associated phi-divergence test has a chi-square limiting null distribution.
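
The flavor of such a divergence-based test can be sketched on an i.i.d. stand-in for the residuals: transform them through the hypothesized distribution function, bin, and compare cell frequencies with a Pearson chi-square (a phi-divergence) statistic. Everything below, including the critical value, is a plain i.i.d. illustration, not the diffusion setting of the paper:

```python
import numpy as np
from math import erf

rng = np.random.default_rng(3)
z = rng.normal(size=500)                       # stand-in for fitted residuals

# Probability integral transform under the hypothesized N(0, 1) model.
u = np.array([0.5 * (1 + erf(t / np.sqrt(2))) for t in z])

k = 10                                         # equiprobable cells
counts = np.histogram(u, bins=np.linspace(0, 1, k + 1))[0]
expected = len(u) / k
stat = float(np.sum((counts - expected)**2 / expected))

# Under H0 the statistic is asymptotically chi-square with k - 1 = 9
# degrees of freedom; the 95% critical value is about 16.92.
reject = stat > 16.92
```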

NEW INFORMATION INEQUALITIES ON ABSOLUTE VALUE OF THE FUNCTIONS AND ITS APPLICATION

  • CHHABRA, PRAPHULL
    • Journal of applied mathematics & informatics
    • /
    • v.35 no.3_4
    • /
    • pp.371-385
    • /
    • 2017
  • Jain and Saraswat (2012) introduced a new generalized f-information divergence measure, from which many well-known and new information divergences are obtained. In this work, we introduce new information inequalities in absolute form on this new generalized divergence by considering convex normalized functions. Further, we apply these inequalities to obtain new relations among well-known divergences, together with numerical verification. An application to mutual information is also presented. An asymptotic approximation in terms of the chi-square divergence is given as well.

Minimum Density Power Divergence Estimation for Normal-Exponential Distribution (정규-지수분포에 대한 최소밀도함수승간격 추정법)

  • Pak, Ro Jin
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.3
    • /
    • pp.397-406
    • /
    • 2014
  • Minimum density power divergence estimation has been a popular topic in the field of robust estimation since Basu et al. (1998). The minimum density power divergence estimator has strong robustness properties with little loss in asymptotic efficiency relative to the maximum likelihood estimator under model conditions. However, a limitation in applying this estimation method is the algebraic difficulty of an integral involved in the estimating function. This paper considers a minimum density power divergence estimation method with an approximated divergence that avoids this difficulty. As an example, we consider the normal-exponential convolution model introduced by Bolstad (2004). The estimated divergence in this case is quite complicated; consequently, a Laplace approximation is employed to obtain a manageable form. Simulations and an empirical study show that the minimum density power divergence estimators based on the approximated divergence for the normal-exponential model perform adequately in terms of bias and efficiency.
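
The normal-exponential convolution model and its Laplace approximation are beyond a short sketch, but the minimum density power divergence objective itself is easy to exhibit in the simplest case: a N(mu, 1) location model with gross outliers. The choice alpha = 0.5, the contamination pattern, and the grid search are all illustrative assumptions:

```python
import numpy as np

def dpd_objective(mu, x, a=0.5):
    """Empirical minimum density power divergence objective of Basu et al.
    (1998) for a N(mu, 1) model: the integral of f^(1+a) minus
    (1 + 1/a) times the sample mean of f(x_i)^a."""
    const = (2 * np.pi)**(-a / 2) / np.sqrt(1 + a)     # closed-form integral
    f = np.exp(-(x - mu)**2 / 2) / np.sqrt(2 * np.pi)  # N(mu, 1) density at data
    return float(const - (1 + 1 / a) * np.mean(f**a))

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 95), np.full(5, 10.0)])  # 5% gross outliers

grid = np.linspace(-2, 2, 2001)
mu_dpd = float(grid[int(np.argmin([dpd_objective(m, x) for m in grid]))])
mu_mle = float(x.mean())   # the MLE is dragged toward the outliers
```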

Automatic Selection of the Tuning Parameter in the Minimum Density Power Divergence Estimation

  • Hong, Changkon;Kim, Youngseok
    • Journal of the Korean Statistical Society
    • /
    • v.30 no.3
    • /
    • pp.453-465
    • /
    • 2001
  • It is often the case that one wants to estimate the parameters of a distribution that follows a certain parametric model while the data are contaminated. It is well known that maximum likelihood estimators are not robust to contamination. Basu et al. (1998) proposed a robust method called minimum density power divergence estimation. In this paper, we investigate data-driven selection of the tuning parameter $\alpha$ in the minimum density power divergence estimation. A criterion is proposed and its performance is studied through simulation. The simulation includes three cases of estimation problems.
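
The paper's criterion is its own; as one concrete variant of the same idea, a bootstrap MSE proxy around a very robust pilot fit, in the spirit of Warwick and Jones (2005), can be sketched for the N(mu, 1) location model. All settings below are illustrative assumptions:

```python
import numpy as np

def mdpd_mu(x, a, grid=np.linspace(-2, 2, 401)):
    """Minimum density power divergence estimate of the N(mu, 1) location
    parameter, by grid search on the empirical objective of Basu et al."""
    const = (2 * np.pi)**(-a / 2) / np.sqrt(1 + a)   # integral of f^(1+a)
    vals = [const - (1 + 1 / a) * np.mean(
                (np.exp(-(x - m)**2 / 2) / np.sqrt(2 * np.pi))**a)
            for m in grid]
    return float(grid[int(np.argmin(vals))])

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 90), rng.normal(8, 1, 10)])

# Pick alpha by minimizing a bootstrap MSE proxy around a very robust
# pilot fit -- a stand-in criterion, not the paper's own proposal.
pilot = mdpd_mu(x, 1.0)
alphas = np.linspace(0.1, 1.0, 10)
mse = [np.mean([(mdpd_mu(rng.choice(x, x.size), a) - pilot)**2
                for _ in range(30)])
       for a in alphas]
best_alpha = float(alphas[int(np.argmin(mse))])
```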
