• Title/Summary/Keyword: Kullback-Leibler Information

Improved Tag Selection for Tag-cloud using the Dynamic Characteristics of Tag Co-occurrence (태그 동시 출현의 동적인 특징을 이용한 개선된 태그 클라우드의 태그 선택 방법)

  • Kim, Du-Nam;Lee, Kang-Pyo;Kim, Hyoung-Joo
    • Journal of KIISE:Computing Practices and Letters / v.15 no.6 / pp.405-413 / 2009
  • A tagging system allows internet users to assign new metadata, called tags, to articles, photos, videos, and other items to facilitate searching and browsing of web content. The tag cloud, a visual interface, is widely used for browsing a tag space: it selects the most frequent tags and presents them alphabetically, with font size reflecting popularity. However, this conventional tag selection method has known weaknesses, so we propose a novel tag selection measure, Freshness, which helps to find fresh web content. Freshness is the mean of the Kullback-Leibler divergences between each pair of consecutive tag co-occurrence probability distributions. We collected tag data from three web sites, Allblog, Eolin, and Technorati, and built 'Fresh Tag Cloud', a system that collects tag data and generates our tag cloud. In experiments on the Allblog data, Fresh Tag Cloud shows 87.5% less average overlap than the conventional tag cloud, indicating that it outperforms the conventional approach.
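
A minimal sketch of the Freshness computation described in this abstract, assuming each time snapshot of a tag's co-occurrence pattern is available as a discrete probability vector over the same set of neighboring tags; the snapshot format, smoothing constant, and divergence direction are illustrative assumptions rather than details taken from the paper:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between discrete distributions."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def freshness(snapshots):
    """Mean KL divergence between consecutive tag co-occurrence distributions.

    snapshots: list of probability vectors, one per time step, each over the
    same set of co-occurring tags.  (Direction D(new || old) is an assumption.)
    """
    divs = [kl_divergence(snapshots[t + 1], snapshots[t])
            for t in range(len(snapshots) - 1)]
    return float(np.mean(divs))

# A tag whose co-occurrence pattern keeps shifting scores higher (fresher).
stable   = [[0.5, 0.3, 0.2]] * 4
shifting = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5], [0.5, 0.3, 0.2]]
print(freshness(stable), freshness(shifting))
```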

Centroid-model based music similarity with alpha divergence (알파 다이버전스를 이용한 무게중심 모델 기반 음악 유사도)

  • Seo, Jin Soo;Kim, Jeonghyun;Park, Jihyun
    • The Journal of the Acoustical Society of Korea / v.35 no.2 / pp.83-91 / 2016
  • Music-similarity computation is crucial in developing music information retrieval systems for browsing and classification. This paper reviews the recently proposed centroid-model based music retrieval method and applies distributional similarity measures to the model for retrieval-performance evaluation. Probabilistic distance measures (also called divergences) quantify the distance between two probability distributions. In this paper, we consider the alpha divergence for computing the distance between two centroid models for music retrieval. The alpha divergence includes the widely used Kullback-Leibler divergence and the Bhattacharyya distance as special cases, depending on the value of alpha. Experiments were conducted on both genre and singer datasets. We compare the music-retrieval performance of the distributional similarity measures with that of vector distances. The experimental results show that the alpha divergence improves the performance of centroid-model based music retrieval.
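
For reference, a small sketch of an alpha-divergence computation between two discrete distributions, using Amari's common parameterization; the paper applies such divergences to centroid models of music features, and the exact parameterization and continuous-model form used there are not reproduced here:

```python
import numpy as np

def alpha_divergence(p, q, alpha, eps=1e-12):
    """Amari-type alpha divergence between discrete distributions p and q.

    alpha -> 1 recovers KL(p || q); alpha -> 0 recovers KL(q || p);
    alpha = 0.5 gives a scaled Hellinger distance, which is monotonically
    related to the Bhattacharyya coefficient sum(sqrt(p * q)).
    """
    p = np.asarray(p, float) + eps; p = p / p.sum()
    q = np.asarray(q, float) + eps; q = q / q.sum()
    if np.isclose(alpha, 1.0):
        return float(np.sum(p * np.log(p / q)))   # KL(p || q)
    if np.isclose(alpha, 0.0):
        return float(np.sum(q * np.log(q / p)))   # KL(q || p)
    coef = 1.0 / (alpha * (1.0 - alpha))
    return float(coef * (1.0 - np.sum(p**alpha * q**(1.0 - alpha))))

p, q = [0.1, 0.6, 0.3], [0.3, 0.4, 0.3]
for a in (0.0, 0.5, 1.0):
    print(a, alpha_divergence(p, q, a))
```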

MEASURE OF DEPARTURE FROM QUASI-SYMMETRY AND BRADLEY-TERRY MODELS FOR SQUARE CONTINGENCY TABLES WITH NOMINAL CATEGORIES

  • Kouji Tahata;Nobuko Miyamoto;Sadao Tomizawa
    • Journal of the Korean Statistical Society / v.33 no.1 / pp.129-147 / 2004
  • For square contingency tables with nominal categories, this paper proposes a measure of the degree of departure from the quasi-symmetry (QS) model and the Bradley-Terry (BT) model. The proposed measure is expressed using Cressie and Read's (1984) power divergence or Patil and Taillie's (1982) diversity index. The measure lies between 0 and 1, and it is useful for comparing the degree of departure from QS or BT across several tables.
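
For context, a short sketch of the Cressie-Read power divergence on which such measures are built; the normalization to [0, 1] and the comparison against the fitted QS/BT structure used in the paper are not reproduced here:

```python
import numpy as np

def power_divergence(p, q, lam, eps=1e-12):
    """Cressie-Read power divergence I_lambda(p : q) between discrete distributions.

    lam -> 0 gives the Kullback-Leibler divergence; lam = 1 equals half the
    Pearson chi-square discrepancy between p and q.
    """
    p = np.asarray(p, float) + eps; p = p / p.sum()
    q = np.asarray(q, float) + eps; q = q / q.sum()
    if np.isclose(lam, 0.0):
        return float(np.sum(p * np.log(p / q)))
    return float(np.sum(p * ((p / q) ** lam - 1.0)) / (lam * (lam + 1.0)))

observed = [0.25, 0.40, 0.35]
model    = [0.30, 0.40, 0.30]
print(power_divergence(observed, model, lam=0.0))  # KL divergence
print(power_divergence(observed, model, lam=1.0))  # half Pearson chi-square
```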

Blind Image Separation with Neural Learning Based on Information Theory and Higher-order Statistics (신경회로망 ICA를 이용한 혼합영상신호의 분리)

  • Cho, Hyun-Cheol;Lee, Kwon-Soon
    • The Transactions of The Korean Institute of Electrical Engineers / v.57 no.8 / pp.1454-1463 / 2008
  • Blind source separation by independent component analysis (ICA) has been applied in signal processing, telecommunications, and image processing to recover unknown, mutually independent source signals from their observed mixtures. A neural network is trained with an unsupervised learning algorithm to estimate the original signals. When the network outputs recover the original sources, they are mutually independent and their mutual information is zero; this is equivalent to minimizing the Kullback-Leibler divergence between the joint probability density function of the network outputs and the corresponding factorial (product-of-marginals) distribution. In this paper, we present a learning algorithm based on information theory and higher-order statistics for blind source separation. In computer simulations, two deterministic signals and Gaussian noise are used as the original sources. We also test the proposed algorithm on several discrete images.
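
A minimal natural-gradient ICA sketch in the spirit of the KL-minimization view described above; the tanh score function (suited to roughly super-Gaussian sources), the learning rate, and the toy Laplace sources are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def ica_natural_gradient(X, n_iter=2000, lr=0.01, seed=0):
    """Natural-gradient (Infomax-style) ICA update W <- W + lr*(I - phi(y) y^T) W.

    X: (n_sources, n_samples) zero-mean mixtures.  Returns an unmixing matrix W
    such that W @ X approximates the sources up to scaling and permutation.
    """
    rng = np.random.default_rng(seed)
    n, T = X.shape
    W = np.eye(n) + rng.normal(scale=0.05, size=(n, n))
    for _ in range(n_iter):
        Y = W @ X
        phi = np.tanh(Y)                                  # score for super-Gaussian sources
        W += lr * (np.eye(n) - (phi @ Y.T) / T) @ W       # natural-gradient step
    return W

# Toy demo with two Laplace-distributed sources and a fixed mixing matrix.
rng = np.random.default_rng(1)
S = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S
X -= X.mean(axis=1, keepdims=True)
W = ica_natural_gradient(X)
Y = W @ X    # estimated sources; compare with S up to permutation and scale
```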

An Analysis of Fuzzy Survey Data Based on the Maximum Entropy Principle (최대 엔트로피 분포를 이용한 퍼지 관측데이터의 분석법에 관한 연구)

  • 유재휘;유동일
    • Journal of the Korea Society of Computer and Information / v.3 no.2 / pp.131-138 / 1998
  • In conventional statistical data analysis, data are described by exact values. In modern complex and large-scale systems, however, it is difficult to treat the systems using only exact data. In this paper, we define such data as fuzzy data (i.e., linguistic variables used to construct membership functions) and propose a new method for analyzing fuzzy survey data based on the maximum entropy principle. We also propose a discrimination method that measures the distance between the distribution of the stable state and the estimated distribution of the present state using the Kullback-Leibler information. Furthermore, we investigate the validity of the method through computer simulations under realistic conditions.
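
A minimal sketch of the Kullback-Leibler discrimination step described above, comparing a stable-state distribution with a distribution estimated from current responses; the five-category example and the idea of flagging large values are illustrative assumptions:

```python
import numpy as np

def kl_information(p_current, p_stable, eps=1e-12):
    """Kullback-Leibler information D(p_current || p_stable) for discrete distributions."""
    p = np.asarray(p_current, float) + eps; p = p / p.sum()
    q = np.asarray(p_stable, float) + eps;  q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Stable-state distribution over five fuzzy response categories versus a
# distribution estimated from the present survey data.
stable  = [0.10, 0.25, 0.30, 0.25, 0.10]
current = [0.05, 0.15, 0.25, 0.35, 0.20]
print("KL information:", kl_information(current, stable))
# Larger values indicate a stronger departure from the stable state.
```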

Generalized Measure of Departure From Global Symmetry for Square Contingency Tables with Ordered Categories

  • Tomizawa, Sadao;Saitoh, Kayo
    • Journal of the Korean Statistical Society / v.27 no.3 / pp.289-303 / 1998
  • For square contingency tables with ordered categories, Tomizawa (1995) considered two measures of the degree of departure from global symmetry, i.e., from the condition that the probability of an observation falling in a cell of the upper-right triangle of the table equals the probability of it falling in a cell of the lower-left triangle. This paper proposes a generalization of those measures. The proposed measure is expressed using Cressie and Read's (1984) power divergence or Patil and Taillie's (1982) diversity index, and includes Tomizawa's measures as special cases. The proposed measure would be useful for comparing the degree of departure from global symmetry across several tables.
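
An illustrative departure-from-global-symmetry computation for a square table; the KL-based comparison of the two triangle probabilities and the normalization by log 2 are plausible devices to keep the value in [0, 1] and are not the paper's exact measure:

```python
import numpy as np

def global_symmetry_departure(table, eps=1e-12):
    """Compare the total probability of the upper-right triangle with that of the
    lower-left triangle via a KL divergence from the symmetric (1/2, 1/2) split,
    normalized by log 2 so that 0 means exact global symmetry and 1 is maximal."""
    P = np.asarray(table, float)
    P = P / P.sum()
    upper = np.triu(P, k=1).sum()
    lower = np.tril(P, k=-1).sum()
    p = np.array([upper, lower]) / (upper + lower)
    kl = float(np.sum(p * np.log((p + eps) / 0.5)))
    return kl / np.log(2.0)

counts = [[20, 15,  5],
          [ 8, 30, 12],
          [ 3, 10, 25]]
print(global_symmetry_departure(counts))
```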

Goodness of Fit Test of Normality Based on Kullback-Leibler Information

  • Kim, Jong-Tae;Lee, Woo-Dong;Ko, Jung-Hwan;Yoon, Yong-Hwa;Kang, Sang-Gil
    • Communications for Statistical Applications and Methods / v.6 no.3 / pp.909-918 / 1999
  • Arizono and Ohta (1989) studied a goodness-of-fit test of normality using the entropy estimator proposed by Vasicek (1976). More recently, van Es (1992) and Correa (1995) proposed alternative entropy estimators. In this paper we propose goodness-of-fit test statistics for normality based on the Vasicek, van Es, and Correa estimators, and compare their power with that of the Kolmogorov-Smirnov, Kuiper, Cramér-von Mises, Watson, Anderson-Darling, and Finkelstein and Schefer statistics.
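
A sketch of the Vasicek spacing-based entropy estimator and an entropy-based normality statistic of the kind compared in this paper; the window size m and the ratio to sqrt(2*pi*e) are standard choices, while the critical values would be obtained by simulation and are not reproduced here:

```python
import numpy as np

def vasicek_entropy(x, m):
    """Vasicek (1976) entropy estimator H_{m,n} based on m-spacings of order statistics."""
    x = np.sort(np.asarray(x, float))
    n = len(x)
    lo = np.clip(np.arange(n) - m, 0, n - 1)   # X_(i-m), clamped at X_(1)
    hi = np.clip(np.arange(n) + m, 0, n - 1)   # X_(i+m), clamped at X_(n)
    return float(np.mean(np.log(n * (x[hi] - x[lo]) / (2.0 * m))))

def normality_statistic(x, m=3):
    """exp(H_{m,n}) / (s * sqrt(2*pi*e)): close to 1 under normality, smaller otherwise,
    since the normal distribution maximizes entropy for a given variance."""
    s = np.std(x, ddof=1)
    return float(np.exp(vasicek_entropy(x, m)) / (s * np.sqrt(2.0 * np.pi * np.e)))

rng = np.random.default_rng(1)
print(normality_statistic(rng.normal(size=200)))       # near 1
print(normality_statistic(rng.exponential(size=200)))  # noticeably below 1
```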

On the comparison of cumulative hazard functions

  • Park, Sangun;Ha, Seung Ah
    • Communications for Statistical Applications and Methods / v.26 no.6 / pp.623-633 / 2019
  • This paper proposes two distance measures between cumulative hazard functions, obtained by comparing their difference and their ratio, respectively. We then estimate the measures and present goodness-of-fit test statistics. Since the proposed test statistics are expressed in terms of the cumulative hazard functions, we can easily place more weight on earlier (or later) departures in the cumulative hazards when such departures are of particular interest. We also show that these test statistics perform comparably to other well-known test statistics based on the empirical distribution function for an exponential null distribution. The proposed test statistic is an omnibus test that is also applicable to many distributions other than the exponential.
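
An illustrative difference-based comparison of an empirical cumulative hazard with a fitted exponential cumulative hazard for complete (uncensored) data; this is a sketch of the idea, not the paper's exact statistics or weighting scheme:

```python
import numpy as np

def nelson_aalen(x):
    """Empirical (Nelson-Aalen) cumulative hazard at the sorted observations,
    assuming complete, uncensored data."""
    x = np.sort(np.asarray(x, float))
    n = len(x)
    return x, np.cumsum(1.0 / (n - np.arange(n)))   # increment = 1 / (risk set size)

def cumhaz_difference_stat(x):
    """Mean absolute difference between the empirical cumulative hazard and the
    fitted exponential cumulative hazard lambda*t, with lambda = 1/mean."""
    t, H_emp = nelson_aalen(x)
    H_exp = t / np.mean(x)
    return float(np.mean(np.abs(H_emp - H_exp)))

rng = np.random.default_rng(0)
print(cumhaz_difference_stat(rng.exponential(scale=2.0, size=300)))  # small under exponentiality
print(cumhaz_difference_stat(rng.weibull(2.0, size=300)))            # larger departure
```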

The Robustness of Coding and Modulation for Body-Area Networks

  • Biglieri, Ezio;Alrajeh, Nabil
    • Journal of Communications and Networks / v.16 no.3 / pp.264-269 / 2014
  • We consider transmission over body-area networks. Because it is difficult to establish an accurate statistical model valid for multiple scenarios, we advocate a system design technique favoring robustness. Our approach, which is based on results in [12] and generalizes them, examines the variation of a performance metric when the nominal statistical distribution of fading is replaced by the worst distribution within a given Kullback-Leibler divergence from it. The sensitivity of the performance metric to the divergence from the nominal distribution can be used as an indication of the design's robustness. This concept is applied by evaluating the error probability of binary uncoded modulation and the outage probability: the first is useful for assessing system performance without error-control coding, while the second reflects the performance when a near-optimal code is used. The usefulness of channel coding can then be assessed by comparing its robustness with that of uncoded transmission.
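
A numerical sketch of the worst-case analysis described above, using the standard convex dual for the supremum of an expectation over a Kullback-Leibler ball around the nominal distribution; the Rayleigh-fading/uncoded-BPSK setup is an illustrative assumption and the paper's exact formulation from [12] is not reproduced here:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import erfc

def worst_case_mean(g_samples, kl_radius):
    """sup over {Q : D(Q||P) <= r} of E_Q[g(X)], via the dual
    inf_{t>0} t*log E_P[exp(g/t)] + t*r, with the nominal expectation
    approximated by Monte Carlo samples drawn from P."""
    g = np.asarray(g_samples, float)
    g_max = g.max()

    def dual(t):
        log_mean_exp = np.log(np.mean(np.exp((g - g_max) / t))) + g_max / t
        return t * log_mean_exp + t * kl_radius

    res = minimize_scalar(dual, bounds=(1e-4, 1e4), method="bounded")
    return float(res.fun)

# Nominal scenario: Rayleigh fading amplitude h with E[h^2] = 1, uncoded BPSK,
# conditional bit error probability Q(sqrt(2*snr)*h) = 0.5*erfc(sqrt(snr)*h).
rng = np.random.default_rng(0)
snr = 10.0
h = rng.rayleigh(scale=1.0 / np.sqrt(2.0), size=200_000)
pe = 0.5 * erfc(np.sqrt(snr) * h)
print("nominal error probability:", pe.mean())
for r in (0.01, 0.05, 0.2):
    print(f"worst case within KL radius {r}:", worst_case_mean(pe, r))
```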

Effects on Regression Estimates under Misspecified Generalized Linear Mixed Models for Counts Data

  • Jeong, Kwang Mo
    • The Korean Journal of Applied Statistics / v.25 no.6 / pp.1037-1047 / 2012
  • The generalized linear mixed model (GLMM) is widely used for fitting categorical responses from clustered data. In the numerical approximation of the likelihood function, normality is assumed for the random-effects distribution, and commercial statistical packages routinely fit GLMMs under this normality assumption. We may also encounter departures from the distributional assumption on the response variable. It would be interesting to investigate the impact on parameter estimates under misspecification of these distributions; however, there has been limited research on the topic. We study the sensitivity, or robustness, of the maximum likelihood estimators (MLEs) of a GLMM for count data when the true underlying random-effects distribution is normal, gamma, exponential, or a mixture of two normal distributions. We also consider the effects on the MLEs when a Poisson-normal GLMM is fitted but the outcomes are generated from an overdispersed negative binomial distribution. Through a small-scale Monte Carlo study we check the empirical coverage probabilities and biases of the GLMM MLEs.
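
A small data-generation sketch of the kind of random-effects misspecification studied above: clustered Poisson counts whose cluster effects come from a normal or a centered gamma distribution with matched variance. The coefficients and cluster sizes are illustrative, and the GLMM-fitting and coverage-probability steps of the Monte Carlo study are omitted here:

```python
import numpy as np

def simulate_clustered_counts(n_clusters=200, cluster_size=5, beta=(0.5, 0.3),
                              re_dist="normal", sigma=0.7, seed=0):
    """Generate y_ij ~ Poisson(exp(b0 + b1*x_ij + u_i)) with cluster effect u_i
    drawn from a normal or a centered gamma distribution (both with variance sigma^2)."""
    rng = np.random.default_rng(seed)
    b0, b1 = beta
    if re_dist == "normal":
        u = rng.normal(0.0, sigma, n_clusters)
    elif re_dist == "gamma":
        shape = 1.0 / sigma**2
        u = (rng.gamma(shape, 1.0, n_clusters) - shape) * sigma**2   # mean 0, variance sigma^2
    else:
        raise ValueError(re_dist)
    x = rng.normal(size=(n_clusters, cluster_size))
    y = rng.poisson(np.exp(b0 + b1 * x + u[:, None]))
    return x, y

# A Poisson-normal GLMM fitted to the "gamma" data would be misspecified in the
# random-effects distribution; its MLE bias and coverage could then be checked.
for dist in ("normal", "gamma"):
    x, y = simulate_clustered_counts(re_dist=dist)
    print(dist, "marginal variance-to-mean ratio:", round(y.var() / y.mean(), 2))
```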