http://dx.doi.org/10.5351/CKSS.2006.13.2.297

Mutual Information and Redundancy for Categorical Data  

Hong, Chong-Sun (Department of Statistics, Sungkyunkwan University)
Kim, Beom-Jun (Department of Statistics, Sungkyunkwan University)
Publication Information
Communications for Statistical Applications and Methods / v.13, no.2, 2006, pp. 297-307
Abstract
Most methods for describing the relationship among random variables require specific probability distributions and assumptions about the variables. The mutual information, an entropy-based measure of dependency among random variables, does not require any such assumptions, and the redundancy, an analogous measure, has also been proposed. In this paper, the redundancy and mutual information are extended to multi-dimensional categorical data. It is found that the redundancy for categorical data can be expressed as a function of the generalized likelihood ratio statistic under several kinds of independence log-linear models, so that the redundancy can also be used to analyze contingency tables. Whereas the generalized likelihood ratio statistic for testing the goodness of fit of a log-linear model is sensitive to the sample size, the redundancy for categorical data depends only on the cell probabilities, not on the sample size.
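As an illustration of the relationship described above, the sketch below (Python with NumPy; the three-way table of counts is hypothetical, and the paper's exact definitions and notation may differ) computes the redundancy of a contingency table as the sum of the marginal entropies minus the joint entropy, together with the likelihood ratio statistic G² for the complete-independence log-linear model, and checks that G² = 2nR. Natural logarithms are used throughout so the identity holds without a scaling constant; G² grows with the sample size n, while R depends only on the cell proportions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (natural log) of a probability array, ignoring zero cells."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def redundancy(table):
    """Redundancy of a k-way table of counts: sum of the one-way marginal
    entropies minus the joint entropy, computed from cell proportions."""
    p = table / table.sum()
    joint_h = entropy(p.ravel())
    marginal_h = sum(
        entropy(p.sum(axis=tuple(ax for ax in range(p.ndim) if ax != k)))
        for k in range(p.ndim)
    )
    return marginal_h - joint_h

# Hypothetical 2 x 3 x 2 table of counts, used only for illustration.
counts = np.array([[[10, 20], [30, 25], [15, 10]],
                   [[20,  5], [10, 40], [25, 30]]], dtype=float)

n = counts.sum()
R = redundancy(counts)

# Likelihood ratio statistic G^2 for the complete-independence log-linear model:
# expected counts are n times the product of the one-way marginal proportions.
p = counts / n
expected = n * np.ones_like(p)
for k in range(p.ndim):
    shape = [1] * p.ndim
    shape[k] = p.shape[k]
    marginal = p.sum(axis=tuple(ax for ax in range(p.ndim) if ax != k))
    expected = expected * marginal.reshape(shape)

obs = counts[counts > 0]
G2 = 2 * np.sum(obs * np.log(obs / expected[counts > 0]))

# The two quantities coincide: R = G^2 / (2n), so doubling every cell count
# doubles G^2 but leaves the redundancy unchanged.
print(R, G2 / (2 * n))
```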
Keywords
Entropy; Goodness of fit; Joint independence; Log-linear model; Redundancy
Citations & Related Records
연도 인용수 순위