• Title/Summary/Keyword: Chi-Square Statistic

Goodness-of-fit tests for a proportional odds model

  • Lee, Hyun Yung
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1465-1475 / 2013
  • Chi-square type statistics are the most commonly used tests of goodness-of-fit for multinomial logistic regression models, whose data may be grouped (binomial) or ungrouped (binary) according to covariate pattern. They are not a satisfactory gauge, however, because the ungrouped Pearson chi-square statistic is poorly approximated by the chi-square distribution and is not a satisfactory measure of fit in itself. Currently, goodness-of-fit in the ordinal setting is often assessed using the Pearson chi-square and deviance tests. These tests involve creating a contingency table in which the rows consist of all possible cross-classifications of the model covariates and the columns consist of the levels of the ordinal response. I examined goodness-of-fit tests for the proportional odds logistic regression model, the most commonly used regression model for an ordinal response variable. Using a simulation study, I investigated the distribution and power properties of this test and compared them with those of three other goodness-of-fit tests. The new test had lower power than the existing tests; however, it was able to detect a greater number of the different types of lack of fit considered in this study. I illustrated the ability of the tests to detect lack of fit using a study of aftercare decisions for psychiatrically hospitalized adolescents.

Tests for Uniformity : A Comparative Study

  • Rahman, Mezbahur;Chakrobartty, Shuvro
    • Journal of the Korean Data and Information Science Society / v.15 no.1 / pp.211-218 / 2004
  • The problem of assessing whether a data set comes from a specific distribution has received a good deal of attention. This topic is critically important for uniform distributions. Several parametric tests for uniformity are compared. These tests can also be used to test the randomness of a sample. The Anderson-Darling $A^2$ statistic is found to be the most powerful.
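
As a minimal sketch of the statistic named in this abstract, the Anderson-Darling $A^2$ for the null hypothesis Uniform(0, 1) can be computed from the ordered sample $u_{(1)} \le \dots \le u_{(n)}$ as $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln u_{(i)} + \ln(1-u_{(n+1-i)})\right]$. The function name below is hypothetical, critical values are omitted, and the paper's own procedure may differ in detail.

```python
import math


def anderson_darling_uniform(sample):
    """Anderson-Darling A^2 statistic for H0: sample ~ Uniform(0, 1).

    Hypothetical helper for illustration; it returns the raw statistic
    only and does not look up critical values or p-values.
    """
    u = sorted(sample)
    n = len(u)
    if n == 0 or u[0] <= 0.0 or u[-1] >= 1.0:
        raise ValueError("sample must be non-empty with values strictly inside (0, 1)")
    total = 0.0
    for i in range(1, n + 1):
        # Pair the i-th smallest value with the i-th largest value.
        total += (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
    return -n - total / n


if __name__ == "__main__":
    evenly_spread = [(i - 0.5) / 10 for i in range(1, 11)]
    clustered = [0.01 * i for i in range(1, 11)]
    print(anderson_darling_uniform(evenly_spread))
    print(anderson_darling_uniform(clustered))
```

Larger values indicate stronger departure from uniformity, so the clustered sample yields a much larger statistic than the evenly spread one.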

Empirical Comparisons of Disparity Measures for Three Dimensional Log-Linear Models

  • Park, Y.S.;Hong, C.S.;Jeong, D.B.
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.543-557 / 2006
  • This paper is concerned with the applicability of the chi-square approximation to six disparity statistics: the Pearson chi-square, the generalized likelihood ratio, the power divergence, the blended weight chi-square, the blended weight Hellinger distance, and the negative exponential disparity statistic. Three dimensional contingency tables of small and moderate sample sizes are generated and fitted to all possible hierarchical log-linear models: the completely independent model, the conditionally independent model, the partial association models, and the model with one variable independent of the other two. For models with direct solutions for the expected cell counts, point estimates and confidence intervals of the 90th and 95th percentage points of the six statistics are explored. For models without direct solutions, the empirical significance levels and the empirical powers of the six statistics for testing the significance of the three factor interaction are computed and compared.
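
Three of the statistics listed in this abstract belong to one family. A minimal sketch, assuming the standard Cressie-Read form $I(\lambda) = \frac{2}{\lambda(\lambda+1)}\sum_i O_i\left[(O_i/E_i)^{\lambda} - 1\right]$ with observed counts $O_i$ and expected counts $E_i$ that share the same total: $\lambda = 1$ gives the Pearson chi-square and the limit $\lambda \to 0$ gives the likelihood ratio statistic. The function name is hypothetical, and the blended weight and negative exponential disparities are not covered.

```python
import math


def power_divergence(observed, expected, lam):
    """Cressie-Read power divergence between observed and expected counts.

    Illustrative helper: lam = 1 is the Pearson chi-square and lam = 0 is
    the likelihood ratio statistic (taken as a limit).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    if any(o < 0 for o in observed):
        raise ValueError("observed counts must be non-negative")
    pairs = list(zip(observed, expected))
    if lam == 0:
        # Limit as lam -> 0; empty cells contribute nothing.
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    if lam <= -1 and any(o == 0 for o in observed):
        raise ValueError("statistic is unbounded for lam <= -1 with empty cells")
    if lam == -1:
        # Limit as lam -> -1.
        return 2.0 * sum(e * math.log(e / o) for o, e in pairs)
    # For lam > -1 an empty cell contributes zero, so it can be skipped.
    total = sum(o * ((o / e) ** lam - 1.0) for o, e in pairs if o > 0)
    return 2.0 * total / (lam * (lam + 1.0))


if __name__ == "__main__":
    obs, exp = [10, 20, 30], [20, 20, 20]
    for lam in (1, 2 / 3, 0):
        print(lam, power_divergence(obs, exp, lam))
```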

A Rao-Robson Chi-Square Test for Multivariate Normality Based on the Mahalanobis Distances

  • Park, Cheolyong
    • Communications for Statistical Applications and Methods / v.7 no.2 / pp.385-392 / 2000
  • Many tests for multivariate normality are based on the spherical coordinates of the scaled residuals of multivariate observations. Moore and Stubblebine's (1981) Pearson chi-square test is based on the radii of the scaled residuals, or equivalently the sample Mahalanobis distances of the observations from the sample mean vector. The chi-square statistic does not have a limiting chi-square distribution because the unknown parameters are estimated from ungrouped data. We derive a simple closed form of the Rao-Robson chi-square test statistic and provide a self-contained proof that it has a limiting chi-square distribution. We then provide an illustrative application to a real data set, together with a simulation study showing the finite-sample accuracy of the limiting distribution.

Properties of chi-square statistic and information gain for feature selection of imbalanced text data (불균형 텍스트 데이터의 변수 선택에 있어서의 카이제곱통계량과 정보이득의 특징)

  • Mun, Hye In;Son, Won
    • The Korean Journal of Applied Statistics / v.35 no.4 / pp.469-484 / 2022
  • Because a large text corpus contains hundreds of thousands of unique words, text data are a typical example of high-dimensional data. Various feature selection methods have therefore been proposed for dimension reduction. Feature selection can improve prediction accuracy, and the reduced data size also improves computational efficiency. The chi-square statistic and the information gain are two of the most popular measures for identifying interesting terms in text data. In this paper, we investigate the theoretical properties of the chi-square statistic and the information gain. We show that the two filtering metrics share theoretical properties such as non-negativity and convexity. However, they differ in that the information gain is prone to select more negative features than the chi-square statistic in imbalanced text data.
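
A minimal sketch of the two filtering metrics for a single term, assuming the usual 2×2 document-count table (term present or absent, by positive or negative class). The cell labels `a`, `b`, `c`, `d` and the function names are assumptions for illustration; the paper's exact definitions may differ. Information gain is computed here as the mutual information between term presence and class, in bits.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 term/class table.

    a: positive-class documents containing the term
    b: negative-class documents containing the term
    c: positive-class documents without the term
    d: negative-class documents without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a margin is empty, so the term carries no information
    return n * (a * d - b * c) ** 2 / denom


def information_gain_2x2(a, b, c, d):
    """Information gain (in bits) of term presence about the class label."""
    n = a + b + c + d
    cells = [
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    ]
    gain = 0.0
    for count, row_total, col_total in cells:
        if count > 0:  # empty cells contribute zero
            gain += (count / n) * math.log2(count * n / (row_total * col_total))
    return gain


if __name__ == "__main__":
    # Hypothetical imbalanced corpus: 50 positive and 950 negative documents.
    print(chi_square_2x2(30, 20, 20, 930), information_gain_2x2(30, 20, 20, 930))
```

Both functions return zero when term and class are independent and grow as the association strengthens, which matches the non-negativity property mentioned in the abstract.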

Empirical Comparisons of Disparity Measures for Partial Association Models in Three Dimensional Contingency Tables

  • Jeong, D.B.;Hong, C.S.;Yoon, S.H.
    • Communications for Statistical Applications and Methods / v.10 no.1 / pp.135-144 / 2003
  • This work compares recently developed disparity measures for the partial association model in three dimensional categorical data. Data are generated by simulating each term in the log-linear model equation under the partial association model, a method proposed in this paper. This alternative Monte Carlo method is used to study the behavior of disparity measures such as the power divergence statistic I(λ), the Pearson chi-square statistic $X^2$, the likelihood ratio statistic $G^2$, the blended weight chi-square statistic BWCS(λ), the blended weight Hellinger distance statistic BWHD(λ), and the negative exponential disparity statistic NED(λ) for moderate sample sizes. We find that the power divergence statistic I(2/3) and the blended weight Hellinger distance statistic BWHD(1/9) are the best tests with respect to size and power.

Analysis on the Amino Acid Distributions with Position in Transmembrane Proteins

  • Chi, Sang-Mun
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.745-758 / 2005
  • This paper presents a statistical analysis of the position-specific distributions of amino acid residues in transmembrane proteins. A hidden Markov model segments variable-length amino acid sequences of membrane proteins into regions with homogeneous statistical properties. The segmented residues are analyzed using the chi-square statistic and relative entropy in order to find position-specific amino acids. The analysis shows that isoleucine and valine are concentrated at the center of membrane-spanning regions, whereas tryptophan, tyrosine, and positively charged residues are found frequently near both ends of the membrane.

A Study on Cell Influences to Chi-square Statistic in Contingency Tables

  • Kim, Hong-Gie
    • Communications for Statistical Applications and Methods / v.5 no.1 / pp.35-42 / 1998
  • Once a contingency table is constructed, the first hypothesis of interest is either homogeneity or independence, depending on the sampling scheme. The most widely used test statistic in practice is the classical Pearson $\chi^2$ statistic. When the null hypothesis is rejected, the natural next question is which cells contributed most to the rejection. For this purpose, the so-called cell $\chi^2$ components are usually investigated. In this paper, the influence function of a cell on the $\chi^2$ statistic is derived, which can be used for the same purpose. This function measures the effect of each cell on the $\chi^2$ statistic. A numerical example is given to demonstrate the role of the new function.
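
The classical cell $\chi^2$ components mentioned in this abstract can be sketched as follows, assuming expected counts fitted under independence (row total × column total / grand total). This illustrates only the conventional components $(O_{ij} - E_{ij})^2 / E_{ij}$; it is not the influence function derived in the paper, and the function name is hypothetical.

```python
def cell_chi_square_components(table):
    """Per-cell contributions (O - E)^2 / E to the Pearson chi-square.

    `table` is a list of equal-length rows of observed counts. Expected
    counts are computed under the independence (or homogeneity) hypothesis.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must have a positive total")
    components = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            out_row.append((observed - expected) ** 2 / expected)
        components.append(out_row)
    return components


if __name__ == "__main__":
    comps = cell_chi_square_components([[10, 20], [30, 40]])
    for row in comps:
        print(row)
    # The components add up to the Pearson chi-square statistic.
    print(sum(sum(row) for row in comps))
```

The cell with the largest component is the one conventionally read as contributing most to a rejection.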

Likelihood ratio in estimating Chi-square parameter

  • Rahman, Mezbahur
    • Journal of the Korean Data and Information Science Society / v.20 no.3 / pp.587-592 / 2009
  • The most frequent use of the chi-square distribution is in goodness-of-fit testing of a distribution. The likelihood ratio test, like the maximum likelihood estimate, is commonly used in statistical inference. Recently revised versions of the likelihood ratio test statistic are used to estimate the parameter of the chi-square distribution. The resulting estimates are compared with the commonly used method of moments and maximum likelihood estimates.
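
For reference, the two baseline estimators named in this abstract can be sketched as follows. The method of moments estimate of the degrees of freedom $k$ is the sample mean, since $E[X] = k$; the maximum likelihood estimate is found here by a ternary search on the log-likelihood, which is concave in $k$. The function names, search bracket, and iteration count are assumptions, and the revised likelihood ratio estimators studied in the paper are not reproduced.

```python
import math


def chi_square_log_likelihood(k, sample):
    """Log-likelihood of degrees of freedom k for positive observations."""
    half = k / 2.0
    data_part = sum((half - 1.0) * math.log(x) - x / 2.0 for x in sample)
    return data_part - len(sample) * (half * math.log(2.0) + math.lgamma(half))


def moment_estimate(sample):
    """Method of moments: the mean of a chi-square(k) variable equals k."""
    return sum(sample) / len(sample)


def maximum_likelihood_estimate(sample, lower=1e-3, upper=1e3, iterations=200):
    """Ternary search for the maximizer of the concave log-likelihood.

    The bracket [lower, upper] is an assumed search range, not part of
    any published procedure.
    """
    lo, hi = lower, upper
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if chi_square_log_likelihood(m1, sample) < chi_square_log_likelihood(m2, sample):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


if __name__ == "__main__":
    data = [1.2, 3.4, 0.8, 5.1, 2.7, 4.0]  # made-up illustrative values
    print(moment_estimate(data), maximum_likelihood_estimate(data))
```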

A Note on the Chi-Square Test for Multivariate Normality Based on the Sample Mahalanobis Distances

  • Park, Cheolyong
    • Journal of the Korean Statistical Society / v.28 no.4 / pp.479-488 / 1999
  • Moore and Stubblebine (1981) suggested a chi-square test for multivariate normality based on cell counts calculated from the sample Mahalanobis distances. They derived the limiting distribution of the test statistic only for the case of equiprobable cells. Using conditional limit theorems, we derive the limiting distribution of the statistic as well as the asymptotic normality of the cell counts. These distributions are valid even when equiprobable cells are not employed. Finally, we apply the method to a real data set.
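
A minimal sketch of the cell-count construction described in this abstract, restricted to the bivariate case so that it needs only the standard library. Assumptions not taken from the abstract: the covariance matrix uses divisor $n$, and the equiprobable cell boundaries are the $\chi^2_2$ quantiles $-2\ln(1 - k/M)$. The function names are hypothetical. The returned Pearson-type statistic should not be referred to a $\chi^2_{M-1}$ table, because the parameters are estimated from ungrouped data.

```python
import math


def squared_mahalanobis(points):
    """Squared sample Mahalanobis distances for bivariate points (x, y)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Covariance entries with divisor n (an assumption of this sketch).
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("sample covariance matrix is singular")
    # Quadratic form with the explicit inverse of the 2x2 covariance matrix.
    return [
        (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my) + sxx * (y - my) ** 2) / det
        for x, y in points
    ]


def mahalanobis_cell_statistic(points, n_cells):
    """Cell counts of the distances and the Pearson-type statistic."""
    if n_cells < 2:
        raise ValueError("need at least two cells")
    d2 = squared_mahalanobis(points)
    # Equiprobable boundaries under chi-square with 2 degrees of freedom.
    bounds = [-2.0 * math.log(1.0 - k / n_cells) for k in range(1, n_cells)]
    counts = [0] * n_cells
    for value in d2:
        counts[sum(1 for b in bounds if value > b)] += 1
    expected = len(points) / n_cells
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    return statistic, counts


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3), (3, 1), (-1, 2), (4, 4)]
    print(mahalanobis_cell_statistic(pts, 4))
```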
