• Title/Abstract/Keywords: chi-square statistic

72 search results (processing time: 0.03 s)

Goodness-of-fit tests for a proportional odds model

  • Lee, Hyun Yung
    • Journal of the Korean Data and Information Science Society / Vol. 24, No. 6 / pp. 1465-1475 / 2013
  • The chi-square type test statistic is the most commonly used measure of goodness-of-fit for the multinomial logistic regression model, whose data may be grouped (binomial) or ungrouped (binary) according to covariate pattern. The chi-square type statistic is not a satisfactory gauge, however, because the ungrouped Pearson chi-square statistic does not follow the chi-square distribution well and is not a satisfactory form of measurement in itself. Currently, goodness-of-fit in the ordinal setting is often assessed using the Pearson chi-square and deviance tests. These tests involve creating a contingency table in which the rows consist of all possible cross-classifications of the model covariates and the columns consist of the levels of the ordinal response. I examined goodness-of-fit tests for the proportional odds logistic regression model, the most commonly used regression model for an ordinal response variable. Using a simulation study, I investigated the distribution and power properties of this test and compared them with those of three other goodness-of-fit tests. The new test had lower power than the existing tests; however, it was able to detect a greater number of the different types of lack of fit considered in this study. I illustrated the ability of the tests to detect lack of fit using a study of aftercare decisions for psychiatrically hospitalized adolescents.

Tests for Uniformity : A Comparative Study

  • Rahman, Mezbahur;Chakrobartty, Shuvro
    • Journal of the Korean Data and Information Science Society / Vol. 15, No. 1 / pp. 211-218 / 2004
  • The subject of assessing whether a data set is from a specific distribution has received a good deal of attention. This topic is critically important for uniform distributions. Several parametric tests are compared. These tests can also be used to test the randomness of a sample. The Anderson-Darling $A^2$ statistic is found to be the most powerful.

Empirical Comparisons of Disparity Measures for Three Dimensional Log-Linear Models

  • Park, Y.S.;Hong, C.S.;Jeong, D.B.
    • Journal of the Korean Data and Information Science Society / Vol. 17, No. 2 / pp. 543-557 / 2006
  • This paper is concerned with the applicability of the chi-square approximation to six disparity statistics: the Pearson chi-square, the generalized likelihood ratio, the power divergence, the blended weight chi-square, the blended weight Hellinger distance, and the negative exponential disparity statistic. Three dimensional contingency tables of small and moderate sample sizes are generated and fitted to all possible hierarchical log-linear models: the completely independent model, the conditionally independent model, the partial association models, and the model with one variable independent of the other two. For models with direct solutions for the expected cell counts, point estimates and confidence intervals of the 90 and 95 percentage points of the six statistics are explored. For models without direct solutions, the empirical significance levels and the empirical powers of the six statistics for testing the significance of the three factor interaction are computed and compared.

A Rao-Robson Chi-Square Test for Multivariate Normality Based on the Mahalanobis Distances

  • Park, Cheolyong
    • Communications for Statistical Applications and Methods / Vol. 7, No. 2 / pp. 385-392 / 2000
  • Many tests for multivariate normality are based on the spherical coordinates of the scaled residuals of multivariate observations. Moore and Stubblebine's (1981) Pearson chi-square test is based on the radii of the scaled residuals, or equivalently the sample Mahalanobis distances of the observations from the sample mean vector. The chi-square statistic does not have a limiting chi-square distribution since the unknown parameters are estimated from ungrouped data. We derive a simple closed form of the Rao-Robson chi-square test statistic and provide a self-contained proof that it has a limiting chi-square distribution. We then provide an illustrative application to a real data set, together with a simulation study showing the finite-sample accuracy of the limiting distribution.

Properties of chi-square statistic and information gain for feature selection of imbalanced text data

  • 문혜인;손원
    • 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 35, No. 4 / pp. 469-484 / 2022
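  • Text data generally consist of many words and are therefore high-dimensional data with a very large number of variables. For such high-dimensional data, a procedure for selecting the important variables among the many available is often applied to improve computational efficiency and the accuracy of statistical analysis. For text data as well, various methods are used to select the important words among many. In this study, we examine the similarities and differences between the chi-square statistic and information gain, two representative filtering methods for word selection, and check the properties of these word selection methods on real text data. The chi-square statistic and information gain share properties such as non-negativity and convexity, but on imbalanced text data we confirm that the chi-square statistic tends to select mainly positive features, whereas information gain also selects a relatively large number of negative features.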
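The two filtering scores named in this abstract can be sketched for a single word as follows. This is a minimal illustration, not the authors' code: it assumes the usual 2x2 document-count table (word present or absent by positive or negative class), and the function names and the counts `a`, `b`, `c`, `d` are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table of document counts.

    Assumed layout (hypothetical): a = word present, positive class;
    b = word present, negative class; c = word absent, positive class;
    d = word absent, negative class.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate margin carries no information
    return n * (a * d - b * c) ** 2 / denom

def information_gain_2x2(a, b, c, d):
    """Information gain (in bits) of word presence about the class,
    computed as the mutual information of the same 2x2 table."""
    n = a + b + c + d
    cells = [
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    ]
    gain = 0.0
    for count, row_total, col_total in cells:
        if count > 0:  # empty cells contribute nothing
            gain += (count / n) * math.log2(count * n / (row_total * col_total))
    return gain
```

Both scores are zero when the word and the class are independent (for example counts 10, 20, 30, 60) and are non-negative otherwise, which agrees with the non-negativity property mentioned in the abstract.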

Empirical Comparisons of Disparity Measures for Partial Association Models in Three Dimensional Contingency Tables

  • Jeong, D.B.;Hong, C.S.;Yoon, S.H.
    • Communications for Statistical Applications and Methods / Vol. 10, No. 1 / pp. 135-144 / 2003
  • This work is concerned with the comparison of recently developed disparity measures for the partial association model in three dimensional categorical data. Data are generated by simulating each term in the log-linear model equation based on the partial association model, which is the method proposed in this paper. This alternative Monte Carlo method is explored to study the behavior of disparity measures such as the power divergence statistic I(λ), the Pearson chi-square statistic $X^2$, the likelihood ratio statistic $G^2$, the blended weight chi-square statistic BWCS(λ), the blended weight Hellinger distance statistic BWHD(λ), and the negative exponential disparity statistic NED(λ) for moderate sample sizes. We find that the power divergence statistic I(2/3) and the blended weight Hellinger distance family BWHD(1/9) are the best tests with respect to size and power.
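As a rough illustration of how I(λ) relates to $X^2$ and $G^2$, the sketch below uses the standard Cressie-Read form of the power divergence statistic, which is assumed here and is not quoted from the paper. The function name and the example counts are hypothetical.

```python
import math

def power_divergence(observed, expected, lam):
    """Cressie-Read power divergence statistic I(lam).

    lam = 1 reproduces the Pearson chi-square statistic, and lam = 0 is
    taken as its limit, the likelihood ratio statistic G^2.
    """
    if lam == -1:
        raise ValueError("lam = -1 needs a separate limit; not handled in this sketch")
    if lam == 0:
        return 2.0 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        )
    total = sum(o * ((o / e) ** lam - 1.0) for o, e in zip(observed, expected))
    return 2.0 * total / (lam * (lam + 1.0))

# Made-up counts, for illustration only.
observed = [10, 20, 30]
expected = [20.0, 20.0, 20.0]
pearson = power_divergence(observed, expected, 1)       # equals sum((O - E)^2 / E)
g_squared = power_divergence(observed, expected, 0)     # likelihood ratio G^2
two_thirds = power_divergence(observed, expected, 2 / 3)
```

With these counts the λ = 1 value coincides with the hand-computed Pearson statistic, (100 + 0 + 100) / 20 = 10.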

Analysis on the Amino Acid Distributions with Position in Transmembrane Proteins

  • Chi, Sang-Mun
    • Journal of the Korean Data and Information Science Society / Vol. 16, No. 4 / pp. 745-758 / 2005
  • This paper presents a statistical analysis of the position-specific distributions of amino acid residues in transmembrane proteins. A hidden Markov model segments membrane proteins into regions with homogeneous statistical properties from variable-length amino acid sequences. The segmented residues are analyzed using the chi-square statistic and relative entropy in order to find position-specific amino acids. The analysis showed that isoleucine and valine were concentrated at the center of membrane-spanning regions, whereas tryptophan, tyrosine, and positive residues were found frequently near both ends of the membrane.

A Study on Cell Influences to Chi-square Statistic in Contingency Tables

  • Kim, Hong-Gie
    • Communications for Statistical Applications and Methods / Vol. 5, No. 1 / pp. 35-42 / 1998
  • Once a contingency table is constructed, the first interest is the hypothesis of either homogeneity or independence, depending on the sampling scheme. The most widely used test statistic in practice is the classical Pearson $\chi^2$ statistic. When the null hypothesis is rejected, a natural next question is which cells contributed more than others to the rejection. For this purpose, the so-called cell $\chi^2$ components are usually investigated. In this paper, the influence function of a cell on the $\chi^2$ statistic is derived, which can be used for the same purpose. This function measures the effect of each cell on the $\chi^2$ statistic. A numerical example is given to demonstrate the role of the new function.
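The classical cell $\chi^2$ components mentioned in this abstract can be sketched as below. This shows only the standard per-cell decomposition under independence, not the influence function derived in the paper. The function name and the example table are hypothetical.

```python
def cell_chi2_components(table):
    """Return (components, total) for a two-way table of counts.

    Each component is (observed - expected)^2 / expected, with expected
    counts computed under independence from the row and column totals.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    components = []
    for i, row in enumerate(table):
        comp_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            comp_row.append((observed - expected) ** 2 / expected)
        components.append(comp_row)
    total = sum(sum(comp_row) for comp_row in components)
    return components, total

# Made-up 2x2 table, for illustration only.
table = [[10, 20], [30, 40]]
components, chi2 = cell_chi2_components(table)
```

The components sum to the Pearson statistic, so a cell with a large component is one that contributes heavily to a rejection.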

Likelihood ratio in estimating Chi-square parameter

  • Rahman, Mezbahur
    • Journal of the Korean Data and Information Science Society / Vol. 20, No. 3 / pp. 587-592 / 2009
  • The most frequent use of the chi-square distribution is in the area of goodness-of-fit of a distribution. The likelihood ratio test statistic is as commonly used in statistical inference as the maximum likelihood estimate. Recently revised versions of the likelihood ratio test statistic are used to estimate the parameter of the chi-square distribution. The estimates are compared with the commonly used method of moments and maximum likelihood estimates.

A Note on the Chi-Square Test for Multivariate Normality Based on the Sample Mahalanobis Distances

  • Park, Cheolyong
    • Journal of the Korean Statistical Society / Vol. 28, No. 4 / pp. 479-488 / 1999
  • Moore and Stubblebine (1981) suggested a chi-square test for multivariate normality based on cell counts calculated from the sample Mahalanobis distances. They derived the limiting distribution of the test statistic only for the case in which equiprobable cells are employed. Using conditional limit theorems, we derive the limiting distribution of the statistic as well as the asymptotic normality of the cell counts. These distributions are valid even when equiprobable cells are not employed. Finally, we apply this method to a real data set.