• Title/Summary/Keyword: Pearson chi-squared statistics

Effect of Bias on the Pearson Chi-squared Test for Two Population Homogeneity Test

  • Heo, Sunyeong
    • Journal of Integrative Natural Science / v.5 no.4 / pp.241-245 / 2012
  • Categorical data collected under a complex sample design are not suitable for the standard Pearson multinomial-based chi-squared test because the observations are not independent and identically distributed. This study investigates how the biases of the point estimator of the population proportion and of its variance estimator affect the standard Pearson chi-squared test statistic when the sample is collected under a complex sampling scheme, focusing on the two-population homogeneity test. The standard Pearson test statistic can be partitioned into two parts: the first is a weighted sum of $\chi^2_1$ variables with the eigenvalues of the design matrix as weights, and the second is an additional term that arises from the biases of the point estimator and its variance estimator. Our empirical analysis shows that even when the bias of the point estimator is small, the Pearson test statistic is greatly inflated because the variance of the point estimator is underestimated. Regarding the design-based variance estimator and its design matrix, the larger the average eigenvalue of the design matrix, the larger the share of the Pearson test statistic accounted for by the first component.
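The first component described above, a weighted sum of independent $\chi^2_1$ variables with the design-matrix eigenvalues $\delta_i$ as weights, can be illustrated by simulation. The sketch below assumes the eigenvalues are already known (the values used are hypothetical) and ignores the bias term. For comparison it also applies the first-order Rao-Scott correction $X^2/\bar{\delta}$, a standard adjustment that is not necessarily the approach taken in the paper. Only the standard library is used, so a small integer-df chi-square tail function is included; all function names are illustrative.

```python
import math
import random


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square_df > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{j=0}^{df/2-1} h^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{j=1}^{(df-1)/2} h^(j-1/2) / Gamma(j+1/2)
    extra = sum(h ** (j - 0.5) / math.gamma(j + 0.5)
                for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * extra


def chi2_crit(alpha, df):
    """Upper-alpha critical value of chi-square_df, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def rejection_rates(eigenvalues, alpha=0.05, reps=20000, seed=1):
    """Simulate X^2 = sum_i delta_i * Z_i^2 and return the rejection rates of the
    naive chi-square test and of the first-order Rao-Scott test X^2 / mean(delta)."""
    rng = random.Random(seed)
    df = len(eigenvalues)              # one eigenvalue per degree of freedom
    crit = chi2_crit(alpha, df)
    mean_delta = sum(eigenvalues) / df
    naive = corrected = 0
    for _ in range(reps):
        x2 = sum(d * rng.gauss(0.0, 1.0) ** 2 for d in eigenvalues)
        naive += x2 > crit
        corrected += x2 / mean_delta > crit
    return naive / reps, corrected / reps


print(rejection_rates([2.0, 2.0, 2.0]))   # naive rate well above the nominal 0.05
print(rejection_rates([3.0, 1.0, 0.5]))   # correction is only approximate here
```

Since $E\left[\sum_i \delta_i \chi^2_1\right] = \sum_i \delta_i$, a mean eigenvalue above 1 shifts the statistic upward, which is why the naive test over-rejects; the first-order correction is exact only when all $\delta_i$ are equal.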

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links

  • Jeong, Kwang-Mo; Lee, Hyun-Yung
    • Communications for Statistical Applications and Methods / v.16 no.4 / pp.697-705 / 2009
  • The Pearson chi-squared statistic and the deviance statistic are widely used to assess the goodness of fit of generalized linear models. However, these statistics are not appropriate when the model contains continuous explanatory variables, which make the cell frequencies sparse. We propose a goodness-of-fit test statistic for cumulative logit models with ordinal responses. The dataset is grouped according to the ordinal scores obtained by fitting the assumed model, and the proposed Pearson chi-squared type statistic is computed from the cross-classified table formed by the subgroups of ordinal scores and the response categories. Because the limiting distribution of this chi-squared type statistic is intractable, we suggest a parametric bootstrap testing procedure to approximate its distribution.
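The grouping step can be sketched as follows, assuming the per-subject fitted category probabilities are already available from a cumulative logit fit (the fitting itself is not shown). Using the expected category index as the ordinal score and forming near-equal-sized groups are illustrative assumptions, not necessarily the paper's exact choices, and `grouped_pearson` is a hypothetical helper name.

```python
def grouped_pearson(fitted_probs, responses, n_groups):
    """Pearson-type statistic from a (score group x response category) table.

    fitted_probs: per-subject probability vectors over J ordered categories,
                  assumed to come from an already fitted cumulative logit model.
    responses:    observed category index (0..J-1) for each subject.
    n_groups:     number of subgroups formed from the ordinal scores.
    """
    n, J = len(responses), len(fitted_probs[0])
    # Ordinal score = expected category index under the fitted model (assumed choice).
    scores = [sum(j * p for j, p in enumerate(probs)) for probs in fitted_probs]
    order = sorted(range(n), key=lambda i: scores[i])
    observed = [[0.0] * J for _ in range(n_groups)]
    expected = [[0.0] * J for _ in range(n_groups)]
    for rank, i in enumerate(order):
        g = rank * n_groups // n          # near-equal group sizes by score rank
        observed[g][responses[i]] += 1.0
        for j in range(J):
            expected[g][j] += fitted_probs[i][j]
    return sum((observed[g][j] - expected[g][j]) ** 2 / expected[g][j]
               for g in range(n_groups) for j in range(J) if expected[g][j] > 0)


probs = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.1, 0.2, 0.7]]
stat = grouped_pearson(probs, [0, 1, 2, 2], n_groups=2)
```

As the abstract notes, this statistic should not be referred to a $\chi^2$ table; its null distribution would instead be approximated by a parametric bootstrap.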

Generating Multidimensional Random Tables (다차원 임의 분할표 생성)

  • Choi, Hyun-Jip
    • The Korean Journal of Applied Statistics / v.19 no.3 / pp.545-554 / 2006
  • We suggest a method for generating multidimensional random tables based on log-linear models. The linear combination approach of Lee (1997) is applied to obtain the joint distribution with the well-known Pearson chi-squared statistic. The suggested method generates completely associated joint distributions with a fixed association among three variables, and it can therefore be extended to tables of more than three dimensions.
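To illustrate what "based on log-linear models" means for a three-way table, here is a minimal sketch that builds a $2 \times 2 \times 2$ joint distribution from effect-coded interaction parameters and then draws a random multinomial table from it. This is a generic log-linear construction, not Lee's (1997) linear-combination approach; the function names and parameter values are hypothetical. With levels coded $\pm 1$, the conditional log odds ratio between A and B at level $c$ of C equals $4(\lambda_{AB} + \lambda_{ABC}\,c)$, so fixing $\lambda_{ABC}$ fixes how the A-B association changes across C.

```python
import itertools
import math
import random


def loglinear_2x2x2(lam_ab=0.0, lam_ac=0.0, lam_bc=0.0, lam_abc=0.0):
    """Cell probabilities of a 2x2x2 table from effect-coded log-linear
    interaction parameters (levels coded -1/+1, main effects fixed at zero)."""
    cells = list(itertools.product((-1, 1), repeat=3))
    log_terms = [lam_ab * a * b + lam_ac * a * c + lam_bc * b * c + lam_abc * a * b * c
                 for a, b, c in cells]
    total = sum(math.exp(v) for v in log_terms)
    return {cell: math.exp(v) / total for cell, v in zip(cells, log_terms)}


def random_table(probs, n, seed=None):
    """Draw one multinomial table of total size n from the cell probabilities."""
    rng = random.Random(seed)
    cells = list(probs)
    counts = dict.fromkeys(cells, 0)
    for cell in rng.choices(cells, weights=[probs[c] for c in cells], k=n):
        counts[cell] += 1
    return counts


# Fix the A-B and three-factor interaction terms, then draw a table of size 200.
probs = loglinear_2x2x2(lam_ab=0.3, lam_abc=0.2)
table = random_table(probs, 200, seed=7)
```

Higher-dimensional versions follow the same pattern with longer cell tuples and more interaction terms.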

The Chi-squared Test of Independence for a Multi-way Contingency Table with All Margins Fixed

  • Park, Cheolyong
    • Journal of the Korean Statistical Society / v.27 no.2 / pp.197-203 / 1998
  • To test the hypothesis of complete (total) independence for a multi-way contingency table, the Pearson chi-squared test statistic is usually employed under Poisson or multinomial models. It is well known that, under the hypothesis, this statistic has an asymptotic chi-squared distribution. We consider the case where all marginal sums of the contingency table are fixed. Using conditional limit theorems, we show that the chi-squared test statistic has the same limiting distribution in this case.
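For a three-way table, the Pearson statistic for complete independence uses the fitted counts $\hat m_{ijk} = n_{i++}\,n_{+j+}\,n_{++k} / n^2$ and has $IJK - I - J - K + 2$ degrees of freedom. A minimal plain-Python sketch follows (`complete_independence_x2` is a hypothetical name; all one-way margins are assumed positive). Per the abstract's result, the same limiting chi-squared reference distribution also applies when all marginal sums are fixed.

```python
def complete_independence_x2(table):
    """Pearson X^2 and degrees of freedom for complete (mutual) independence
    in an I x J x K table of counts given as table[i][j][k]."""
    I, J, K = len(table), len(table[0]), len(table[0][0])
    n = sum(table[i][j][k] for i in range(I) for j in range(J) for k in range(K))
    # One-way margins n_{i++}, n_{+j+}, n_{++k}.
    a = [sum(table[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
    b = [sum(table[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
    c = [sum(table[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
    x2 = 0.0
    for i in range(I):
        for j in range(J):
            for k in range(K):
                m = a[i] * b[j] * c[k] / n ** 2   # fitted count under independence
                x2 += (table[i][j][k] - m) ** 2 / m
    df = I * J * K - I - J - K + 2
    return x2, df
```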

Tests for homogeneity of proportions in clustered binomial data

  • Jeong, Kwang Mo
    • Communications for Statistical Applications and Methods / v.23 no.5 / pp.433-444 / 2016
  • Binary responses observed within a cluster (for example, on laboratory rats) are usually correlated with each other. In clustered binomial counts the independence assumption is violated and extra variation arises, so the ordinary statistical analyses of binomial data are inappropriate. When testing the homogeneity of proportions across several treatment groups, the classical Pearson chi-squared test has a serious flaw in controlling Type I error rates. We focus on modifying the chi-squared statistic by incorporating variance inflation factors, and we suggest adjusting the data using a dispersion estimate based on a quasi-likelihood model. We explain the testing procedure through an illustrative example and compare the performance of the modified chi-squared test with competing statistics in a Monte Carlo study.
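One common way to build a dispersion-adjusted test is sketched below, under stated assumptions: the ordinary Pearson $X^2$ for the $g \times 2$ table is divided by a Pearson-based quasi-likelihood dispersion estimate $\hat\phi$ computed from cluster-level residuals, and $X^2/\hat\phi$ is referred to $\chi^2_{g-1}$. This is a generic quasi-likelihood adjustment, not necessarily the exact modification proposed in the paper; `dispersion_adjusted_x2` is a hypothetical name. Each group proportion must lie strictly between 0 and 1, and there must be more clusters than groups.

```python
def dispersion_adjusted_x2(groups):
    """Dispersion-adjusted Pearson test of equal proportions across treatment groups.

    groups: list of treatment groups; each group is a list of (successes, size)
            pairs, one pair per cluster.
    Returns (ordinary X^2, dispersion estimate phi, adjusted X^2 / phi, df).
    """
    totals = [(sum(y for y, _ in g), sum(m for _, m in g)) for g in groups]
    p_pool = sum(y for y, _ in totals) / sum(m for _, m in totals)
    # Ordinary Pearson X^2 for the (groups x 2) table, ignoring clustering.
    x2 = sum((y - m * p_pool) ** 2 / (m * p_pool * (1 - p_pool)) for y, m in totals)
    # Pearson-based quasi-likelihood dispersion from cluster-level residuals,
    # with one fitted proportion per group.
    resid_ss, n_clusters = 0.0, 0
    for g, (y_tot, m_tot) in zip(groups, totals):
        p = y_tot / m_tot
        for y, m in g:
            resid_ss += (y - m * p) ** 2 / (m * p * (1 - p))
            n_clusters += 1
    phi = resid_ss / (n_clusters - len(groups))
    return x2, phi, x2 / phi, len(groups) - 1
```

When $\hat\phi > 1$ (extra-binomial variation), the adjusted statistic is smaller than the ordinary one, which counteracts the inflated Type I error the abstract describes.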

Error cause analysis of Pearson test statistics for k-population homogeneity test (k-모집단 동질성검정에서 피어슨검정의 오차성분 분석에 관한 연구)

  • Heo, Sunyeong
    • Journal of the Korean Data and Information Science Society / v.24 no.4 / pp.815-824 / 2013
  • The traditional Pearson chi-squared test is not appropriate for data collected under a complex sample design. Applying it to complex-sample categorical data may give wrong test results, and the error may arise not only from biased variance estimators but also from biased point estimators of the cell proportions. In this study, a design-based consistent Wald test statistic was derived for the k-population homogeneity test, and the traditional Pearson chi-squared test statistic was partitioned into three parts according to the source of error: the error due to the bias of the variance estimator, the error due to the bias of the cell proportion estimator, and an inseparable error due to both. The relative size of each error component in the Pearson chi-squared test statistic was analyzed empirically using the second-year data of the fourth Korea National Health and Nutrition Examination Survey (KNHANES IV-2). The empirical results show that the error from the bias of the variance estimator was larger than the error from the bias of the cell proportion estimator, although the degree differed from variable to variable.
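For the simplest case of a binary response in $k$ independently sampled populations, the contrast between the two statistics can be sketched directly. The Pearson statistic uses the multinomial variance $\bar p(1-\bar p)/n_i$, while the Wald statistic $\sum_i w_i(\hat p_i - \tilde p)^2$, with $w_i = 1/v_i$ and $\tilde p$ the inverse-variance weighted mean, uses design-based variances $v_i$ and has $k-1$ degrees of freedom. This is a minimal sketch assuming a diagonal covariance and that the $v_i$ have already been obtained (for example by linearization or replication); it does not reproduce the paper's three-part decomposition, and the function names are hypothetical.

```python
def pearson_k_binary(props, sizes):
    """Standard Pearson X^2 for homogeneity of k proportions, treating the
    estimates as if they came from simple random samples of the given sizes."""
    p_bar = sum(p * n for p, n in zip(props, sizes)) / sum(sizes)
    return sum(n * (p - p_bar) ** 2 for p, n in zip(props, sizes)) / (p_bar * (1 - p_bar))


def wald_k_binary(props, variances):
    """Wald statistic for H0: p_1 = ... = p_k with independent estimates and
    design-based variances v_i (diagonal covariance assumed)."""
    w = [1.0 / v for v in variances]
    p_tilde = sum(wi * p for wi, p in zip(w, props)) / sum(w)
    return sum(wi * (p - p_tilde) ** 2 for wi, p in zip(w, props))
```

Passing unweighted sample proportions to `pearson_k_binary` and survey-weighted estimates with their design variances to `wald_k_binary` mimics the comparison in the abstract; when the design variances exceed the multinomial ones, the Pearson statistic is inflated relative to the Wald statistic.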

Notes on the Goodness-of-Fit Tests for the Ordinal Response Model

  • Jeong, Kwang-Mo; Lee, Hyun-Yung
    • The Korean Journal of Applied Statistics / v.23 no.6 / pp.1057-1065 / 2010
  • In this paper we discuss some cautionary notes on using the Pearson chi-squared test statistic to assess the goodness of fit of the ordinal response model. If a model includes continuous explanatory variables, the table resulting from the fit of the model is not a regular one, in the sense that the cell boundaries are not fixed but are determined randomly by some other criterion. The chi-squared statistic from such a table does not in general have a limiting chi-squared distribution, so a chi-squared type goodness-of-fit test must be used with great caution. We also study the limiting distribution of the chi-squared type statistic for testing the goodness of fit of cumulative logit models with ordinal responses. The regularity conditions needed for the limiting distribution are reformulated in the framework of the cumulative logit model by modifying those of Moore and Spruill (1975). Because the limiting distribution is complex, a parametric bootstrap testing procedure is a good alternative, and we explain the suggested method through a practical example with an ordinal response dataset.
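The parametric bootstrap procedure can be written generically: fit the model, compute the observed statistic, repeatedly simulate data from the fitted model, refit, recompute the statistic, and report the proportion of bootstrap statistics at least as large as the observed one. In the sketch below the caller supplies `fit`, `simulate`, and `statistic`; these names are hypothetical, and the Poisson toy model is only a stand-in for demonstration, not the cumulative logit model of the paper.

```python
import math
import random


def parametric_bootstrap_pvalue(data, fit, simulate, statistic, n_boot=999, seed=0):
    """Parametric bootstrap p-value for a goodness-of-fit statistic.

    fit(data)                -> fitted parameters of the assumed model
    simulate(params, n, rng) -> synthetic dataset of size n from the fitted model
    statistic(data, params)  -> goodness-of-fit statistic (large = poor fit)
    """
    rng = random.Random(seed)
    params = fit(data)
    t_obs = statistic(data, params)
    exceed = 0
    for _ in range(n_boot):
        boot = simulate(params, len(data), rng)
        # Refit on each bootstrap sample so estimation uncertainty is reflected.
        exceed += statistic(boot, fit(boot)) >= t_obs
    return (exceed + 1) / (n_boot + 1)


# Toy stand-in model: Poisson counts with a Pearson-type dispersion statistic.
def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fit_poisson(data):
    return sum(data) / len(data)


def simulate_poisson(lam, n, rng):
    return [poisson_draw(lam, rng) for _ in range(n)]


def pearson_dispersion(data, lam):
    # sum (x - lam)^2 / lam, guarding against an all-zero sample.
    return sum((x - lam) ** 2 for x in data) / lam if lam > 0 else 0.0
```

For the ordinal model, `fit` would wrap a cumulative logit fit, `simulate` would draw responses from the fitted category probabilities, and `statistic` would be a grouped Pearson-type statistic.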

Effect of complex sample design on Pearson test statistic for homogeneity (복합표본자료에서 동질성검정을 위한 피어슨 검정통계량의 효과)

  • Heo, Sun-Yeong; Chung, Young-Ae
    • Journal of the Korean Data and Information Science Society / v.23 no.4 / pp.757-764 / 2012
  • This research compares test statistics for homogeneity when the data are collected under a complex sample design. Survey data collected under a complex sample design do not satisfy the independence condition required by the standard Pearson multinomial-based chi-squared test. Many data sets today are collected by complex sample designs, yet tests for categorical data are still conducted with the standard Pearson chi-squared test. In this study, we compared the performance of three test statistics for homogeneity between two populations, using data from the 2009 customer satisfaction evaluation survey of services provided by the Gyeongsangnam-do regional offices of education: the standard Pearson test, the unbiased Wald test, and the Pearson-type test with survey-based point estimates. The empirical analyses show, first, that the standard Pearson test greatly inflates the test statistic, so its results are not reliable. Second, in the comparison of the Wald test and the Pearson-type test, the test results are affected by the number of categories and by the mean and standard deviation of the eigenvalues of the design matrix.
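A Pearson-type statistic with survey-based point estimates can be sketched by replacing the unweighted cell proportions with survey-weighted ones: $X^2 = \sum_i \sum_j n_i(\hat p_{ij} - \bar p_j)^2/\bar p_j$ with $\bar p_j = \sum_i n_i \hat p_{ij}/n$. With unit weights this reduces exactly to the standard Pearson homogeneity statistic. This is one plausible form under that assumption; the paper's exact definition may differ, and the function names are hypothetical. Note that it still relies on multinomial variances, which is the gap the Wald test addresses.

```python
from collections import defaultdict


def weighted_proportions(records, categories):
    """Survey-weighted category proportions and unweighted sample size per population.

    records: iterable of (population, category, weight) tuples, one per sampled unit.
    """
    total_w = defaultdict(float)
    cell_w = defaultdict(float)
    size = defaultdict(int)
    for pop, cat, w in records:
        total_w[pop] += w
        cell_w[pop, cat] += w
        size[pop] += 1
    props = {pop: [cell_w[pop, c] / total_w[pop] for c in categories] for pop in total_w}
    return props, dict(size)


def pearson_type_homogeneity(props, sizes):
    """Pearson-type statistic sum_i sum_j n_i (p_ij - pbar_j)^2 / pbar_j and its df.
    With unit weights it equals the standard Pearson X^2 for an I x J table."""
    pops = list(props)
    n = sum(sizes[p] for p in pops)
    J = len(props[pops[0]])
    pbar = [sum(sizes[p] * props[p][j] for p in pops) / n for j in range(J)]
    x2 = sum(sizes[p] * (props[p][j] - pbar[j]) ** 2 / pbar[j]
             for p in pops for j in range(J) if pbar[j] > 0)
    return x2, (len(pops) - 1) * (J - 1)
```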

Empirical Comparisons of Disparity Measures for Three Dimensional Log-Linear Models

  • Park, Y.S.; Hong, C.S.; Jeong, D.B.
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.543-557 / 2006
  • This paper is concerned with the applicability of the chi-square approximation to six disparity statistics: the Pearson chi-square, the generalized likelihood ratio, the power divergence, the blended weight chi-square, the blended weight Hellinger distance, and the negative exponential disparity statistic. Three-dimensional contingency tables with small and moderate sample sizes are generated and fitted to all possible hierarchical log-linear models: the complete independence model, the conditional independence model, the partial association models, and the model with one variable independent of the other two. For models with direct (closed-form) solutions for the expected cell counts, point estimates and confidence intervals of the 90th and 95th percentage points of the six statistics are explored. For the model without a direct solution, the empirical significance levels and empirical powers of the six statistics for testing the significance of the three-factor interaction are computed and compared.
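Three of the six statistics belong to the Cressie-Read power-divergence family, $2nI^\lambda = \frac{2}{\lambda(\lambda+1)} \sum O\left[(O/E)^\lambda - 1\right]$: $\lambda = 1$ gives Pearson's $X^2$, the limit $\lambda \to 0$ gives $G^2$, and $\lambda = 2/3$ is the Cressie-Read choice. The sketch below covers only this family; the blended weight and negative exponential disparities are omitted because their exact parametrizations are not given here. It assumes positive expected counts whose total equals the observed total; zero observed counts are fine for $\lambda \ge 0$.

```python
import math


def power_divergence(observed, expected, lam):
    """Cressie-Read power-divergence statistic 2nI^lambda.
    lam = 1 gives Pearson X^2, lam = 0 gives G^2, lam = 2/3 is the Cressie-Read choice."""
    if lam == 0:
        # Limit lam -> 0: likelihood-ratio statistic G^2 (cells with O = 0 contribute 0).
        return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    if lam == -1:
        raise ValueError("the lam = -1 limit is not handled in this sketch")
    total = sum(o * ((o / e) ** lam - 1.0) for o, e in zip(observed, expected))
    return 2.0 / (lam * (lam + 1.0)) * total


obs, exp_ = [10, 20, 30], [20, 20, 20]
stats = {name: power_divergence(obs, exp_, lam)
         for name, lam in [("X2", 1.0), ("G2", 0.0), ("I(2/3)", 2 / 3)]}
```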

A Monte Carlo Comparison of the Small Sample Behavior of Disparity Measures (소표본에서 차이측도 통계량의 비교연구)

  • 홍종선; 정동빈; 박용석
    • The Korean Journal of Applied Statistics / v.16 no.2 / pp.455-467 / 2003
  • There has been a long debate on the applicability of the chi-square approximation to statistics based on small samples. Extending the comparison by Rudas (1986) of the Pearson chi-square $X^2$, the generalized likelihood ratio $G^2$, and the power divergence $I(2/3)$ statistics, this paper compares and analyzes recently developed disparity statistics (BWHD(1/9), BWCS(1/3), NED(4/3)). Through Monte Carlo studies of the independence model for two-dimensional contingency tables and of the conditional independence model and the one-variable-independence model for three-dimensional tables, simulated 90th and 95th percentage points and approximate 95% confidence intervals for the true percentage points are obtained. The $X^2$, $I(2/3)$, and BWHD(1/9) statistics are found to behave very similarly and appear to be more applicable to small samples than the others.
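The Monte Carlo idea can be sketched for the simplest setting, the independence model in a $2 \times 2$ table: simulate many tables of a small total size $n$ under independence, compute $X^2$ for each, and read off an empirical percentage point to compare with the asymptotic $\chi^2_1$ value 3.841. This is an illustrative sketch, not the simulation design of the paper; the marginal probabilities and function names are hypothetical, and tables with an empty row or column (where $X^2$ is undefined) are simply discarded.

```python
import random


def x2_2x2(a, b, c, d):
    """Pearson X^2 for the 2x2 table [[a, b], [c, d]] (all margins must be positive)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def simulated_percentage_point(n, p_row, p_col, q=0.95, reps=10000, seed=2003):
    """Monte Carlo estimate of the q-th percentage point of X^2 under independence
    for 2x2 tables of total size n; tables with an empty margin are discarded."""
    rng = random.Random(seed)
    # Cell probabilities under independence, in the order a, b, c, d.
    probs = [p_row * p_col, p_row * (1 - p_col),
             (1 - p_row) * p_col, (1 - p_row) * (1 - p_col)]
    stats = []
    for _ in range(reps):
        counts = [0, 0, 0, 0]
        for cell in rng.choices(range(4), weights=probs, k=n):
            counts[cell] += 1
        a, b, c, d = counts
        if min(a + b, c + d, a + c, b + d) > 0:
            stats.append(x2_2x2(a, b, c, d))
    stats.sort()
    return stats[int(q * len(stats))]   # simple order-statistic estimate


for n in (10, 20, 50):
    print(n, simulated_percentage_point(n, 0.3, 0.3), "vs asymptotic 3.841")
```

Repeating the simulation with different seeds, or bootstrapping the simulated statistics, would give the kind of confidence interval for the true percentage point that the abstract describes.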