• Title/Summary/Keyword: Contingency tables

Empirical Comparisons of Disparity Measures for Three Dimensional Log-Linear Models

  • Park, Y.S.;Hong, C.S.;Jeong, D.B.
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.543-557 / 2006
  • This paper is concerned with the applicability of the chi-square approximation to six disparity statistics: the Pearson chi-square, the generalized likelihood ratio, the power divergence, the blended weight chi-square, the blended weight Hellinger distance, and the negative exponential disparity statistic. Three-dimensional contingency tables of small and moderate sample sizes are generated and fitted to all possible hierarchical log-linear models: the complete independence model, the conditional independence model, the partial association models, and the model with one variable independent of the other two. For models with direct (closed-form) solutions for the expected cell counts, point estimates and confidence intervals of the 90th and 95th percentiles of the six statistics are explored. For the model without a direct solution, the empirical significance levels and empirical powers of the six statistics for testing the three-factor interaction are computed and compared.
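The abstract above relies on two ingredients: closed-form fitted counts for log-linear models that have direct solutions, and the power-divergence family that contains $X^2$ and $G^2$ as special cases. A minimal sketch of both, assuming the complete independence model as the example of a direct solution and the Cressie-Read parametrization of the power divergence; the function names and the 2 x 2 x 2 counts are hypothetical and not taken from the paper, and the blended weight and negative exponential statistics are not implemented.

```python
import math
from itertools import product


def mutual_independence_fit(table):
    """Fitted counts E_ijk = n_i++ * n_+j+ * n_++k / n^2 under complete independence.

    `table` is an I x J x K nested list of observed counts.
    """
    I, J, K = len(table), len(table[0]), len(table[0][0])
    n = sum(table[i][j][k] for i, j, k in product(range(I), range(J), range(K)))
    a = [sum(table[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
    b = [sum(table[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
    c = [sum(table[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
    return [[[a[i] * b[j] * c[k] / n ** 2 for k in range(K)]
             for j in range(J)] for i in range(I)]


def power_divergence(observed, expected, lam):
    """Cressie-Read statistic 2/(lam*(lam+1)) * sum O * ((O/E)**lam - 1).

    lam = 1 gives Pearson's X^2 (when sum E = sum O), lam = 0 is the limiting
    case G^2 = 2 * sum O * ln(O/E), and lam = 2/3 gives I(2/3).
    Assumes lam >= 0 and E > 0 in every cell.
    """
    total = 0.0
    for o, e in zip(observed, expected):
        if lam == 0:
            # Limit lam -> 0: likelihood ratio term; cells with O = 0 contribute 0.
            total += 2 * o * math.log(o / e) if o > 0 else 0.0
        else:
            total += 2 * o * ((o / e) ** lam - 1) / (lam * (lam + 1))
    return total


def flatten(table3):
    """Turn an I x J x K nested list into a flat list of cells."""
    return [x for plane in table3 for row in plane for x in row]


# Hypothetical 2 x 2 x 2 counts, used only for illustration.
table = [[[10, 5], [3, 8]], [[6, 9], [7, 4]]]
obs, fitted = flatten(table), flatten(mutual_independence_fit(table))
df = len(obs) - (2 + 2 + 2) + 2  # IJK - I - J - K + 2 for complete independence
for lam in (1, 0, 2 / 3):
    print(f"lambda={lam:.3f}  statistic={power_divergence(obs, fitted, lam):.3f}  df={df}")
```

Each statistic would then be compared with a chi-square reference distribution on `df` degrees of freedom; the paper's question is how well that approximation holds for small and moderate samples.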

Farmers' Views on the Farming in Seoul (서울지역 농업인의 영농의식)

  • Hwang, Han-Cheol;Choi, Soo-Myung;Park, Sun-Yong
    • Proceedings of the Korean Society of Agricultural Engineers Conference / 2001.10a / pp.216-219 / 2001
  • Despite the importance of farmland in Seoul for providing fresh vegetables, a pleasant environment, and a good quality of life for residents, rapid urbanization and industrialization have greatly reduced the farm area. The purpose of this study is to examine farmers' intentions and attitudes in order to provide supporting data for planning an urban agricultural development strategy. All collected data were analyzed with contingency tables and chi-square tests using the SAS statistical package. Farmers' views on farming in Seoul differed depending on their status. Therefore, agricultural strategies for Seoul should take these different attitudes into account.
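The survey analysis above uses the standard chi-square test of independence on a two-way contingency table. A minimal sketch of that test, assuming a table of raw counts with expected counts $E_{ij} = n_{i+} n_{+j} / n$; the function names and the counts are hypothetical and are not the survey's data, and the closed-form p-value is valid for one degree of freedom only.

```python
import math


def chi_square_independence(table):
    """Pearson X^2 test of independence for an r x c table of counts.

    Expected counts are E_ij = n_i+ * n_+j / n; degrees of freedom are (r-1)(c-1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df


def p_value_df1(x2):
    """Upper-tail chi-square probability for df = 1 only: P(Z^2 > x2) = erfc(sqrt(x2 / 2))."""
    return math.erfc(math.sqrt(x2 / 2))


# Hypothetical 2 x 2 counts (farmer status by satisfied / not satisfied).
counts = [[20, 10], [10, 20]]
x2, df = chi_square_independence(counts)
print(x2, df, p_value_df1(x2) if df == 1 else None)
```

For tables with more than one degree of freedom, a library routine such as SciPy's `scipy.stats.chi2.sf` (or `scipy.stats.chi2_contingency` for the whole test) would normally supply the p-value.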

Bayesian Inference with Inequality Constraints (부등 제한 조건하에서의 베이지안 추론)

  • Oh, Man-Suk
    • The Korean Journal of Applied Statistics / v.27 no.6 / pp.909-922 / 2014
  • This paper reviews Bayesian inference with inequality constraints. It focuses on ⅰ) comparison of models with various inequality/equality constraints on parameters, ⅱ) multiple tests on equalities of parameters when the parameters are under inequality constraints, and ⅲ) multiple tests on equalities of score parameters in models for contingency tables with ordinal categorical variables.
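One simple way to see how an inequality constraint enters a Bayesian analysis: if the constrained prior is the unconstrained prior truncated to the constraint region, the constrained posterior is the unconstrained posterior restricted to that region, so rejection sampling is valid. A minimal sketch, assuming two binomial proportions with independent Beta(1, 1) priors and the constraint $p_1 \le p_2$; the data and function name are hypothetical, and this illustrates the general idea rather than the specific methods reviewed in the paper.

```python
import random


def constrained_posterior_draws(y1, n1, y2, n2, draws=20000, seed=1):
    """Draws of (p1, p2) from the posterior under Beta(1,1) priors truncated to p1 <= p2.

    Samples the unconstrained Beta posteriors and keeps only draws that satisfy
    the constraint. Also returns the acceptance rate, which estimates the
    unconstrained posterior probability of the constraint.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(draws):
        p1 = rng.betavariate(1 + y1, 1 + n1 - y1)
        p2 = rng.betavariate(1 + y2, 1 + n2 - y2)
        if p1 <= p2:
            kept.append((p1, p2))
    return kept, len(kept) / draws


# Hypothetical data: 3 successes out of 20 versus 12 successes out of 20.
kept, acceptance = constrained_posterior_draws(3, 20, 12, 20)
mean_p1 = sum(p1 for p1, _ in kept) / len(kept)
mean_p2 = sum(p2 for _, p2 in kept) / len(kept)
# Encompassing-prior Bayes factor for "p1 <= p2" against the unconstrained model:
# posterior proportion satisfying the constraint / prior proportion (0.5 here).
bayes_factor = acceptance / 0.5
print(mean_p1, mean_p2, bayes_factor)
```

The last line reflects the encompassing-prior approach to comparing constrained and unconstrained models; rejection sampling becomes inefficient when the constraint region has small posterior probability, which is one reason more specialized samplers are used in practice.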

Influential Points in GLMs via Backwards Stepping

  • Jeong, Kwang-Mo;Oh, Hae-Young
    • Communications for Statistical Applications and Methods / v.9 no.1 / pp.197-212 / 2002
  • When assessing the goodness of fit of a model, a small subset of deviating observations can give rise to a significant lack of fit. It is therefore important to identify such observations and to assess their effects on various aspects of the analysis. Cook's distance is usually used to detect influential observations, but it is sometimes not fully effective in identifying a truly influential set of observations because masking or swamping effects may occur. In this paper we confine our attention to influential subsets in GLMs such as logistic regression models and log-linear models. We modify a backwards stepping algorithm, originally suggested for detecting outlying cells in contingency tables, to detect influential observations in GLMs. The algorithm consists of two steps: an identification step and a testing step. In the identification step we identify influential observations based on influence measures such as Cook's distance. In the testing step we test whether the subset of identified observations is significant. Finally, we illustrate the proposed method with two datasets, one analyzed with a logistic regression model and the other with a log-linear model.

Reinterpretation of Multiple Correspondence Analysis using the K-Means Clustering Analysis

  • Choi, Yong-Seok;Hyun, Gee Hong;Kim, Kyung Hee
    • Communications for Statistical Applications and Methods / v.9 no.2 / pp.505-514 / 2002
  • Multiple correspondence analysis graphically shows the correspondence among categories in multi-way contingency tables. It is well known that in multiple correspondence analysis the proportions of the total inertia accounted for by the principal inertias are low. Although this problem can be overcome by using the Benzécri formula, the result is still not enough to show clear correspondences among categories (Greenacre and Blasius, 1994, Chapter 10). The same source also shows that Andrews' plot is useful for displaying the correspondence among categories. However, this method also fails to give a concise interpretation when the number of categories is large. Therefore, in this study we make multiple correspondence analysis easier to interpret by applying K-means clustering analysis.

Farmers' Views on the Farming in Seoul (서울지역 농업인의 영농의식)

  • Hwang, Han-Cheol;Park, Sun-Yong;Han, Kyong-Soo
    • Journal of Korean Society of Rural Planning / v.8 no.1 s.15 / pp.94-104 / 2002
  • Despite the importance of farmland in Seoul for providing fresh vegetables, a pleasant environment, and a good quality of life for residents, rapid urbanization and industrialization have greatly reduced the farm area. The purpose of this study is to examine farmers' intentions and attitudes in order to provide supporting data for planning an urban agricultural development strategy. All collected data were analyzed with contingency tables and chi-square tests using the SAS statistical package. Based on analysis of the survey data, leaseholders were found to be more satisfied with their jobs than landowning farmers. Also, small-scale farmers with greenhouses showed greater job satisfaction than ordinary large-scale farmers. Farmers' views on farming in Seoul differed depending on their status. Therefore, agricultural strategies for Seoul should take these different attitudes into account.

Estimation from Incomplete Data in Multivariate Distributions under Stochastic Ordering (확률적 순서를 갖는 다변량분포에서 불완전자료에 의한 추정)

  • Kwang Mo Jeoung
    • The Korean Journal of Applied Statistics / v.7 no.2 / pp.145-157 / 1994
  • For multivariate distributions satisfying stochastic ordering, we propose maximum likelihood estimation with incomplete data via an EM algorithm. In this paper we restrict our attention to contingency tables with partially cross-classified observations. Existing isotonic regression programs can be used to implement the EM algorithm, and we illustrate the estimation process through an example.
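The EM structure described above can be illustrated without the ordering constraint. A minimal sketch, assuming an I x J table of fully classified counts plus supplementary counts observed only by row or only by column, missing at random; the E-step allocates each partially classified count to cells in proportion to the current conditional probabilities. The function name and data are hypothetical, and the paper's stochastic-ordering step (where an isotonic regression would replace the plain M-step below) is deliberately omitted.

```python
def em_partial_cross_classification(full, row_only, col_only, iters=200, tol=1e-10):
    """Unconstrained EM estimate of cell probabilities p_ij.

    full:     I x J counts classified by both variables
    row_only: length-I counts classified only by the row variable
    col_only: length-J counts classified only by the column variable
    """
    I, J = len(full), len(full[0])
    N = sum(map(sum, full)) + sum(row_only) + sum(col_only)
    p = [[1.0 / (I * J)] * J for _ in range(I)]  # uniform starting values
    for _ in range(iters):
        row_m = [sum(p[i]) for i in range(I)]
        col_m = [sum(p[i][j] for i in range(I)) for j in range(J)]
        # E-step: expected complete-data counts given the current p.
        counts = [[full[i][j]
                   + row_only[i] * p[i][j] / row_m[i]
                   + col_only[j] * p[i][j] / col_m[j]
                   for j in range(J)] for i in range(I)]
        # M-step (unconstrained): multinomial MLE from the completed counts.
        new_p = [[counts[i][j] / N for j in range(J)] for i in range(I)]
        delta = max(abs(new_p[i][j] - p[i][j]) for i in range(I) for j in range(J))
        p = new_p
        if delta < tol:
            break
    return p


# Hypothetical data: a 2 x 2 table plus 10 units observed only on the row variable.
estimate = em_partial_cross_classification([[10, 20], [30, 40]], [10, 0], [0, 0])
print(estimate)
```

With supplementary data on one margin only, the MLE has the closed form $\hat p_{ij} = (n_{ij}/n_{i+})(n_{i+} + r_i)/N$, which makes a convenient check; with both kinds of partial data there is generally no closed form, which is where EM is needed.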

Empirical Comparisons of Disparity Measures for Partial Association Models in Three Dimensional Contingency Tables

  • Jeong, D.B.;Hong, C.S.;Yoon, S.H.
    • Communications for Statistical Applications and Methods / v.10 no.1 / pp.135-144 / 2003
  • This work compares recently developed disparity measures for the partial association model in three-dimensional categorical data. Data are generated by simulating each term of the log-linear model equation for the partial association model, a method proposed in this paper. This alternative Monte Carlo method is used to study the behavior of disparity measures such as the power divergence statistic I(λ), the Pearson chi-square statistic $X^2$, the likelihood ratio statistic $G^2$, the blended weight chi-square statistic BWCS(λ), the blended weight Hellinger distance statistic BWHD(λ), and the negative exponential disparity statistic NED(λ) for moderate sample sizes. We find that the power divergence statistic I(2/3) and the blended weight Hellinger distance statistic BWHD(1/9) are the best tests with respect to size and power.

Mutual Information and Redundancy for Categorical Data

  • Hong, Chong-Sun;Kim, Beom-Jun
    • Communications for Statistical Applications and Methods / v.13 no.2 / pp.297-307 / 2006
  • Most methods for describing the relationship among random variables require specific probability distributions and assumptions about the random variables. Mutual information, which is based on entropy and measures the dependence among random variables, does not need any such assumptions. Redundancy, an analogous version of mutual information, has also been proposed. In this paper, redundancy and mutual information are extended to multi-dimensional categorical data. It is found that the redundancy for categorical data can be expressed as a function of the generalized likelihood ratio statistic under several kinds of independence log-linear models, so redundancy can also be used to analyze contingency tables. Whereas the generalized likelihood ratio statistic for testing the goodness of fit of log-linear models is sensitive to the sample size, the redundancy for categorical data does not depend on the sample size, only on the cell probabilities themselves.
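The link between an entropy-based measure and the likelihood ratio statistic can be checked in the simplest case, mutual information in a two-way table under the independence model: since $n_{ij}/E_{ij} = \hat p_{ij}/(\hat p_{i+}\hat p_{+j})$, it follows that $G^2 = 2n \cdot \widehat{MI}$ with natural logarithms. A minimal sketch, assuming a two-way table of counts; the function names and counts are hypothetical, and the paper's redundancy measure for multi-dimensional tables is not implemented here.

```python
import math


def mutual_information(table):
    """Plug-in mutual information (in nats) between the row and column variables."""
    n = sum(map(sum, table))
    rows = [sum(r) / n for r in table]
    cols = [sum(c) / n for c in zip(*table)]
    mi = 0.0
    for i, r in enumerate(table):
        for j, o in enumerate(r):
            if o > 0:  # empty cells contribute 0
                p = o / n
                mi += p * math.log(p / (rows[i] * cols[j]))
    return mi


def likelihood_ratio_g2(table):
    """G^2 = 2 * sum n_ij * ln(n_ij / E_ij) with E_ij = n_i+ * n_+j / n."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return 2 * sum(o * math.log(o * n / (rows[i] * cols[j]))
                   for i, r in enumerate(table) for j, o in enumerate(r) if o > 0)


table = [[20, 10], [10, 20]]          # hypothetical counts
doubled = [[2 * x for x in r] for r in table]
print(mutual_information(table), mutual_information(doubled))   # identical
print(likelihood_ratio_g2(table), likelihood_ratio_g2(doubled))  # G^2 doubles
```

Doubling every count leaves the cell proportions, and hence the mutual information, unchanged while $G^2$ doubles, which is the sample-size sensitivity the abstract contrasts.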

Maximum Likelihood Estimation of Multinomial Parameters with Known or Unknown Crossing Point

  • Lee, Ju-Young;Oh, Myongsik
    • Communications for Statistical Applications and Methods / v.6 no.3 / pp.947-956 / 1999
  • We define a crossing point $x_0$ such that $f(x) \geq g(x)$ for $x \leq x_0$ and $f(x) \leq g(x)$ for $x > x_0$, where $f$ and $g$ are probability density functions. Such a situation may arise when we compare two histograms from two independent samples. For example, two contingency tables in which initially admitted students and actually enrolled students are classified according to their high school ranking may show such a pattern. In this paper we consider maximum likelihood estimation of cell probabilities when a crossing point exists. We first assume a known crossing point and find an estimator. The estimation procedure for an unknown crossing point is a straightforward extension. A real data set is analyzed for illustration.
