Large tests of independence in incomplete two-way contingency tables using fractional imputation

  • Kang, Shin-Soo (Department of MIS, Catholic Kwandong University)
  • Larsen, Michael D. (Department of Statistics, George Washington University)
  • Received : 2015.01.10
  • Accepted : 2015.05.19
  • Published : 2015.07.31

Abstract

Imputation procedures fill in missing values, thereby enabling complete-data analyses. Fully efficient fractional imputation (FEFI) and multiple imputation (MI) create multiple versions of the missing observations, thereby reflecting uncertainty about their true values. Methods for hypothesis testing with multiple imputation have been described previously. Fractional imputation assigns weights to the observed data to compensate for missing values. This article develops tests of independence using FEFI for partially classified two-way contingency tables. Wald and deviance tests of independence under FEFI are proposed. Simulations are used to compare type I error rates and power. The partially observed marginal information is useful for estimating the joint distribution of cell probabilities, but it is not useful for testing association. In the simulations, FEFI compares favorably with other methods.
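The abstract does not give the test statistics explicitly. The sketch below is a rough illustration only. It shows how fully efficient fractional imputation can be applied to an I x J table in which some units are classified by row only and others by column only. Each partially classified unit is split across every cell it could belong to, with fractional weights proportional to the current cell-probability estimates. The weights are then re-estimated until they stabilize.

The sketch then computes a likelihood ratio (deviance) statistic for independence from the observed-data likelihood. It rests on several assumptions:

- Sampling is multinomial.
- Data are missing at random (MAR).
- Only the Python standard library is used.
- The function names (`fefi_cell_probs`, `independence_probs`, `obs_loglik`, `deviance_test`) are hypothetical.
- The statistic is a generic observed-data deviance, not necessarily the exact FEFI Wald or deviance test proposed in the article.

```python
import math

# A sketch assuming multinomial sampling and MAR; all function names are
# hypothetical and do not come from the article.


def fefi_cell_probs(n, row_only, col_only, tol=1e-10, max_iter=1000):
    """Estimate the joint cell probabilities of an I x J table under MAR.

    n[i][j]:     counts of fully classified units
    row_only[i]: units observed in row i whose column is missing
    col_only[j]: units observed in column j whose row is missing
    """
    I, J = len(n), len(n[0])
    total = sum(map(sum, n)) + sum(row_only) + sum(col_only)
    p = [[1.0 / (I * J)] * J for _ in range(I)]  # uniform starting values
    for _ in range(max_iter):
        row_m = [sum(p[i]) for i in range(I)]
        col_m = [sum(p[i][j] for i in range(I)) for j in range(J)]
        # FEFI step: each partially classified unit is placed in every cell it
        # could belong to, with fractional weight p_ij / p_i+ (row known) or
        # p_ij / p_+j (column known); one unit's weights sum to one.
        m = [[n[i][j]
              + (row_only[i] * p[i][j] / row_m[i] if row_only[i] else 0.0)
              + (col_only[j] * p[i][j] / col_m[j] if col_only[j] else 0.0)
              for j in range(J)] for i in range(I)]
        # Re-estimate the probabilities from the fractionally weighted table.
        new_p = [[m[i][j] / total for j in range(J)] for i in range(I)]
        change = max(abs(new_p[i][j] - p[i][j])
                     for i in range(I) for j in range(J))
        p = new_p
        if change < tol:
            break
    return p


def independence_probs(n, row_only, col_only):
    """Closed-form MLE of p_ij = a_i * b_j under MAR: the observed-data
    likelihood factorizes into a row part and a column part."""
    I, J = len(n), len(n[0])
    n_full = sum(map(sum, n))
    a = [(sum(n[i]) + row_only[i]) / (n_full + sum(row_only)) for i in range(I)]
    b = [(sum(n[i][j] for i in range(I)) + col_only[j]) / (n_full + sum(col_only))
         for j in range(J)]
    return [[a[i] * b[j] for j in range(J)] for i in range(I)]


def obs_loglik(p, n, row_only, col_only):
    """Observed-data multinomial log-likelihood (constants dropped)."""
    I, J = len(n), len(n[0])
    ll = sum(n[i][j] * math.log(p[i][j])
             for i in range(I) for j in range(J) if n[i][j])
    ll += sum(row_only[i] * math.log(sum(p[i])) for i in range(I) if row_only[i])
    ll += sum(col_only[j] * math.log(sum(p[i][j] for i in range(I)))
              for j in range(J) if col_only[j])
    return ll


def deviance_test(n, row_only, col_only):
    """Deviance (likelihood ratio) statistic for independence and its df."""
    p_sat = fefi_cell_probs(n, row_only, col_only)
    p_ind = independence_probs(n, row_only, col_only)
    g2 = 2.0 * (obs_loglik(p_sat, n, row_only, col_only)
                - obs_loglik(p_ind, n, row_only, col_only))
    return g2, (len(n) - 1) * (len(n[0]) - 1)
```

For example, `deviance_test([[5, 1], [2, 7]], row_only=[4, 1], col_only=[2, 3])` returns the statistic and its degrees of freedom. With no partially classified units, the statistic reduces to the ordinary G² for the complete table. A p-value can be read from the chi-square distribution with `df` degrees of freedom, for instance with `scipy.stats.chi2.sf(g2, df)` if SciPy is available.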
