• Title/Summary/Keyword: zero imputation

Search Result 2, Processing Time 0.025 seconds

Application of discrete Weibull regression model with multiple imputation

  • Yoo, Hanna
    • Communications for Statistical Applications and Methods
    • /
    • v.26 no.3
    • /
    • pp.325-336
    • /
    • 2019
  • In this article we extend the discrete Weibull regression model in the presence of missing data. Discrete Weibull regression models can be adapted to various type of dispersion data however, it is not widely used. Recently Yoo (Journal of the Korean Data and Information Science Society, 30, 11-22, 2019) adapted the discrete Weibull regression model using single imputation. We extend their studies by using multiple imputation also with several various settings and compare the results. The purpose of this study is to address the merit of using multiple imputation in the presence of missing data in discrete count data. We analyzed the seventh Korean National Health and Nutrition Examination Survey (KNHANES VII), from 2016 to assess the factors influencing the variable, 1 month hospital stay, and we compared the results using discrete Weibull regression model with those of Poisson, negative Binomial and zero-inflated Poisson regression models, which are widely used in count data analyses. The results showed that the discrete Weibull regression model using multiple imputation provided the best fit. We also performed simulation studies to show the accuracy of the discrete Weibull regression using multiple imputation given both under- and over-dispersed distribution, as well as varying missing rates and sample size. Sensitivity analysis showed the influence of mis-specification and the robustness of the discrete Weibull model. Using imputation with discrete Weibull regression to analyze discrete data will increase explanatory power and is widely applicable to various types of dispersion data with a unified model.

Effect of zero imputation methods for log-transformation of independent variables in logistic regression

  • Seo Young Park
    • Communications for Statistical Applications and Methods
    • /
    • v.31 no.4
    • /
    • pp.409-425
    • /
    • 2024
  • Logistic regression models are commonly used to explain binary health outcome variable using independent variables such as patient characteristics in medical science and public health research. Although there is no distributional assumption required for independent variables in logistic regression, variables with severely right-skewed distribution such as lab values are often log-transformed to achieve symmetry or approximate normality. However, lab values often have zeros due to limit of detection which makes it impossible to apply log-transformation. Therefore, preprocessing to handle zeros in the observation before log-transformation is necessary. In this study, five methods that remove zeros (shift by 1, shift by half of the smallest nonzero, shift by square root of the smallest nonzero, replace zeros with half of the smallest nonzero, replace zeros with the square root of the smallest nonzero) are investigated in logistic regression setting. To evaluate performances of these methods, we performed a simulation study based on randomly generated data from log-normal distribution and logistic regression model. Shift by 1 method has the worst performance, and overall shift by half of the smallest nonzero method, replace zeros with half of the smallest nonzero method, and replace zeros with the square root of the smallest nonzero method showed comparable and stable performances.