• Title/Summary/Keyword: missing covariates


Pattern-Mixture Model of the Cox Proportional Hazards Model with Missing Binary Covariates (결측이 있는 이산형 공변량에 대한 Cox비례위험모형의 패턴-혼합 모델)

  • Youk, Tae-Mi;Song, Ju-Won
    • The Korean Journal of Applied Statistics, v.25 no.2, pp.279-291, 2012
  • When fitting a Cox proportional hazards model with missing covariates, excluding observations with missing values from the analysis is inefficient. Furthermore, if the missing-data mechanism is not missing completely at random (MCAR), doing so may lead to biased parameter estimates. Many approaches have been proposed for the Cox proportional hazards model when covariates are sometimes missing, but they are based on the selection model. This paper proposes an approach to the Cox proportional hazards model with missing covariates that uses the pattern-mixture model (Little, 1993). The pattern-mixture model is expressed in terms of the joint distribution of survival time and the missing-data mechanism. Within the pattern-mixture model, many models can be considered by imposing different restrictions, and differences among the results under these restrictions indicate how sensitive the model is to the missing covariates. A simulation study was conducted to show the sensitivity of the parameter estimates to the choice of restriction in a pattern-mixture model. The proposed approach was also applied to mouse leukemia data.
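For orientation, the two factorizations contrasted in this abstract can be written in generic notation (assumed here, not taken from the paper), with Y the full data (survival time and covariates) and R the missingness indicators. This is the general form only, not the paper's Cox-model likelihood.

```latex
\begin{aligned}
\text{selection model:} \quad & f(Y, R) = f(Y)\, f(R \mid Y), \\
\text{pattern-mixture model:} \quad & f(Y, R) = f(Y \mid R)\, f(R).
\end{aligned}
```

In the pattern-mixture form, the distribution of a covariate within a pattern where it is never observed cannot be estimated from the data, so each fit needs identifying restrictions (one common choice equates such distributions with their complete-case counterparts). Comparing fits under different restrictions is what produces the sensitivity analysis the abstract describes.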

Comparison of missing data methods in clustered survival data using Bayesian adaptive B-Spline estimation

  • Yoo, Hanna;Lee, Jae Won
    • Communications for Statistical Applications and Methods, v.25 no.2, pp.159-172, 2018
  • In many epidemiological studies, missing values in the outcome arise from censoring, and censoring is what distinguishes survival analysis from other analytical methods. Many methods deal with censored data in survival analysis; however, few studies have dealt with missing covariates in survival data, and such studies are rarer still when the data are clustered. In this paper, we conducted a simulation study to compare several missing-data methods for clustered, multi-structured data with missing covariates. We modeled the unknown baseline hazard and frailty with Bayesian B-splines to obtain smoother and more accurate estimates, and we used prior information to achieve more accurate results. We assumed the missing-data mechanism to be missing at random (MAR). We compared the performance of five missing-data techniques through simulation studies. We also present results from a multicenter study of Korean IBD patients with Crohn's disease (Lee et al., Journal of the Korean Society of Coloproctology, 28, 188-194, 2012).

Comparison of GEE Estimators Using Imputation Methods (대체방법별 GEE추정량 비교)

  • 김동욱;노영화
    • The Korean Journal of Applied Statistics, v.16 no.2, pp.407-426, 2003
  • We consider the problem of missing covariates in generalized estimating equations (GEE) models. If a covariate is partially missing, the GEE cannot be computed. In this paper, we study the performance of seven imputation methods for handling missing covariates in GEE models, and we investigate the properties of the GEE estimators after the missing covariates are imputed, for ordinal repeated-measurement data. The seven imputation methods are (i) naive deletion, (ii) sample average imputation, (iii) row average imputation, (iv) cross-wave regression imputation, (v) carry-over imputation, (vi) the Bayesian bootstrap, and (vii) the approximate Bayesian bootstrap. A Monte Carlo simulation is used to compare the performance of these methods. For the mechanism generating the missing data, we assume ignorable nonresponse. Furthermore, we generate missing covariates with and without taking wave nonresponse patterns into account.
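The last method in that list, the approximate Bayesian bootstrap (ABB), can be sketched in its usual two-step form. This is a minimal illustration assuming a single covariate, a single imputation cell, and ignorable nonresponse; the helper name `abb_impute` is hypothetical, and the paper's cell definitions and wave structure are not reproduced.

```python
import random
from typing import List, Optional


def abb_impute(values: List[Optional[int]], rng: random.Random) -> List[int]:
    """Hypothetical helper: fill None entries by the approximate Bayesian bootstrap.

    Assumes one variable, one imputation cell and ignorable nonresponse.
    Returns one completed copy of `values`; call it m times for m imputations.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("ABB needs at least one observed value")
    # Step 1: bootstrap the observed values (same size, with replacement).
    donors = rng.choices(observed, k=len(observed))
    # Step 2: fill each missing entry by drawing, with replacement, from that bootstrap sample.
    return [v if v is not None else rng.choice(donors) for v in values]


if __name__ == "__main__":
    rng = random.Random(2003)
    wave1 = [1, None, 3, 2, None, 2]  # made-up ordinal covariate with two missing responses
    for completed in (abb_impute(wave1, rng) for _ in range(5)):  # five imputed data sets
        print(completed)
```

Drawing a fresh bootstrap donor pool for each imputation is what distinguishes ABB from simple hot-deck draws: it propagates uncertainty about the observed-data distribution into the spread across imputations.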

The EM algorithm for mixture regression with missing covariates (결측 공변량을 갖는 혼합회귀모형에서의 EM 알고리즘)

  • Kim, Hyungmin;Ham, Geonhee;Seo, Byungtae
    • The Korean Journal of Applied Statistics, v.29 no.7, pp.1347-1359, 2016
  • Finite mixtures of regression models provide an effective tool for exploring a hidden functional relationship between a response variable and covariates. In practice, however, data are often not fully observed, for various reasons. In this paper, we derive an expectation-maximization (EM) algorithm to obtain the maximum likelihood estimator when some covariates are missing at random in a finite mixture of regression models. We conduct simulation studies and provide real-data examples to show the validity of the derived EM algorithm.

Application of Multiple Imputation Method in Analyzing Data with Missing Continuous Covariates

  • Ghasemizadeh Tamar, S.;Ganjali, M.
    • The Korean Journal of Applied Statistics, v.21 no.4, pp.659-664, 2008
  • Missing continuous covariates are pervasive when generalized linear models are applied to medical data. Multiple imputation is the most common and easiest method for dealing with missing covariate data. However, serious caveats always accompany its use, and care should be taken to make the imputed values proper. In this paper, proper imputation from the posterior predictive distribution is developed for use with arbitrary priors. To sample from the posterior predictive distribution, we approximate it using the empirical distribution of the posterior. This method is preferable to an imputation method we presented earlier, which uses a full model and available software to impute missing values. The proposed methods are applied to glucocorticoid data.
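The abstract does not say how the analyses of the imputed data sets are combined; the standard choice for multiple imputation is Rubin's rules, sketched here under that assumption. The helper name `pool_rubin` and the numbers in the usage line are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: combine m completed-data results with Rubin's rules.

    estimates[i], variances[i]: point estimate and squared standard error
    from the i-th imputed data set. Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    return q_bar, total


if __name__ == "__main__":
    # Made-up coefficient estimates from five imputed data sets.
    est, var = pool_rubin([0.42, 0.47, 0.39, 0.45, 0.44], [0.010, 0.011, 0.009, 0.010, 0.012])
    print(f"pooled estimate {est:.3f}, standard error {math.sqrt(var):.3f}")
```

The (1 + 1/m) factor inflates the between-imputation variance to account for using a finite number of imputations; improper imputations tend to understate b, which is one reason the abstract stresses proper draws from the posterior predictive distribution.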

Comparison of GEE Estimation Methods for Repeated Binary Data with Time-Varying Covariates on Different Missing Mechanisms (시간-종속적 공변량이 포함된 이분형 반복측정자료의 GEE를 이용한 분석에서 결측 체계에 따른 회귀계수 추정방법 비교)

  • Park, Boram;Jung, Inkyung
    • The Korean Journal of Applied Statistics, v.26 no.5, pp.697-712, 2013
  • When analyzing repeated binary data, the generalized estimating equations (GEE) approach produces consistent estimates of the regression parameters even if an incorrect working correlation matrix is used. However, in finite samples, the coefficients of time-varying covariates change more across working correlation structures than those of time-invariant covariates. In addition, the GEE approach may give biased estimates when data are missing at random (MAR). Weighted estimating equations and multiple imputation have been proposed to reduce bias in parameter estimates under MAR. This article studies whether the two methods produce robust estimates across working correlation structures for longitudinal binary data with time-varying covariates under different missing-data mechanisms. Through simulation, we observe that parameter estimates for time-varying covariates differ more across working correlation structures than those for time-invariant covariates. The multiple imputation method produces estimates that are more robust under any working correlation structure, with smaller bias than the other two methods.
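Weighted estimating equations reweight each observed record by the inverse of its estimated probability of being observed. A minimal sketch of that idea, reduced (as a simplifying assumption) to estimating a single mean rather than GEE regression coefficients, with response probabilities estimated as observed fractions within strata of a fully observed discrete variable; `ipw_mean` is a hypothetical name.

```python
from collections import Counter
from collections.abc import Hashable
from typing import List, Optional


def ipw_mean(y: List[Optional[float]], stratum: List[Hashable]) -> float:
    """Hypothetical helper: inverse-probability-weighted mean; None marks a missing y.

    Assumes MAR given the fully observed `stratum`; P(observed | stratum) is
    estimated by the observed fraction in that stratum.
    """
    n_total = Counter(stratum)
    n_observed = Counter(s for yi, s in zip(y, stratum) if yi is not None)
    if not n_observed:
        raise ValueError("no observed values")
    num = den = 0.0
    for yi, s in zip(y, stratum):
        if yi is None:
            continue
        weight = n_total[s] / n_observed[s]  # 1 / estimated response probability
        num += weight * yi
        den += weight
    return num / den


if __name__ == "__main__":
    y = [1.0, 3.0, None, 10.0, None, None]
    group = ["a", "a", "a", "b", "b", "b"]
    print(ipw_mean(y, group))  # 6.0, versus a complete-case mean of about 4.67
```

Stratum "b" is underrepresented among the observed values, so its single observation is upweighted; the complete-case mean ignores this and is pulled toward stratum "a".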

한 인구학도의 회고 (A Demographer's Retrospective)

  • 김택일
    • Korea Journal of Population Studies, v.11 no.1, pp.1-13, 1988


Analysis of the cause-specific proportional hazards model with missing covariates (누락된 공변량을 가진 원인별 비례위험모형의 분석)

  • Minjung Lee
    • The Korean Journal of Applied Statistics, v.37 no.2, pp.225-237, 2024
  • In the analysis of competing risks data, some covariates may not be fully observed for some subjects. In such cases, excluding subjects with missing covariate values from the analysis may result in biased estimates and loss of efficiency. In this paper, we studied multiple imputation and the augmented inverse probability weighting method for regression parameter estimation in the cause-specific proportional hazards model with missing covariates. The performance of the estimators obtained from these two methods is evaluated in simulation studies, which show that both perform well. The two methods were applied to breast cancer data from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, in which tumor size has missing values, to identify significant risk factors for death from breast cancer and from other causes. Under the cause-specific proportional hazards model, the methods show that race, marital status, stage, grade, and tumor size are significant risk factors for breast cancer mortality, and that stage has the greatest effect on increasing the risk of breast cancer death. Age at diagnosis and tumor size have significant effects on increasing the risk of death from other causes.
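The augmented inverse probability weighting (AIPW) idea adds an outcome-model term to plain IPW, so the estimate stays consistent if either the response model or the outcome model is correct. A minimal sketch, reduced (as an assumption) to estimating a population mean rather than the cause-specific hazard regression the paper studies; `aipw_mean` is hypothetical, and the caller supplies the fitted probabilities `pi` and predictions `m`.

```python
from typing import List, Optional, Sequence


def aipw_mean(y: List[Optional[float]], pi: Sequence[float], m: Sequence[float]) -> float:
    """Hypothetical helper: AIPW estimate of E[Y] when some y are missing (None).

    pi[i]: estimated probability that y[i] is observed given covariates.
    m[i]:  predicted E[Y | covariates] from an outcome regression.
    Fitting pi and m is outside this sketch.
    """
    if not (len(y) == len(pi) == len(m)):
        raise ValueError("y, pi and m must have the same length")
    total = 0.0
    for yi, p, mi in zip(y, pi, m):
        if yi is None:
            total += mi                  # missing: rely on the outcome model
        else:
            total += mi + (yi - mi) / p  # observed: model prediction + weighted residual
    return total / len(y)


if __name__ == "__main__":
    y = [2.0, None, 4.0, None]
    pi = [0.9, 0.5, 0.3, 0.7]
    m = [2.0, 3.0, 4.0, 5.0]
    print(aipw_mean(y, pi, m))  # 3.5: the outcome model fits the observed y exactly
```

The usage line illustrates double robustness in miniature: when the outcome predictions match the observed values, the weighted residuals vanish and the response probabilities have no effect on the estimate.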

A case study of competing risk analysis in the presence of missing data

  • Limei Zhou;Peter C. Austin;Husam Abdel-Qadir
    • Communications for Statistical Applications and Methods, v.30 no.1, pp.1-19, 2023
  • Observational data with missing or incomplete values are common in biomedical research. Multiple imputation is an effective approach to handling missing data that can decrease bias while increasing statistical power and efficiency. In recent years, propensity score (PS) matching has been increasingly used in observational studies to estimate treatment effects because it can reduce confounding due to measured baseline covariates. In this paper, we describe in detail approaches to competing risk analysis of incomplete observational data when PS matching is used. First, we used multiple imputation to impute several missing variables simultaneously, and then conducted PS matching to match statin-exposed patients with unexposed patients. Afterwards, we assessed the effect of statin exposure on the risk of heart failure-related hospitalizations or emergency visits by estimating both relative and absolute effects. Collectively, we provide a general methodological framework for assessing treatment effects in incomplete observational data. In addition, we present a practical approach to producing an overall cumulative incidence function (CIF) from estimates based on multiply imputed and PS-matched samples.
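The matching step can be sketched as 1:1 greedy nearest-neighbour matching without replacement on the logit of the propensity score, using a caliper of 0.2 standard deviations of the logit, a commonly recommended width. The abstract does not state the paper's matching algorithm or caliper, so both are assumptions here; in a workflow like the one described, such a step would presumably run once per imputed data set. `greedy_caliper_match` is a hypothetical name, and the propensity scores are assumed to be already estimated.

```python
import math
from statistics import stdev
from typing import List, Sequence, Tuple


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def greedy_caliper_match(
    ps_treated: Sequence[float],
    ps_control: Sequence[float],
    caliper_sd: float = 0.2,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: 1:1 greedy nearest-neighbour matching without
    replacement on the logit of the propensity score.

    The caliper is `caliper_sd` times the SD of the logit PS over all subjects.
    Returns (treated index, control index) pairs; treated subjects with no
    control inside the caliper are left unmatched.
    """
    lt = [logit(p) for p in ps_treated]
    lc = [logit(p) for p in ps_control]
    caliper = caliper_sd * stdev(lt + lc)
    available = set(range(len(lc)))
    pairs: List[Tuple[int, int]] = []
    for i, li in enumerate(lt):  # treated subjects are processed in input order
        # Nearest still-unmatched control; ties go to the lower index.
        best = min(sorted(available), key=lambda j: abs(li - lc[j]), default=None)
        if best is not None and abs(li - lc[best]) <= caliper:
            pairs.append((i, best))
            available.remove(best)  # without replacement
    return pairs


if __name__ == "__main__":
    # Made-up propensity scores; the third treated subject has no close control.
    print(greedy_caliper_match([0.3, 0.6, 0.95], [0.31, 0.59, 0.9]))  # [(0, 0), (1, 1)]
```

Greedy matching depends on processing order; randomizing the order of treated subjects, or using optimal matching, are common alternatives.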

EM Algorithm and Two Stage Model for Incomplete Data (불완전한 자료에 대한 보완기법(EM 알고리듬과 2단계(Two Stage) 모델))

  • 박경숙
    • Korea Journal of Population Studies, v.21 no.1, pp.162-183, 1998
  • This study examines the sampling bias that may have resulted from the large number of missing observations. Despite well-designed and reliable sampling procedures, the observed sample values in the DSFH (Demographic Survey on Changes in Family and Household Structure, Japan) included many missing observations. The DSFH's head-administered survey method produced a large number of missing observations on the characteristics of elderly non-head parents and their children. In addition, the response probability of a particular item in the DSFH differs significantly by the characteristics of elderly parents and their children. Furthermore, many items were missing simultaneously. This complex pattern of missing observations critically limits the ability to produce an unbiased analysis. First, the large number of missing observations is likely to produce misleading standard-error estimates. Even worse, the possible dependence of missingness on the unobserved values themselves is likely to bias the covariate estimates. Two models are employed to address these possible biases in inference. First, the EM algorithm is used to infer the missing values from the association between the observed values and other covariates. Second, a selection model is employed, given the suspicion that the probability that proximity is missing depends on its unobserved value.
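The abstract does not specify the model behind its EM step, so the following is only a minimal sketch of how EM uses the association between an incomplete variable and a fully observed covariate. It assumes, purely for illustration, a bivariate normal (X, Y) with X complete and some Y missing at random; `em_bivariate_normal` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def em_bivariate_normal(
    x: List[float], y: List[Optional[float]], n_iter: int = 500
) -> Tuple[float, float, float, float, float]:
    """Hypothetical helper: EM estimates (mu_x, mu_y, s_xx, s_xy, s_yy) for a
    bivariate normal when X is fully observed and some Y are None (MAR assumed).
    """
    n = len(x)
    # X is complete, so its mean and variance MLEs are fixed from the start.
    mu_x = sum(x) / n
    s_xx = sum((xi - mu_x) ** 2 for xi in x) / n
    # Start Y's parameters from the observed values, with zero covariance.
    y_obs = [yi for yi in y if yi is not None]
    mu_y = sum(y_obs) / len(y_obs)
    s_yy = sum((yi - mu_y) ** 2 for yi in y_obs) / len(y_obs)
    s_xy = 0.0
    for _ in range(n_iter):
        # E-step: expected sufficient statistics via the regression of Y on X.
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        sum_y = sum_yy = sum_xy = 0.0
        for xi, yi in zip(x, y):
            if yi is None:
                ey = mu_y + beta * (xi - mu_x)  # E[Y | X = xi]
                eyy = ey * ey + resid_var       # E[Y^2 | X = xi]
            else:
                ey, eyy = yi, yi * yi
            sum_y += ey
            sum_yy += eyy
            sum_xy += xi * ey
        # M-step: update the Y-related moments from the completed statistics.
        mu_y = sum_y / n
        s_yy = sum_yy / n - mu_y ** 2
        s_xy = sum_xy / n - mu_x * mu_y
    return mu_x, mu_y, s_xx, s_xy, s_yy


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.1, 3.9, None, 8.2, None, 12.1]  # made-up data with two missing values
    print(em_bivariate_normal(x, y))
```

In this simple setting the fixed point is known in closed form (the slope equals the complete-case regression slope), which makes the sketch easy to check; the EM step in the abstract would involve more variables and, per the second model, a possibly non-ignorable mechanism that plain EM under MAR does not handle.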
