• Title/Summary/Keyword: conditional mean imputation

Comparing Imputation Methods for Doubly Censored Data

  • Yoo, Han-Na; Lee, Jae-Won
    • The Korean Journal of Applied Statistics / v.22 no.3 / pp.607-616 / 2009
  • In many epidemiological studies, the occurrence times of the event of interest are right-censored or interval-censored. In certain situations such as AIDS data, however, the incubation period, which is the time between HIV infection and the diagnosis of AIDS, is usually doubly censored. In this paper, we impute the interval-censored HIV infection time using three imputation methods. Mid imputation, conditional mean imputation, and the approximate Bayesian bootstrap are implemented to obtain right-censored data, and a Gibbs sampler is then used to estimate the regression coefficient for the incubation period. The Bayesian approach allows flexible modeling and the use of prior information. We applied both parametric and semi-parametric methods to estimate the effect of the covariate and compared the imputation results incorporating prior information on the covariate effects.
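
The first two imputation methods named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the exponential model for the infection time, the `rate` value, and the example subject are all assumptions made here for illustration. Mid imputation takes the interval midpoint, while conditional mean imputation takes the expected infection time given that it falls inside the censoring interval. Subtracting the imputed infection time from the diagnosis time then gives an incubation period that is at most right-censored.

```python
import math


def mid_imputation(left, right):
    """Impute an interval-censored time by the interval midpoint."""
    return (left + right) / 2.0


def conditional_mean_imputation(left, right, rate):
    """Return E[T | left < T <= right] for T ~ Exponential(rate).

    The exponential model and `rate` are illustrative assumptions,
    not taken from the paper.
    """
    if not (0 <= left < right) or rate <= 0:
        raise ValueError("need 0 <= left < right and rate > 0")
    width = right - left
    # By memorylessness, work with the shifted time T' = T - left:
    # E[T' | T' <= w] = 1/rate - w * exp(-rate*w) / (1 - exp(-rate*w)).
    tail = math.exp(-rate * width)
    return left + 1.0 / rate - width * tail / (-math.expm1(-rate * width))


# Hypothetical subject: infection in (2, 6], AIDS diagnosis observed at 11.
left, right, diagnosis = 2.0, 6.0, 11.0
infection_mid = mid_imputation(left, right)
infection_cm = conditional_mean_imputation(left, right, rate=0.5)

# Imputed incubation periods (diagnosis time minus imputed infection time).
incubation_mid = diagnosis - infection_mid
incubation_cm = diagnosis - infection_cm
print(f"mid: {incubation_mid:.3f}, conditional mean: {incubation_cm:.3f}")
```

Under a decreasing density such as the exponential, the conditional mean falls before the midpoint, so the two methods give different incubation periods; as `rate` approaches zero the two imputations coincide.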

An Imputation for Nonresponses in the Survey on the Rural Living Indicators (농촌생활지표조사에서 무응답 대체 : 사례)

  • Cho, Young-Sook; Chun, Young-Min; Hwang, Dae-Yong
    • The Korean Journal of Applied Statistics / v.21 no.1 / pp.95-107 / 2008
  • The Survey on the Rural Living Indicators is a statistic approved by the National Statistical Office, and the survey is conducted by the Rural Resources Development Institute. This study used the raw data of the 2005 survey. After editing the raw data, we studied the 1,582 households that remained once cases containing nonresponses were eliminated, and imputed nonresponses for 15 items selected from the 146 items. The imputation methods and the measures of imputation efficiency used in the simulation differed by data type. For continuous data, we imputed the nonresponses with mean imputation, regression imputation, and adjusted grey-based k-NN imputation (DU, DW, WU, WW), and compared the results by RMSE. For categorical data, we imputed the nonresponses with the mode method, probability imputation, the conditional mode method, the conditional probability method, and hot-deck imputation, and compared the results by accuracy. According to the results, regression imputation and adjusted grey-based k-NN imputation are appropriate for continuous data, and hot-deck imputation is appropriate for categorical data.
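
The RMSE comparison for continuous items can be sketched as follows. This is only an illustration under stated assumptions: the data, the single covariate `x`, and the set of masked positions are hypothetical and not taken from the survey. Values are masked in otherwise complete data, imputed by mean imputation and by simple linear regression imputation, and each method is scored against the held-back true values.

```python
import math

# Hypothetical complete data: one covariate x and one continuous item y.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.9, 17.2, 18.8, 21.1]

# Positions artificially treated as nonresponses (an assumption).
missing = [2, 5, 8]
observed = [i for i in range(len(y)) if i not in missing]


def rmse(predicted, truth):
    """Root mean squared error between imputed and true values."""
    squared = [(p - t) ** 2 for p, t in zip(predicted, truth)]
    return math.sqrt(sum(squared) / len(squared))


# Mean imputation: every nonresponse gets the mean of the observed values.
y_bar = sum(y[i] for i in observed) / len(observed)
mean_imputed = [y_bar for _ in missing]

# Regression imputation: ordinary least squares of y on x using the
# respondents only, then prediction at the nonrespondents' x values.
x_bar = sum(x[i] for i in observed) / len(observed)
sxy = sum((x[i] - x_bar) * (y[i] - y_bar) for i in observed)
sxx = sum((x[i] - x_bar) ** 2 for i in observed)
slope = sxy / sxx
intercept = y_bar - slope * x_bar
reg_imputed = [intercept + slope * x[i] for i in missing]

truth = [y[i] for i in missing]
rmse_mean = rmse(mean_imputed, truth)
rmse_reg = rmse(reg_imputed, truth)
print(f"RMSE mean: {rmse_mean:.3f}, RMSE regression: {rmse_reg:.3f}")
```

Because `y` depends strongly on `x` in this toy data, regression imputation gives the smaller RMSE; with a weak covariate the two methods would score about the same.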

Regression Analysis of Doubly Censored Data Using Gibbs Sampler for the Incubation Period

  • Yoo Hanna; Lee Jae Won
    • Proceedings of the Korean Statistical Society Conference / 2004.11a / pp.237-241 / 2004
  • In standard time-to-event or survival analysis, the occurrence times of the event of interest are observed exactly or are right-censored. However, in certain situations such as AIDS data, the incubation period, which is the time between HIV infection and the diagnosis of AIDS, is usually doubly censored. That is, the HIV infection time is interval-censored and the time of the diagnosis of AIDS is right-censored. In this paper, we impute the interval-censored infection time using conditional mean imputation and estimate the regression coefficient for the incubation period using a Gibbs sampler. We applied parametric and semi-parametric methods to the analysis of the incubation period and compared the results.

A Sparse Data Preprocessing Using Support Vector Regression (Support Vector Regression을 이용한 희소 데이터의 전처리)

  • Jun, Sung-Hae; Park, Jung-Eun; Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.6 / pp.789-792 / 2004
  • In fields such as web mining, bioinformatics, and statistical data analysis, missing values arise in many different forms. These values make the training data sparse. Typically, missing values are replaced by values predicted from the mean or mode. More advanced imputation methods, such as the conditional mean, tree methods, and Markov chain Monte Carlo algorithms, can also be used. However, general imputation models lose predictive accuracy as the ratio of missing values in the training data increases. Moreover, the number of available imputations becomes limited as the missing ratio increases. To address this problem, we propose a method based on statistical learning theory for preprocessing missing values. The method is support vector regression, introduced by Vapnik. The proposed method can be applied to sparse training data. We verified the performance of our model using data sets from the UCI Machine Learning Repository.