Search | Korea Science

Yoo, Han-Na;Lee, Jae-Won
- The Korean Journal of Applied Statistics / v.22 no.3 / pp.607-616 / 2009
In many epidemiological studies, the occurrence times of the event of interest are right-censored or interval-censored. In certain situations such as the AIDS data, however, the incubation period, which is the time between HIV infection and the diagnosis of AIDS, is usually doubly censored. In this paper, we impute the interval-censored HIV infection time using three imputation methods. Mid imputation, conditional mean imputation, and the approximate Bayesian bootstrap are implemented to obtain right-censored data, and a Gibbs sampler is then used to estimate the regression coefficient for the incubation period. The Bayesian approach allows flexible modeling and the use of prior information. We applied both parametric and semi-parametric methods to estimate the effect of the covariate and compared the imputation results incorporating prior information for the covariate effects.
https://doi.org/10.5351/KJAS.2009.22.3.607
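
As a rough illustration of the mid imputation step named in this abstract, the sketch below replaces each interval-censored infection time with the midpoint of its interval, which leaves an incubation time that is either observed or right-censored. The `Subject` record, its field names, and the `mid_impute` function are hypothetical and are not taken from the paper; the conditional mean, approximate Bayesian bootstrap, and Gibbs sampling steps are not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subject:
    # Hypothetical record layout, assumed for illustration only.
    infect_left: float   # last time known to be HIV-negative
    infect_right: float  # first time known to be HIV-positive
    end_time: float      # AIDS diagnosis time, or last follow-up if censored
    diagnosed: bool      # True if AIDS was diagnosed at end_time


def mid_impute(subjects: List[Subject]) -> List[Tuple[float, bool]]:
    """Return (incubation_time, event_observed) pairs.

    The interval-censored infection time is replaced by the interval
    midpoint, so only right censoring remains in the output.
    """
    result = []
    for s in subjects:
        infection = (s.infect_left + s.infect_right) / 2.0
        result.append((s.end_time - infection, s.diagnosed))
    return result


if __name__ == "__main__":
    data = [Subject(0.0, 4.0, 10.0, True), Subject(2.0, 6.0, 9.0, False)]
    print(mid_impute(data))
```

The resulting pairs have the usual right-censored layout (a time plus an event indicator), which is the form a standard survival model would expect as input.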

Cho, Young-Sook;Chun, Young-Min;Hwang, Dae-Yong
- The Korean Journal of Applied Statistics / v.21 no.1 / pp.95-107 / 2008
The Survey on the Rural Living Indicators is a statistic approved by the National Statistical Office and conducted by the Rural Resources Development Institute. This study used the raw data of the 2005 survey. After editing the raw data, we studied 1,582 households obtained by eliminating cases that included nonresponses, and imputed the nonresponses for 15 items selected from 146 items. The imputation methods and the efficiency measures used in the simulation differed by data type. For continuous data, we imputed the nonresponses with mean imputation, regression imputation, and adjusted grey-based k-NN imputation (DU, DW, WU, WW) and compared the results by RMSE. For categorical data, we imputed the nonresponses with the mode method, probability imputation, the conditional mode method, the conditional probability method, and hot-deck imputation, and compared the results by accuracy. The results indicate that regression imputation and adjusted grey-based k-NN imputation are appropriate for continuous data, and that hot-deck imputation is appropriate for categorical data.
https://doi.org/10.5351/KJAS.2008.21.1.095
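
The comparison of mean imputation and regression imputation by RMSE described in this abstract can be sketched as follows. This is a minimal illustration on made-up numbers, assuming a single continuous item `y` with one fully observed auxiliary variable `x`; the function names are hypothetical, and the grey-based k-NN variants and the categorical methods are not shown.

```python
import math
from typing import List, Optional


def rmse(true: List[float], pred: List[float]) -> float:
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def mean_impute(y: List[Optional[float]]) -> List[float]:
    """Fill each missing value (None) with the mean of the observed values."""
    observed = [v for v in y if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in y]


def regression_impute(x: List[float], y: List[Optional[float]]) -> List[float]:
    """Fill missing y with predictions from a least-squares line fitted on observed pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * a if b is None else b for a, b in zip(x, y)]


if __name__ == "__main__":
    # Illustrative data only: y = 2x + 1 with two values masked as nonresponses.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
    y_miss = [3.0, None, 7.0, None, 11.0]
    masked = [i for i, v in enumerate(y_miss) if v is None]

    by_mean = mean_impute(y_miss)
    by_reg = regression_impute(x, y_miss)
    truth = [y_true[i] for i in masked]
    print("mean RMSE:", rmse(truth, [by_mean[i] for i in masked]))
    print("regression RMSE:", rmse(truth, [by_reg[i] for i in masked]))
```

RMSE is computed only over the masked positions, since the observed values are left unchanged by both methods.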

Yoo Hanna;Lee Jae Won
- Proceedings of the Korean Statistical Society Conference / 2004.11a / pp.237-241 / 2004
In standard time-to-event or survival analysis, the occurrence times of the event of interest are observed exactly or are right-censored. However, in certain situations such as the AIDS data, the incubation period, which is the time between HIV infection and the diagnosis of AIDS, is usually doubly censored. That is, the HIV infection time is interval-censored and the time of the AIDS diagnosis is right-censored. In this paper, we impute the interval-censored infection time using conditional mean imputation and estimate the regression coefficient for the incubation period using a Gibbs sampler. We applied parametric and semi-parametric methods to the analysis of the incubation period and compared the results.

Jun, Sung-Hae;Park, Jung-Eun;Oh, Kyung-Whan
- Journal of the Korean Institute of Intelligent Systems / v.14 no.6 / pp.789-792 / 2004
In various fields such as web mining, bioinformatics, and statistical data analysis, missing values of many kinds are found. These values make the training data sparse. Usually, missing values are replaced by predicted values using the mean or mode. More advanced missing value imputation methods, such as the conditional mean, tree methods, and Markov chain Monte Carlo algorithms, can also be used. However, general imputation models have the property that their predictive accuracy decreases as the ratio of missing values in the training data increases. Moreover, the number of available imputations is limited as the missing ratio increases. To address this problem, we propose using statistical learning theory to preprocess missing values. Specifically, we use the support vector regression of Vapnik. The proposed method can be applied to sparse training data. We verified the performance of our model using data sets from the UCI Machine Learning Repository.
https://doi.org/10.5391/JKIIS.2004.14.6.789