• Title/Summary/Keyword: Imputation Method


Non-Response Imputation for Panel Data (패널자료의 무응답 대체법)

  • Pak, Gi-Deok;Shin, Key-Il
    • Communications for Statistical Applications and Methods / v.17 no.6 / pp.899-907 / 2010
  • Several non-response imputation methods have been suggested; however, mainly cross-sectional imputation methods have been studied and applied in analysis. A simple and common imputation method for panel data is cross-wave regression imputation, of which carry-over imputation is a special case. This study suggests a multiple imputation method that combines time series analysis with a cross-sectional multiple imputation method. We compare this method with the cross-wave regression imputation method using MSE, MAE, and bias. The 2008 monthly labor survey data are used for this study.
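
The two baseline methods named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy two-wave data are hypothetical, and missing values are assumed to be coded as `None`. Carry-over imputation copies the previous-wave value, which corresponds to a cross-wave regression with intercept 0 and slope 1.

```python
from statistics import mean

def carry_over_impute(prev_wave, curr_wave):
    """Replace missing current-wave values (None) with the previous-wave value."""
    return [p if c is None else c for p, c in zip(prev_wave, curr_wave)]

def cross_wave_regression_impute(prev_wave, curr_wave):
    """Fit curr = a + b * prev on respondents, then predict the missing values."""
    pairs = [(p, c) for p, c in zip(prev_wave, curr_wave) if c is not None]
    prev_mean = mean(p for p, _ in pairs)
    curr_mean = mean(c for _, c in pairs)
    sxx = sum((p - prev_mean) ** 2 for p, _ in pairs)
    sxy = sum((p - prev_mean) * (c - curr_mean) for p, c in pairs)
    slope = sxy / sxx
    intercept = curr_mean - slope * prev_mean
    return [intercept + slope * p if c is None else c
            for p, c in zip(prev_wave, curr_wave)]

# Hypothetical toy panel: the third unit did not respond in the current wave.
prev_wave = [10.0, 12.0, 14.0, 16.0]
curr_wave = [21.0, 25.0, None, 33.0]

carried = carry_over_impute(prev_wave, curr_wave)
regressed = cross_wave_regression_impute(prev_wave, curr_wave)
```

In this toy data the respondents lie on the line curr = 1 + 2 * prev, so the regression fills the gap with a value near 29, while carry-over fills it with 14.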

Two-stage imputation method to handle missing data for categorical response variable

  • Jong-Min Kim;Kee-Jae Lee;Seung-Joo Lee
    • Communications for Statistical Applications and Methods / v.30 no.6 / pp.577-587 / 2023
  • Conventional categorical data imputation techniques, such as mode imputation, often encounter issues related to overestimation. If the variable has too many categories, the multinomial logistic regression imputation method may be infeasible due to computational limitations. To address these limitations, we propose a two-stage imputation method. In the first stage, we apply the Boruta variable selection method to the complete dataset to identify significant variables for the target categorical variable. In the second stage, we use those selected variables to impute missing data: logistic regression for binary variables, polytomous regression for categorical variables, and predictive mean matching for quantitative variables. Through analysis of both asymmetric and non-normal simulated and real data, we demonstrate that the two-stage imputation method outperforms imputation methods lacking variable selection, as evidenced by accuracy measures. In an analysis of real survey data, we also demonstrate that the suggested two-stage imputation method surpasses the current imputation approach in terms of accuracy.
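
A small sketch of the mode-imputation baseline may help show one way the overestimation issue can arise. It assumes the issue refers to the modal category's share being inflated, which the abstract does not spell out. The function name and the toy responses are hypothetical.

```python
from collections import Counter

def mode_impute(values):
    """Replace every missing value (None) with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

# Hypothetical survey item with half of the responses missing.
responses = ["A", "A", "A", "B", "B", None, None, None, None, None]
imputed = mode_impute(responses)

n_observed = sum(v is not None for v in responses)
share_before = responses.count("A") / n_observed   # share of "A" among respondents
share_after = imputed.count("A") / len(imputed)    # share of "A" after imputation
```

Here the share of category "A" rises from 0.6 among respondents to 0.8 after imputation, because every missing value is assigned to the modal category.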

A Computational Intelligence Based Online Data Imputation Method: An Application For Banking

  • Nishanth, Kancherla Jonah;Ravi, Vadlamani
    • Journal of Information Processing Systems / v.9 no.4 / pp.633-650 / 2013
  • All the imputation techniques proposed so far in the literature are offline techniques: they require a number of iterations to learn the characteristics of the data during training, and they consume a lot of computational time. Hence, these techniques are not suitable for applications that require imputation to be performed on demand and in near real time. The paper proposes a computational intelligence based architecture for online data imputation, as well as extended versions of an existing offline data imputation method. The proposed online imputation technique has 2 stages. In stage 1, the Evolving Clustering Method (ECM) is used to replace the missing values with cluster centers, as part of the local learning strategy. Stage 2 refines the resultant approximate values using a General Regression Neural Network (GRNN) as part of the global approximation strategy. We also propose extended versions of an existing offline imputation technique. The offline imputation techniques employ K-Means or K-Medoids in stage 1 and a Multi Layer Perceptron (MLP) or GRNN in stage 2. Several experiments were conducted on 8 benchmark datasets and 4 bank related datasets to assess the effectiveness of the proposed online and offline imputation techniques. In terms of Mean Absolute Percentage Error (MAPE), the results indicate that the difference between the best proposed offline imputation method, K-Medoids+GRNN, and the proposed online imputation method, ECM+GRNN, is statistically insignificant at a 1% level of significance. Consequently, the proposed online technique, being less expensive and faster, can be employed for imputation instead of the existing and proposed offline imputation techniques. This is the significant outcome of the study. Furthermore, GRNN in stage 2 uniformly reduced MAPE values in both offline and online imputation methods on all datasets.
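
The two building blocks named in this abstract that have standard textbook forms, GRNN prediction and MAPE, can be sketched as below. This is only an illustration under assumptions: the ECM clustering stage is not reproduced, and the function names, the smoothing parameter `sigma`, and the toy data are hypothetical. A GRNN output is commonly written as a Gaussian-kernel weighted average of the training targets.

```python
import math

def grnn_predict(train_x, train_y, query, sigma=1.0):
    """Gaussian-kernel weighted average of training targets (standard GRNN form)."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, query)) / (2 * sigma ** 2))
        for x in train_x
    ]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical complete records: one observed attribute, one target attribute.
train_x = [(0.0,), (2.0,)]
train_y = [10.0, 20.0]

# Predict the missing target for a record whose observed attribute is 1.0.
estimate = grnn_predict(train_x, train_y, (1.0,))
error = mape([100.0, 200.0], [110.0, 180.0])
```

Because the query sits halfway between the two training points, both receive equal weight and the estimate is their average, 15. The MAPE of the second example is 10 percent.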

REGRESSION FRACTIONAL HOT DECK IMPUTATION

  • Kim, Jae-Kwang
    • Journal of the Korean Statistical Society / v.36 no.3 / pp.423-434 / 2007
  • Imputation using a regression model is a method to preserve the correlation among variables and to provide imputed point estimators. We discuss the implementation of regression imputation using fractional imputation. By a suitable choice of fractional weights, the fractional regression imputation can take the form of hot deck fractional imputation, thus no artificial values are constructed after the imputation. A variance estimator, which extends the method of Kim and Fuller (2004), is also proposed. Results from a limited simulation study are presented.

A comparison of imputation methods using machine learning models

  • Heajung Suh;Jongwoo Song
    • Communications for Statistical Applications and Methods / v.30 no.3 / pp.331-341 / 2023
  • Handling missing values in data analysis is essential in constructing a good prediction model. The easiest way to handle missing values is to use complete case data, but this can lead to information loss within the data and invalid conclusions in data analysis. Imputation is a technique that replaces missing data with alternative values obtained from information in a dataset. Conventional imputation methods include K-nearest-neighbor imputation and multiple imputation. Recent methods include missForest, missRanger, and mixgb, all of which use machine learning algorithms. This paper compares the imputation techniques for datasets with mixed data types in various situations, such as data size, missing ratios, and missing mechanisms. To evaluate the performance of each method in mixed datasets, we propose a new imputation performance measure (IPM) that is a unified measurement applicable to numerical and categorical variables. We believe this metric can help find the best imputation method. Finally, we summarize the comparison results with imputation performances and computational times.
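
Comparisons of this kind usually start by hiding known values at a chosen missing ratio. The sketch below shows one such setup under a stated assumption: values are removed completely at random, which is only one possible missing mechanism and is not confirmed as the paper's design. The function name, the seed, and the toy mixed-type rows are hypothetical.

```python
import random

def mask_completely_at_random(rows, missing_ratio, seed=0):
    """Return a copy of rows with a fixed share of cells replaced by None."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    hidden = set(rng.sample(cells, round(missing_ratio * len(cells))))
    return [
        [None if (i, j) in hidden else value for j, value in enumerate(row)]
        for i, row in enumerate(rows)
    ]

# Hypothetical mixed-type dataset: integer, categorical, and real-valued columns.
rows = [[1, "a", 2.5], [2, "b", 3.5], [3, "c", 4.5], [4, "d", 5.5]]
masked = mask_completely_at_random(rows, missing_ratio=0.25)
n_missing = sum(value is None for row in masked for value in row)
```

The original `rows` stay available as ground truth, so any imputation method can be scored against the hidden cells.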

On the use of weighted adaptive nearest neighbors for missing value imputation (가중 적응 최근접 이웃을 이용한 결측치 대치)

  • Yum, Yunjin;Kim, Dongjae
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.507-516 / 2018
  • Among the various single imputation methods, k-nearest neighbors (KNN) imputation is widely used because it is robust even when a parametric assumption such as multivariate normality is not satisfied. We propose a weighted adaptive nearest neighbors imputation method that combines two approaches: the adaptive nearest neighbors imputation method, which accounts for the local features of the data in KNN imputation, and the weighted k-nearest neighbors method, which is less sensitive to extreme values or outliers among the k nearest neighbors. We conducted a Monte Carlo simulation study to compare the performance of the proposed imputation method with previous imputation methods.
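
The weighting idea can be sketched with plain inverse-distance weighted KNN imputation. This is not the proposed method: the adaptive choice of neighbors is not described in the abstract and is not reproduced here, and inverse-distance weights are an assumed weighting scheme. The function name and the toy data are hypothetical.

```python
import math

def weighted_knn_impute(complete_rows, target, missing_idx, k=2):
    """Impute target[missing_idx] from the k nearest complete rows, weighted by 1/distance."""
    def distance(row):
        # Distance uses only the coordinates that are observed in the target.
        return math.sqrt(sum(
            (row[j] - target[j]) ** 2
            for j in range(len(target)) if j != missing_idx
        ))

    nearest = sorted(complete_rows, key=distance)[:k]
    # A small constant avoids division by zero for exact matches.
    weights = [1.0 / (distance(row) + 1e-9) for row in nearest]
    return sum(w * row[missing_idx] for w, row in zip(weights, nearest)) / sum(weights)

# Hypothetical data: the third row is an outlier and is not among the 2 nearest.
complete_rows = [(1.0, 10.0), (3.0, 30.0), (100.0, 1000.0)]
target = (2.0, None)
imputed_value = weighted_knn_impute(complete_rows, target, missing_idx=1, k=2)
```

The two nearest rows are equally distant from the target, so the imputed value is their plain average, about 20, and the distant outlier has no influence.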

Jackknife Variance Estimation under Imputation for Nonrandom Nonresponse with Follow-ups

  • Park, Jinwoo
    • Journal of the Korean Statistical Society / v.29 no.4 / pp.385-394 / 2000
  • We provide jackknife variance estimation based on adjusted imputed values for the case where nonresponse is nonrandom and follow-up data are available for a subsample of nonrespondents. Both hot-deck and ratio imputation are considered as imputation methods. The performance of the proposed variance estimator under a nonrandom response mechanism is investigated through numerical simulation.


A New Method for Imputation of Missing Genotype using Linkage Disequilibrium and Haplotype Information (결측치가 존재하는 유전형 자료에서의 연관불균형과 일배체형을 사용한 결측치 대치 방법)

  • Park Yun-Ju;Kim Young-Jin;Park Jung-Sun;Kim Kuchan;Koh Insong;Jung Ho-Youl
    • Journal of KIISE:Software and Applications / v.32 no.2 / pp.99-107 / 2005
  • In this paper, we propose new missing-value imputation methods that minimize the loss of information: linkage disequilibrium-based and haplotype-based imputation methods, which estimate missing values based on the specific characteristics of Single Nucleotide Polymorphism (SNP) genotype data. A method for imputing data is needed to minimize the loss of information caused by experimentally missing data. In general, missing-value imputation of biological data has used the major allele imputation method, but this approach is not optimal. It has high error rates in missing value estimation because it does not take into consideration the characteristics and specific structure of genotype data. In this paper, we show the results of a comparative evaluation of our methods and the major allele imputation method for the estimation of missing values.
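
The baseline that the abstract compares against can be sketched as follows. This is one plausible reading of major allele imputation, not the paper's definition: it assumes genotypes at a single SNP are coded as two-letter strings, missing calls are `None`, and a missing genotype is filled with the homozygous major allele. The function name and the toy data are hypothetical.

```python
from collections import Counter

def major_allele_impute(genotypes):
    """Fill missing genotypes with the homozygote of the most frequent allele."""
    allele_counts = Counter(
        allele for genotype in genotypes if genotype is not None for allele in genotype
    )
    major = allele_counts.most_common(1)[0][0]
    return [major * 2 if genotype is None else genotype for genotype in genotypes]

# Hypothetical genotype calls at one SNP for six individuals.
snp = ["AA", "AG", None, "AA", "GG", None]
filled = major_allele_impute(snp)
```

Allele A appears five times and G three times among the observed calls, so both missing genotypes become "AA". The sketch ignores neighboring SNPs entirely, which is the information that linkage disequilibrium-based and haplotype-based methods draw on.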

Application of Multiple Imputation Method in Analyzing Data with Missing Continuous Covariates

  • Ghasemizadeh Tamar, S.;Ganjali, M.
    • The Korean Journal of Applied Statistics / v.21 no.4 / pp.659-664 / 2008
  • Missing continuous covariates are pervasive in the use of generalized linear models for medical data. Multiple imputation is the most common and easiest method of dealing with missing covariate data. However, the method comes with serious caveats, and care is needed to make the imputed values proper. In this paper, proper imputation from the posterior predictive distribution is developed for use with arbitrary priors. We use the empirical distribution of the posterior to approximate the posterior predictive distribution and to sample from it. This method is preferable to an imputation method we presented previously, which uses a full model to impute missing values using available software. The proposed methods are applied to glucocorticoid data.

Comparison of Five Single Imputation Methods in General Missing Pattern

  • Kang, Shin-Soo
    • Journal of the Korean Data and Information Science Society / v.15 no.4 / pp.945-955 / 2004
  • 'Complete-case analysis' is easy to carry out and may be adequate when the amount of missing data is small. However, this method is not recommended in general because the estimates are usually biased and inefficient. There are numerous alternatives to complete-case analysis. One alternative is single imputation. Some of the most common single imputation methods are reviewed, and their performance is compared through simulation studies.
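
The contrast between complete-case analysis and single imputation can be sketched as below. Mean imputation is used here only as a familiar example of single imputation. The abstract does not list the five methods it compares, so this choice is an assumption, and the function names and toy data are hypothetical.

```python
from statistics import mean

def complete_cases(rows):
    """Keep only the rows with no missing value (None)."""
    return [row for row in rows if None not in row]

def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    columns = list(zip(*rows))
    column_means = [mean(v for v in column if v is not None) for column in columns]
    return [
        [m if v is None else v for v, m in zip(row, column_means)]
        for row in rows
    ]

# Hypothetical data with a general (non-monotone) missing pattern.
rows = [[1.0, 2.0], [None, 4.0], [3.0, None], [5.0, 6.0]]
kept = complete_cases(rows)
filled = mean_impute(rows)
```

Complete-case analysis keeps only two of the four rows, while mean imputation keeps all four by filling the gaps with the column means 3.0 and 4.0.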
