• Title/Summary/Keyword: multivariate imputation by chained equations (MICE)

Search Result 5

Imputation of Missing SST Observation Data Using Multivariate Bidirectional RNN (다변수 Bidirectional RNN을 이용한 표층수온 결측 데이터 보간)

  • Shin, YongTak;Kim, Dong-Hoon;Kim, Hyeon-Jae;Lim, Chaewook;Woo, Seung-Buhm
    • Journal of Korean Society of Coastal and Ocean Engineers / v.34 no.4 / pp.109-118 / 2022
  • Missing sections of fixed-station sea surface temperature (SST) observation data were imputed using a Bidirectional Recurrent Neural Network (BiRNN). Among artificial intelligence techniques, Recurrent Neural Networks (RNNs), which are commonly used for time series data, estimate a missing value only in the direction of time flow or only in the reverse direction, so their estimation performance is poor for long missing sections. In this study, estimating in both directions, from before and after the missing section, improves performance even for long-term missing data. In addition, all available data around the observation point (sea surface temperature, air temperature, wind field, atmospheric pressure, humidity) were used, so that the correlations among these variables further improved imputation performance. For performance verification, the BiRNN was compared with a statistical model, Multivariate Imputation by Chained Equations (MICE), a machine learning-based Random Forest model, and an RNN model using Long Short-Term Memory (LSTM). For imputation of a long-term gap of 7 days, the average accuracy of the BiRNN and statistical models is 70.8% and 61.2%, respectively, and the average error is 0.28 degrees and 0.44 degrees, respectively, so the BiRNN model performs better than the other models. Because it applies a temporal decay factor that represents the missing pattern, the BiRNN technique is judged to have better imputation performance than the existing methods as the missing section becomes longer.

Bias-corrected imputation method for non-ignorable nonresponse with heteroscedasticity in super-population model (초모집단 모형의 오차가 이분산일 때 무시할 수 없는 무응답에서 편향수정 무응답 대체)

  • Yujin Lee;Key-Il Shin
    • The Korean Journal of Applied Statistics / v.37 no.3 / pp.283-295 / 2024
  • Many studies have been conducted to properly handle nonresponse, and many nonresponse imputation methods have recently been developed and put into practical use. Most imputation methods assume MCAR (missing completely at random) or MAR (missing at random). In contrast, there are relatively few studies on imputation under MNAR (missing not at random) or NN (nonignorable nonresponse), where nonresponse is affected by the study variable. MNAR causes bias and reduces the accuracy of imputation whenever the response probability is not properly estimated. Lee and Shin (2022) proposed a nonresponse imputation method that can be applied to nonignorable nonresponse assuming homoscedasticity in the super-population model. In this paper, we propose a generalized version of the imputation method of Lee and Shin (2022) that improves the accuracy of estimation by removing the bias caused by MNAR under heteroscedasticity. The superiority of the proposed method is confirmed through simulation studies.

Bias corrected imputation method for non-ignorable non-response (무시할 수 없는 무응답에서 편향 보정을 이용한 무응답 대체)

  • Lee, Min-Ha;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.35 no.4 / pp.485-499 / 2022
  • Controlling the total survey error, which includes sampling error and non-sampling error, is very important in sampling design. Non-sampling error caused by non-response accounts for a large proportion of the total survey error, and many studies have been conducted to handle non-response properly. Recently, many non-response imputation methods based on machine learning techniques and on traditional statistical methods have been studied and put into practical use. Most imputation methods assume MCAR (missing completely at random) or MAR (missing at random), and few studies have focused on MNAR (missing not at random) or NN (non-ignorable non-response), which cause bias and reduce the accuracy of imputation. In this study, we propose a non-response imputation method that can be applied to non-ignorable non-response. That is, we propose an imputation method that improves the accuracy of estimation by removing the bias caused by NN. The superiority of the proposed method is confirmed through small simulation studies.

Methods for Handling Incomplete Repeated Measures Data (불완전한 반복측정 자료의 보정방법)

  • Woo, Hae-Bong;Yoon, In-Jin
    • Survey Research / v.9 no.2 / pp.1-27 / 2008
  • Problems of incomplete data are pervasive in statistical analysis, and they have been a particular challenge in repeated measures studies. The objective of this study is to give a brief introduction to missing data mechanisms and to conventional and recent missing data methods, and to assess the performance of various missing data methods under ignorable and non-ignorable missingness mechanisms. Given the inadequate attention to longitudinal studies with missing data, this study applied recent advances in missing data methods to repeated measures models and investigated the performance of methods such as FIML (Full Information Maximum Likelihood Estimation) and MICE (Multivariate Imputation by Chained Equations) under MCAR, MAR, and MNAR mechanisms. Overall, the results showed that listwise deletion and mean imputation performed poorly compared to the other, recommended missing data procedures. The advantage of EM, FIML, and MICE was more noticeable under MAR than under MCAR. With non-ignorable missing data, none of the missing data methods performed well, and the problem was especially noticeable in slope-related estimates. Therefore, this study suggests that if missing data are suspected to be non-ignorable, developmental research may underestimate true rates of change over the life course. This study also suggests that bias from non-ignorable missing data can be substantially reduced by using rich information from variables related to missingness.

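The three missingness mechanisms compared in the abstract above can be illustrated with a small simulation. The following is a minimal sketch, not the study's design: the data model (`y = 2x + noise`), the missingness probabilities, and the function name `simulate` are all assumptions made for illustration, and single regression imputation stands in for the EM, FIML, and MICE procedures the study actually evaluated.

```python
import random
from statistics import fmean


def simulate(mechanism, n=20000, seed=0):
    """Estimate the mean of y when some y values are missing.

    Hypothetical setup: x is always observed, y = 2x + noise, and y goes
    missing according to the chosen mechanism ("MCAR", "MAR" or "MNAR").
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

    def p_missing(x, y):
        if mechanism == "MCAR":
            return 0.3                      # unrelated to the data
        if mechanism == "MAR":
            return 0.6 if x > 0 else 0.1    # depends only on observed x
        if mechanism == "MNAR":
            return 0.6 if y > 0 else 0.1    # depends on the missing y itself
        raise ValueError(f"unknown mechanism: {mechanism}")

    observed = [rng.random() >= p_missing(x, y) for x, y in zip(xs, ys)]
    x_obs = [x for x, o in zip(xs, observed) if o]
    y_obs = [y for y, o in zip(ys, observed) if o]

    # Ordinary least squares of y on x, fitted on the complete cases only.
    mx, my = fmean(x_obs), fmean(y_obs)
    slope = (sum((x - mx) * (y - my) for x, y in zip(x_obs, y_obs))
             / sum((x - mx) ** 2 for x in x_obs))
    intercept = my - slope * mx

    # Single regression imputation: fill each missing y with its prediction.
    completed = [y if o else intercept + slope * x
                 for x, y, o in zip(xs, ys, observed)]

    return {
        "true_mean": fmean(ys),
        "complete_case": my,                # listwise deletion
        "regression_imputed": fmean(completed),
    }


for mech in ("MCAR", "MAR", "MNAR"):
    print(mech, simulate(mech))
```

In this toy setup, listwise deletion is roughly unbiased only under MCAR. Using the observed covariate removes most of the bias under MAR, while under MNAR both estimates remain biased, which is consistent with the pattern the abstract reports.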

A study on multiple imputation modeling for Korean EAPS (경제활동인구조사 자료를 위한 다중대체 방식 연구)

  • Park, Min-Jeong;Bae, Yoonjong;Kim, Joungyoun
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.685-696 / 2021
  • The Korean Economically Active Population Survey (KEAPS) is a national survey that produces employment-related statistics. The main purpose of the survey is to determine the economic activity status (employed/unemployed/non-employed) of the population. KEAPS has unique characteristics that stem from its survey method. In this study, we present an improved imputation model based on an understanding of structural non-response and on the use of past data. The performance of the proposed model is compared with that of the existing model through simulation, and the imputation models are evaluated by their matching/nonmatching rates. For this, we use the KEAPS data from November 2019. For randomly selected respondents among the total of 59,996, the six explanatory variables that are critical in determining the economic activity status are treated as non-response. The proposed model includes an industry variable and a job status variable in addition to the explanatory variables used in the previous research; this is made possible by linking and using past data. The simulation results confirm that the proposed model with the additional variables outperforms the existing model from the previous research. In addition, we consider various scenarios for the number of non-respondents by economic activity status.
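
The evaluation design described in this abstract (treat known values as non-response, impute them, then count how often the imputed value matches the true one) can be sketched as follows. This is a toy illustration under stated assumptions, not the KEAPS model: the synthetic records, the single masked variable, the `past` field standing in for linked past data, and the names `make_records`, `matching_rate`, and `evaluate` are all hypothetical.

```python
import random
from collections import Counter, defaultdict

STATUSES = ["employed", "unemployed", "non-employed"]


def make_records(n, rng):
    """Synthetic respondents whose current status usually repeats the past one."""
    records = []
    for _ in range(n):
        past = rng.choices(STATUSES, weights=[0.6, 0.05, 0.35])[0]
        current = past if rng.random() < 0.9 else rng.choice(STATUSES)
        records.append({"past": past, "current": current})
    return records


def matching_rate(records, masked_idx, impute):
    """Share of masked records whose imputed status equals the true status."""
    hits = sum(impute(records[i]) == records[i]["current"] for i in masked_idx)
    return hits / len(masked_idx)


def evaluate(n=5000, n_masked=1000, seed=0):
    rng = random.Random(seed)
    records = make_records(n, rng)

    # Treat a random subset of respondents as non-response on "current".
    masked = set(rng.sample(range(n), n_masked))
    donors = [r for i, r in enumerate(records) if i not in masked]

    # Baseline imputer: most frequent status among responding units.
    overall_mode = Counter(r["current"] for r in donors).most_common(1)[0][0]

    # Imputer using linked past data: most frequent status given past status.
    by_past = defaultdict(Counter)
    for r in donors:
        by_past[r["past"]][r["current"]] += 1

    def baseline(record):
        return overall_mode

    def with_past(record):
        counts = by_past.get(record["past"])
        return counts.most_common(1)[0][0] if counts else overall_mode

    idx = sorted(masked)
    return {
        "baseline": matching_rate(records, idx, baseline),
        "with_past": matching_rate(records, idx, with_past),
    }


print(evaluate())
```

Because the true values of the masked records are known, the matching rate gives a direct accuracy measure for each imputer. In this synthetic data the imputer that uses the past-status variable matches far more often than the baseline, which mirrors the kind of comparison the abstract describes.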