• Title/Summary/Keyword: Missing mechanisms

Methods for Handling Incomplete Repeated Measures Data (불완전한 반복측정 자료의 보정방법)

  • Woo, Hae-Bong;Yoon, In-Jin
    • Survey Research / v.9 no.2 / pp.1-27 / 2008
  • Problems of incomplete data are pervasive in statistical analysis, and incomplete data have been a particularly important challenge in repeated measures studies. The objective of this study is to give a brief introduction to missing data mechanisms and to conventional and recent missing data methods, and to assess the performance of various missing data methods under ignorable and non-ignorable missingness mechanisms. Given the limited attention paid to longitudinal studies with missing data, this study applied recent advances in missing data methods to repeated measures models and investigated the performance of methods such as FIML (Full Information Maximum Likelihood Estimation) and MICE (Multivariate Imputation by Chained Equations) under MCAR, MAR, and MNAR mechanisms. Overall, listwise deletion and mean imputation performed poorly compared with the recommended missing data procedures. The advantage of EM, FIML, and MICE was more noticeable under MAR than under MCAR. With non-ignorable missing data, the missing data methods did not perform well, and the problem was most noticeable in slope-related estimates. This study therefore suggests that if missing data are suspected to be non-ignorable, developmental research may underestimate true rates of change over the life course. It also suggests that bias from non-ignorable missing data can be substantially reduced by incorporating rich information from variables related to missingness.
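
The contrast between mechanisms can be seen in a small simulation. This is a minimal sketch, not the authors' repeated measures design: it assumes a single fully observed covariate `x`, an outcome `y` that goes missing, and a hypothetical function `simulate`. Deterministic regression imputation stands in for the model-based methods (EM, FIML, MICE), which it only loosely resembles.

```python
import random
import statistics


def simulate(mechanism, n=20000, seed=1):
    """Return (complete-case mean, regression-imputed mean) of y; the true mean is 0."""
    rng = random.Random(seed)
    xs, ys, observed = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)                  # always observed
        y = 0.8 * x + rng.gauss(0, 0.6)      # may go missing
        if mechanism == "MCAR":
            p_miss = 0.4                     # unrelated to any data
        elif mechanism == "MAR":
            p_miss = 0.7 if x > 0 else 0.1   # depends on the observed x only
        else:  # "MNAR"
            p_miss = 0.7 if y > 0 else 0.1   # depends on the unobserved y itself
        xs.append(x)
        ys.append(y)
        observed.append(rng.random() >= p_miss)

    obs_x = [x for x, o in zip(xs, observed) if o]
    obs_y = [y for y, o in zip(ys, observed) if o]
    cc_mean = statistics.fmean(obs_y)        # listwise deletion

    # Least-squares fit of y on x using complete cases only.
    mx = statistics.fmean(obs_x)
    sxy = sum((x - mx) * (y - cc_mean) for x, y in zip(obs_x, obs_y))
    sxx = sum((x - mx) ** 2 for x in obs_x)
    slope = sxy / sxx
    intercept = cc_mean - slope * mx

    # Deterministic regression imputation: replace each missing y by its prediction.
    filled = [y if o else intercept + slope * x for x, y, o in zip(xs, ys, observed)]
    return cc_mean, statistics.fmean(filled)


for mech in ("MCAR", "MAR", "MNAR"):
    print(mech, simulate(mech))
```

Under MAR the complete-case mean drifts well below the true value of 0 while the regression-imputed mean stays close to it; under MNAR both remain biased, in line with the qualitative pattern the abstract reports.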

Non-identifiability and testability of missing mechanisms in incomplete two-way contingency tables

  • Park, Yousung;Oh, Seung Mo;Kwon, Tae Yeon
    • Communications for Statistical Applications and Methods / v.28 no.3 / pp.307-314 / 2021
  • We showed that any missing mechanism can be reproduced by EMAR or MNAR with an equal fit to the observed likelihood, provided the maximum likelihood equations have non-negative solutions. This generalizes Molenberghs et al. (2008) and Jeon et al. (2019). Nonetheless, since MCAR is a nested model of MNAR, a natural question is whether MCAR and MNAR can be tested against each other using the three well-known statistics: the LR (likelihood ratio), Wald, and Score test statistics. We compared these three statistics through simulation studies, investigating to what extent the boundary solution affects testing MCAR against MNAR, which is the only testable pair of missing mechanisms based on the observed likelihood. We showed that all three statistics are useful as long as the boundary proximity is far from 1.

A Study on Automatic Missing Value Imputation Replacement Method for Data Processing in Digital Data (디지털 데이터에서 데이터 전처리를 위한 자동화된 결측 구간 대치 방법에 관한 연구)

  • Kim, Jong-Chan;Sim, Chun-Bo;Jung, Se-Hoon
    • Journal of Korea Multimedia Society / v.24 no.2 / pp.245-254 / 2021
  • This study proposes an analysis and prediction model that identifies outliers or abnormalities in the data and then imputes missing values effectively and rapidly. The model is expected to analyze problems in the data efficiently on the basis of calibrated raw data. Using the introduced KNN + MLE algorithm, a system that can make adequate use of the data was constructed. The algorithm addresses problems of some existing KNN-based imputation algorithms, such as ignoring missing values in some data sections or discarding normal observations. To check the effectiveness and efficiency of the proposed algorithm, it was compared with existing imputation approaches (K-means, KNN, MEI, and MI) under the MCAR, MAR, and NI missing mechanisms, and it was found to be superior in all respects.
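
For reference, plain KNN imputation, one of the existing approaches the proposed algorithm is compared against, can be sketched as below. This is not the KNN + MLE algorithm, whose details are not given here. The function name `knn_impute`, the use of `None` for missing entries, and the distance averaged over the columns both rows share are illustrative assumptions.

```python
import math


def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest rows
    that observe the column. Distance is the root mean squared difference
    over the columns both rows have observed."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum(shared) / len(shared))
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap if no usable neighbour exists
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


data = [[1.0, 2.0, 3.0],
        [1.1, None, 3.1],
        [0.9, 2.2, 2.9],
        [10.0, 20.0, 30.0]]
print(knn_impute(data, k=2))
```

The outlying fourth row is far from the incomplete row, so it does not influence the imputed value; that sensitivity to outliers is one reason outlier detection is paired with imputation in the abstract.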

A Study on Imputation using Adjusted Cohen Method

  • Chung, Sung-Suk;Chun, Young-Min;Lee, Sun-Kyung
    • Journal of the Korean Data and Information Science Society / v.17 no.3 / pp.871-888 / 2006
  • Many studies have developed procedures for dealing with missing values, the most common being to replace the missing data with other values. The purpose of our study is to propose adjusted Cohen methods and to compare their efficiency with that of other methods through a simulation study. The adjusted Cohen methods use an auxiliary variable to rank the observations of the variable with missing values, which yields a smaller mean squared error (MSE) than the Cohen method.

A comparison of imputation methods using machine learning models

  • Heajung Suh;Jongwoo Song
    • Communications for Statistical Applications and Methods / v.30 no.3 / pp.331-341 / 2023
  • Handling missing values in data analysis is essential for constructing a good prediction model. The easiest way to handle missing values is to use only complete cases, but this can lead to information loss and invalid conclusions. Imputation is a technique that replaces missing data with alternative values obtained from information in a dataset. Conventional imputation methods include K-nearest-neighbor imputation and multiple imputation. Recent methods include missForest, missRanger, and mixgb, all of which use machine learning algorithms. This paper compares imputation techniques for datasets with mixed data types in various situations, such as data size, missing ratios, and missing mechanisms. To evaluate the performance of each method on mixed datasets, we propose a new imputation performance measure (IPM), a unified measurement applicable to both numerical and categorical variables. We believe this metric can help find the best imputation method. Finally, we summarize the comparison results in terms of imputation performance and computational time.
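
The abstract does not define IPM, so it is not reproduced here. As a point of comparison, the sketch below shows two per-type metrics often used to score imputations: normalized RMSE (NRMSE) for numerical variables and the proportion of falsely classified entries (PFC) for categorical ones, as reported by missForest. A unified measure such as IPM would replace this pair with a single score. Normalizing by the population variance of the true values is an assumption; implementations differ.

```python
import math
import statistics


def nrmse(true_vals, imputed_vals):
    """Normalized RMSE over the entries that were masked and then imputed."""
    sq_errors = [(t - p) ** 2 for t, p in zip(true_vals, imputed_vals)]
    return math.sqrt(statistics.fmean(sq_errors) / statistics.pvariance(true_vals))


def pfc(true_labels, imputed_labels):
    """Proportion of categorical entries imputed with the wrong label."""
    wrong = sum(t != p for t, p in zip(true_labels, imputed_labels))
    return wrong / len(true_labels)


print(nrmse([1, 2, 3, 4], [2, 3, 4, 5]), pfc(["a", "b", "c", "a"], ["a", "b", "a", "b"]))
```

Because the two scores live on different scales, comparing methods on mixed data means weighing them against each other by hand, which is the gap a unified measure is meant to close.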

Sensitivity analysis of missing mechanisms for the 19th Korean presidential election poll survey (19대 대선 여론조사에서 무응답 메카니즘의 민감도 분석)

  • Kim, Seongyong;Kwak, Dongho
    • The Korean Journal of Applied Statistics / v.32 no.1 / pp.29-40 / 2019
  • Categorical data with non-responses are frequently observed in election poll surveys and can be represented by incomplete contingency tables. To estimate candidates' support rates, the missing mechanism must be specified in advance, because the estimates for non-responses change depending on the assumed missing mechanism. However, it has been shown that the missing mechanism cannot be identified from observed data. To overcome this problem, sensitivity analysis has been suggested. The previously proposed sensitivity analysis applies only to two-way incomplete contingency tables with binary variables, which makes it unsuitable for election polls, where more than two factors such as region, gender, and age are usually considered. In this paper, a sensitivity analysis suitable for multi-dimensional incomplete contingency tables is devised and applied to the 19th Korean presidential election poll survey data. The intervals of estimates from the sensitivity analysis include the actual results as well as the estimates from various missing mechanisms. In addition, the properties of the missing mechanisms that produce estimates nearest to the actual election results are investigated.
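
The basic idea behind a sensitivity analysis can be illustrated with a much simpler device than the paper's: instead of committing to one missing mechanism, sweep an assumed share `q` of non-respondents who support a candidate and report the range of resulting estimates. The sketch below is hypothetical (the function names, the single `q` shared by all strata, and the stratum counts are assumptions) and is not the multi-dimensional method proposed in the paper.

```python
def support_rate(strata, q):
    """Estimated support for candidate A when a share q of every stratum's
    non-respondents is assumed to support A.

    strata: list of (supports_A, supports_other, non_respondents) counts.
    """
    total = sum(a + b + m for a, b, m in strata)
    return sum(a + q * m for a, b, m in strata) / total


def sensitivity_interval(strata, steps=100):
    """Range of estimates as q sweeps the grid 0, 1/steps, ..., 1."""
    estimates = [support_rate(strata, i / steps) for i in range(steps + 1)]
    return min(estimates), max(estimates)


def mar_estimate(strata):
    """Allocate non-respondents within each stratum in proportion to that
    stratum's observed answers (missing at random given the stratum)."""
    total = sum(a + b + m for a, b, m in strata)
    return sum(a + m * a / (a + b) for a, b, m in strata) / total


# Hypothetical counts for two strata (e.g. two regions).
strata = [(300, 200, 100), (150, 250, 100)]
print(sensitivity_interval(strata), mar_estimate(strata))
```

With these counts the interval runs from 450/1100 (about 0.409, no non-respondent supports A) to 650/1100 (about 0.591, all do), and the MAR-style estimate of 547.5/1100 (about 0.498) falls inside it.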

ELCIC: An R package for model selection using the empirical-likelihood based information criterion

  • Chixiang Chen;Biyi Shen;Ming Wang
    • Communications for Statistical Applications and Methods / v.30 no.4 / pp.355-368 / 2023
  • This article introduces the R package ELCIC (https://cran.r-project.org/web/packages/ELCIC/index.html), which provides an empirical likelihood-based information criterion (ELCIC) for model selection that includes, but is not limited to, variable selection. The empirical likelihood is a semi-parametric approach to statistical inference that does not require distributional assumptions about the data-generating process. Therefore, ELCIC is more robust and versatile for model selection than existing information criteria. This paper illustrates several applications of ELCIC, including its use in generalized linear models, generalized estimating equations (GEE) for longitudinal data, and weighted GEE (WGEE) for missing longitudinal data under the mechanisms of missing at random and dropout.

Comparison of GEE Estimation Methods for Repeated Binary Data with Time-Varying Covariates on Different Missing Mechanisms (시간-종속적 공변량이 포함된 이분형 반복측정자료의 GEE를 이용한 분석에서 결측 체계에 따른 회귀계수 추정방법 비교)

  • Park, Boram;Jung, Inkyung
    • The Korean Journal of Applied Statistics / v.26 no.5 / pp.697-712 / 2013
  • When analyzing repeated binary data, the generalized estimating equations (GEE) approach produces consistent estimates of regression parameters even if an incorrect working correlation matrix is used. However, in finite samples, the coefficients of time-varying covariates change more across working correlation structures than those of time-invariant covariates. In addition, the GEE approach may give biased estimates under missing at random (MAR). Weighted estimating equations and multiple imputation have been proposed to reduce bias in parameter estimates under MAR. This article studies whether these two methods produce robust estimates across various working correlation structures for longitudinal binary data with time-varying covariates under different missing mechanisms. Through simulation, we observe that parameter estimates for time-varying covariates differ more across working correlation structures than those for time-invariant covariates. The multiple imputation method produces estimates that are more robust to the choice of working correlation structure and less biased than those of the other two methods.
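
The weighting idea behind weighted estimating equations is easiest to see in the simplest case, estimating a marginal mean with no regression covariates: each observed response is weighted by the inverse of its probability of being observed, and the weighted mean solves the weighted estimating equation for the mean. In this minimal sketch the observation probabilities are assumed known and depend only on a baseline binary covariate `z`, so the data are MAR; in practice they would be estimated (for example by logistic regression), and a full weighted GEE would weight the GEE score equations rather than a single mean. The function `ipw_demo` and all probabilities are hypothetical.

```python
import random


def ipw_demo(n=50000, seed=7):
    """Return (unweighted mean, inverse-probability-weighted mean) of a binary
    response at the final visit; the true mean is 0.5."""
    rng = random.Random(seed)
    naive_num = naive_den = 0.0
    ipw_num = ipw_den = 0.0
    for _ in range(n):
        z = rng.random() < 0.5                   # baseline binary covariate
        y = rng.random() < (0.7 if z else 0.3)   # binary response at the last visit
        pi = 0.4 if z else 0.9                   # P(still observed | z): MAR given z
        if rng.random() < pi:
            naive_num += y
            naive_den += 1
            # Weighted estimating equation sum w * (y - mu) = 0 with w = 1 / pi.
            ipw_num += y / pi
            ipw_den += 1 / pi
    return naive_num / naive_den, ipw_num / ipw_den


print(ipw_demo())
```

The unweighted mean of the observed responses is pulled down toward about 0.42 because subjects with `z` true, who are more likely to respond 1, drop out more often; the weighted mean recovers a value close to the true 0.5.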

Modelling Missing Traffic Volume Data using Circular Probability Distribution (순환확률분포를 이용한 교통량 결측자료 보정 모형)

  • Kim, Hyeon-Seok;Im, Gang-Won;Lee, Yeong-In;Nam, Du-Hui
    • Journal of Korean Society of Transportation / v.25 no.4 / pp.109-121 / 2007
  • In this study, an imputation model using a circular probability distribution was developed to overcome the problem of missing data in traffic surveys. Existing ad hoc or heuristic, model-based, and algorithm-based imputation techniques were reviewed, and their limitations for imputing missing traffic volume data were identified. The statistical computing language R was used for model construction, and a mixture of von Mises distributions, which is classified as symmetric, and a unimodal circular probability distribution were finally fitted to traffic volume data from survey stations in urban and rural areas, respectively. The circular probability distribution model generally outperformed a dummy variable regression model under various evaluation conditions. Circular probability distribution models were found to depict the circularity of hourly volumes well and to be very cost-effective and robust to changes in missing mechanisms.
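
A mixture of von Mises densities on the 24-hour clock can be evaluated with the standard library alone. The sketch maps hour h to the angle 2πh/24, computes the normalizing constant I0(κ) from its power series, and integrates the density over each hour. The final function `impute_hour`, which fills a missing hour by rescaling the observed hourly volumes with the fitted hourly shares, is a hypothetical imputation rule for illustration, not necessarily the model used in the paper; the component parameters are likewise made up.

```python
import math


def bessel_i0(kappa, terms=50):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 1.0, 1.0
    for j in range(1, terms):
        term *= (kappa / 2) ** 2 / (j * j)
        total += term
    return total


def mixture_pdf(theta, components):
    """Density of a von Mises mixture; components are (weight, mu, kappa), weights sum to 1."""
    return sum(w * math.exp(k * math.cos(theta - mu)) / (2 * math.pi * bessel_i0(k))
               for w, mu, k in components)


def hourly_shares(components, substeps=60):
    """Probability mass of each of the 24 hours (midpoint rule on the circle)."""
    width = 2 * math.pi / (24 * substeps)
    shares = []
    for h in range(24):
        start = 2 * math.pi * h / 24
        mass = sum(mixture_pdf(start + (s + 0.5) * width, components)
                   for s in range(substeps)) * width
        shares.append(mass)
    return shares


def impute_hour(observed, hour, components):
    """Hypothetical rule: estimate the daily total from observed hours, then
    allocate it to the missing hour by its fitted share."""
    shares = hourly_shares(components)
    total_est = sum(observed.values()) / sum(shares[h] for h in observed)
    return total_est * shares[hour]


# Made-up morning (08:00) and evening (18:00) peaks.
components = [(0.55, 2 * math.pi * 8 / 24, 4.0), (0.45, 2 * math.pi * 18 / 24, 3.0)]
print(hourly_shares(components)[8])
```

Working on the circle means hour 23 and hour 0 are neighbours, so a peak near midnight is handled without the discontinuity that hour-of-day dummy variables introduce.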

Automatic Feeding and Transplanting Mechanism for Plug Seedling Transplanter (플러그묘 자동이식기의 묘 자동공급 및 이식기구에 관한 연구)

  • 민영봉;문성동
    • Journal of Biosystems Engineering / v.23 no.3 / pp.259-270 / 1998
  • An automatic seedling transplanter employing an innovative plug-seedling feeder was developed to overcome the problems of conventional feeding and transplanting mechanisms. With conventional methods, the missing and damage rates of seedlings were high for long seedlings over 20mm... 

  • PDF