• Title/Summary/Keyword: Imputation method


Comparison of Single Imputation Methods in 2×2 Cross-Over Design with Missing Observations (2×2 교차계획법에서 결측치가 있을 때의 결측치 처리 방법 비교에 관한 연구)

  • Jo, Bobae;Kim, Dongjae
    • The Korean Journal of Applied Statistics / v.28 no.3 / pp.529-540 / 2015
  • A cross-over design is frequently used in clinical trials (especially in bioequivalence tests with a parametric method) to compare two treatments. In cross-over designs, missing values frequently occur in the second period. Usually, subjects with missing values are removed before analysis. However, this can be unsuitable in clinical trials with a small sample size. In this paper, we compare single imputation methods in a $2 \times 2$ cross-over design when missing values exist in the second period. Additionally, parametric and nonparametric methods are compared after applying the single imputation methods. A Monte Carlo simulation study compares the type I error and power of the methods.

Outlier Filtering and Missing Data Imputation Algorithm using TCS Data (TCS데이터를 이용한 이상치제거 및 결측보정 알고리즘 개발)

  • Do, Myung-Sik;Lee, Hyang-Mee;NamKoong, Seong
    • Journal of Korean Society of Transportation / v.26 no.4 / pp.241-250 / 2008
  • With the ever-growing amount of traffic, there is an increasing need for good-quality travel time information. Various outlier filtering and missing data imputation algorithms using AVI data have been proposed for interrupted and uninterrupted traffic flow. This paper develops an outlier filtering and missing data imputation algorithm using Toll Collection System (TCS) data. TCS travel time data collected from August to September 2007 were employed. Because TCS travel time data consist of records of every passing vehicle, they have the potential to provide real-time travel time information. However, the authors found that as the distance between entry and exit tollgates increases, the variance of travel time also increases. Time gaps also appeared when the distance between tollgates was long. Finally, after analyzing existing methods, the authors propose a new method for computing representative values once abnormal and "noise" data have been removed. The proposed algorithm is effective.

Analysis of the cause-specific proportional hazards model with missing covariates (누락된 공변량을 가진 원인별 비례위험모형의 분석)

  • Minjung Lee
    • The Korean Journal of Applied Statistics / v.37 no.2 / pp.225-237 / 2024
  • In the analysis of competing risks data, some covariates may not be fully observed for some subjects. In such cases, excluding subjects with missing covariate values from the analysis may result in biased estimates and loss of efficiency. In this paper, we study multiple imputation and the augmented inverse probability weighting method for regression parameter estimation in the cause-specific proportional hazards model with missing covariates. The performance of the estimators obtained from both methods is evaluated by simulation studies, which show that the methods perform well. Both methods were then applied to breast cancer data from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial, in which tumor size has missing values, to investigate significant risk factors for death from breast cancer and from other causes. Under the cause-specific proportional hazards model, the methods show that race, marital status, stage, grade, and tumor size are significant risk factors for breast cancer mortality, and that stage has the greatest effect on increasing the risk of breast cancer death. Age at diagnosis and tumor size significantly increase the risk of other-cause death.

Structural health monitoring data reconstruction of a concrete cable-stayed bridge based on wavelet multi-resolution analysis and support vector machine

  • Ye, X.W.;Su, Y.H.;Xi, P.S.;Liu, H.
    • Computers and Concrete / v.20 no.5 / pp.555-562 / 2017
  • The accuracy and integrity of stress data acquired by a bridge health monitoring system are of significant importance for bridge safety assessment. However, missing and abnormal data inevitably exist in a realistic monitoring system. This paper presents a data reconstruction approach for bridge health monitoring based on wavelet multi-resolution analysis and a support vector machine (SVM). The proposed method has been applied to data imputation using data recorded by the structural health monitoring (SHM) system installed on a prestressed concrete cable-stayed bridge. The effectiveness and accuracy of the proposed wavelet-based SVM prediction method are examined by comparing its prediction errors with those of the traditional autoregressive moving average (ARMA) method and of the SVM prediction method without wavelet multi-resolution analysis. Data reconstruction based on 5-day and 1-day continuous stress history data with obvious abnormal signals is performed to examine the effect of sample size on reconstruction accuracy. The results indicate that the proposed approach is an effective tool for missing data imputation and abnormal signal replacement, and can serve as a solid foundation for accurately evaluating the safety of bridge structures.

Nonignorable Nonresponse Imputation and Rotation Group Bias Estimation on the Rotation Sample Survey (무시할 수 없는 무응답을 가지고 있는 교체표본조사에서의 무응답 대체와 교체그룹 편향 추정)

  • Choi, Bo-Seung;Kim, Dae-Young;Kim, Kee-Whan;Park, You-Sung
    • The Korean Journal of Applied Statistics / v.21 no.3 / pp.361-375 / 2008
  • We propose methods to impute item nonresponse in a 4-8-4 rotation sample survey. We consider a nonignorable nonresponse mechanism, which can arise when a survey deals with sensitive questions (e.g., income, labor force status). We use a model-based imputation method with a Bayesian approach to avoid the boundary solution problem. We also estimate the interview time bias using the imputed data, and calculate cell expectations and marginal probabilities at a fixed time after removing the estimated bias. We compare the mean squared errors and biases of the maximum likelihood method and the Bayesian methods using simulation studies.

Comparing Accuracy of Imputation Methods for Categorical Incomplete Data (범주형 자료의 결측치 추정방법 성능 비교)

  • 신형원;손소영
    • The Korean Journal of Applied Statistics / v.15 no.1 / pp.33-43 / 2002
  • Various estimation methods have been developed for imputing categorical missing data. They include the modal category method, logistic regression, and association rules. In this study, we propose two fusion algorithms, one based on a neural network and one on a voting scheme, that combine the results of the individual imputation methods. A Monte Carlo simulation is used to compare the performance of these methods. Five factors are used to simulate the missing data pattern: (1) input-output function, (2) data size, (3) noise of the input-output function, (4) proportion of missing data, and (5) pattern of missing data. The experimental results indicate the following. When the data size is small and the proportion of missing data is large, the modal category method, association rules, and neural network based fusion perform better than the other methods. However, when the data size is small and the correlation between the input and the missing output is strong, logistic regression and the neural network based fusion algorithm appear better than the others. When the data size is large with a low proportion of missing data, large noise, and strong correlation between the input and the missing output, the neural network based fusion algorithm turns out to be the best choice.
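
The modal category method and a voting-style fusion named in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of `None` as the missing marker, and the simple majority vote are all assumptions made for illustration.

```python
from collections import Counter

def modal_category(values):
    """Return the most frequent observed category (None marks a missing value)."""
    observed = [v for v in values if v is not None]
    return Counter(observed).most_common(1)[0][0]

def impute_modal(values):
    """Replace every missing entry with the modal observed category."""
    mode = modal_category(values)
    return [mode if v is None else v for v in values]

def vote(predictions):
    """Majority vote over the categories that several imputation methods
    propose for a single missing cell (hypothetical fusion rule)."""
    return Counter(predictions).most_common(1)[0][0]

# Toy column with two missing entries.
column = ["a", None, "b", "a", None]
filled = impute_modal(column)

# Three hypothetical methods disagree on one cell; the vote picks the majority.
fused = vote(["a", "b", "a"])
```

A plain majority vote weights every method equally; the neural network based fusion in the abstract would instead learn how to combine the individual results, which this sketch does not attempt.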

Predictive Optimization Adjusted With Pseudo Data From A Missing Data Imputation Technique (결측 데이터 보정법에 의한 의사 데이터로 조정된 예측 최적화 방법)

  • Kim, Jeong-Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.2 / pp.200-209 / 2019
  • When forecasting future values, a model estimated by minimizing training errors can yield test errors higher than the training errors. This is the over-fitting problem, caused by an increase in model complexity when the model is fitted only to a given dataset. Some regularization and resampling methods have been introduced to reduce test errors by alleviating this problem, but they are designed for use with only the given dataset. In this paper, we propose a new optimization approach that reduces test errors by transforming a test error minimization problem into a training error minimization problem. To carry out this transformation, we need additional data beyond the given dataset, termed pseudo data. To make proper use of pseudo data, we use three types of missing data imputation techniques. As the optimization tool, we choose the least squares method and combine it with an extra pseudo data instance. Finally, we present numerical results supporting the proposed approach, which yields lower test errors than the ordinary least squares method.

On Adaptation to Sparse Design in Bivariate Local Linear Regression

  • Hall, Peter;Seifert, Burkhardt;Turlach, Berwin A.
    • Journal of the Korean Statistical Society / v.30 no.2 / pp.231-246 / 2001
  • Local linear smoothing enjoys several excellent theoretical and numerical properties, and in a range of applications it is the method most frequently chosen for fitting curves to noisy data. Nevertheless, it suffers numerical problems in places where the distribution of design points (often called predictors, or explanatory variables) is sparse. In the case of univariate design, several remedies have been proposed for overcoming this problem, one of which involves adding "pseudo" design points in places where the original design points are too widely separated. This approach is particularly well suited to treating sparse bivariate design problems, and in fact attractive, elegant geometric analogues of univariate imputation and interpolation rules are appropriate for that case. In the present paper we introduce and develop pseudo data rules for bivariate design, and apply them to real data.


Comparison of EM and Multiple Imputation Methods with Traditional Methods in Monotone Missing Pattern

  • Kang, Shin-Soo
    • Journal of the Korean Data and Information Science Society / v.16 no.1 / pp.95-106 / 2005
  • Complete-case analysis is easy to carry out and may be adequate when only a small amount of data is missing. However, it is not recommended in general because the estimates are usually biased and inefficient. There are numerous alternatives to complete-case analysis. A natural alternative is available-case analysis, which uses all cases that contain the variables required for a specific task. The EM algorithm is a general approach for computing maximum likelihood estimates of parameters from incomplete data. These methods and multiple imputation (MI) are reviewed, and their performance is compared by simulation studies under a monotone missing pattern.
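
The difference between complete-case and available-case analysis described above can be seen in a small Python sketch. The toy data, the variable names `x` and `y`, and the helper functions are hypothetical and chosen only for illustration; `y` is missing for some rows while `x` is always observed, which gives a monotone missing pattern.

```python
from statistics import mean

# Hypothetical toy data: None marks a missing value.
rows = [
    {"x": 1.0, "y": 2.0},
    {"x": 2.0, "y": None},
    {"x": 3.0, "y": 6.0},
    {"x": 4.0, "y": None},
]

def complete_case_mean(rows, var):
    """Mean of `var` using only rows with no missing value in any variable."""
    complete = [r for r in rows if all(v is not None for v in r.values())]
    return mean(r[var] for r in complete)

def available_case_mean(rows, var):
    """Mean of `var` using every row in which `var` itself is observed."""
    return mean(r[var] for r in rows if r[var] is not None)

cc_x = complete_case_mean(rows, "x")   # uses only the two fully observed rows
ac_x = available_case_mean(rows, "x")  # uses all four rows
```

Here the complete-case mean of `x` discards two observed values of `x` simply because `y` is missing in those rows, while the available-case mean keeps them.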


MLE for Incomplete Contingency Tables with Lagrangian Multiplier

  • Kang, Shin-Soo
    • Journal of the Korean Data and Information Science Society / v.17 no.3 / pp.919-925 / 2006
  • The maximum likelihood estimate (MLE) is obtained from the partial log-likelihood function for the cell probabilities of two-way incomplete contingency tables proposed by Chen and Fienberg (1974). The partial log-likelihood function is modified by adding a Lagrangian multiplier so that constraints can be incorporated. Variances of the MLEs of population proportions are derived from the matrix of second derivatives of the log-likelihood with respect to the cell probabilities. Simulation results, when data are missing at random (MAR), reveal that complete-case (CC) analysis produces biased estimates of joint probabilities and is less efficient than either MLE or multiple imputation (MI). MLE and MI provide consistent results under the MAR situation. MLE provides more efficient estimates of population proportions than either MI based on data augmentation or CC analysis. The standard errors of the MLE from the proposed method using the Lagrangian multiplier are valid and have less variation than the standard errors from MI and CC.
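
As a sketch of the constrained maximization described above, assume the usual setup for a two-way table under MAR in which $n_{ij}$ are the fully classified counts, $r_i$ are counts observed only on the row variable, and $c_j$ are counts observed only on the column variable. This notation is an assumption made for illustration and is not taken from the paper. With cell probabilities $p_{ij}$ and margins $p_{i+}$, $p_{+j}$, the Lagrangian can then be written as:

$$
\mathcal{L}(p, \lambda) = \sum_{i,j} n_{ij} \log p_{ij} + \sum_{i} r_i \log p_{i+} + \sum_{j} c_j \log p_{+j} - \lambda \Big( \sum_{i,j} p_{ij} - 1 \Big)
$$

Setting $\partial \mathcal{L} / \partial p_{ij} = 0$ gives $n_{ij}/p_{ij} + r_i/p_{i+} + c_j/p_{+j} = \lambda$ for every cell. Multiplying by $p_{ij}$ and summing over all cells shows that $\lambda$ equals the total count $\sum_{i,j} n_{ij} + \sum_i r_i + \sum_j c_j$.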
