• Title/Summary/Keyword: 다중 대체 방법 (multiple imputation method)

Search Result 121, Processing Time 0.028 seconds

Multiple Imputation Reducing Outlier Effect using Weight Adjustment Methods (가중치 보정을 이용한 다중대체법)

  • Kim, Jin-Young;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.26 no.4 / pp.635-647 / 2013
  • Imputation is a commonly used method for handling missing survey data. The performance of an imputation method is influenced by various factors, especially outliers. Removing outliers from a data set is a simple and effective way to reduce their effect. In this paper, to improve the precision of multiple imputation, we study an imputation method that reduces the effect of outliers using various weight adjustment methods, including outlier removal. The regression method of PROC MI in SAS is used for multiple imputation, and the final adjusted weight is used as the weight variable to obtain the imputed values. Simulation studies compare the performance of the weight adjustment methods, and Monthly Labor Statistic data is used for the real data analysis.
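As a rough illustration of the idea in this abstract, the sketch below fits a weighted simple linear regression and draws several imputed values for each missing response. It is a simplified stand-in, not the paper's procedure and not SAS PROC MI: the function names, the single-predictor model, and the choice of a zero weight to represent outlier removal are all assumptions, and the regression parameters are not redrawn between imputations as a proper multiple imputation would do.

```python
import random


def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b, residual variance)."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("weighted x values have no spread")
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid_var = sum(
        wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y)
    ) / total_w
    return intercept, slope, resid_var


def impute_multiple(x_obs, y_obs, w_obs, x_missing, m=5, seed=0):
    """Return m imputed data sets: prediction plus a random residual draw."""
    rng = random.Random(seed)
    intercept, slope, resid_var = weighted_linear_fit(x_obs, y_obs, w_obs)
    sd = resid_var ** 0.5
    return [
        [intercept + slope * xm + rng.gauss(0.0, sd) for xm in x_missing]
        for _ in range(m)
    ]


# Hypothetical data: the last observation is an outlier and gets weight 0,
# which is equivalent to removing it before fitting.
x_obs = [1.0, 2.0, 3.0, 4.0]
y_obs = [2.0, 4.0, 6.0, 100.0]
w_obs = [1.0, 1.0, 1.0, 0.0]
imputations = impute_multiple(x_obs, y_obs, w_obs, x_missing=[5.0], m=3)
```

Weights between 0 and 1 would down-weight a suspicious observation instead of discarding it, which is the kind of adjustment the abstract compares.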

Non-Response Imputation for Panel Data (패널자료의 무응답 대체법)

  • Pak, Gi-Deok;Shin, Key-Il
    • Communications for Statistical Applications and Methods / v.17 no.6 / pp.899-907 / 2010
  • Several non-response imputation methods have been suggested; however, mainly cross-sectional imputation methods have been studied and applied. A simple and common imputation method for panel data is cross-wave regression imputation, or carry-over imputation as a special case of it. This study suggests a multiple imputation method that combines time series analysis with a cross-sectional multiple imputation method. We compare this method with cross-wave regression imputation using MSE, MAE, and bias. The 2008 monthly labor survey data are used for this study.
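The three comparison criteria named in this abstract can be computed as in the minimal sketch below. It assumes the true values of the deliberately removed responses are known, as in a simulation; the function name and the sample numbers are hypothetical.

```python
def imputation_errors(true_values, imputed_values):
    """Return (MSE, MAE, bias) of imputed values against known true values."""
    if not true_values or len(true_values) != len(imputed_values):
        raise ValueError("inputs must be non-empty and of equal length")
    # Error is defined as imputed minus true, so positive bias means overestimation.
    diffs = [imp - tru for tru, imp in zip(true_values, imputed_values)]
    n = len(diffs)
    mse = sum(d * d for d in diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    bias = sum(diffs) / n
    return mse, mae, bias


# Hypothetical held-out values and their imputations.
mse, mae, bias = imputation_errors([1.0, 2.0, 3.0], [2.0, 2.0, 1.0])
```

Bias can be close to zero even when MSE and MAE are large, because positive and negative errors cancel, which is why the three are usually reported together.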

Comparison of Data Reconstruction Methods for Missing Value Imputation (결측값 대체를 위한 데이터 재현 기법 비교)

  • Cheongho Kim;Kee-Hoon Kang
    • The Journal of the Convergence on Culture Technology / v.10 no.1 / pp.603-608 / 2024
  • Nonresponse and missing values are caused by sample dropout and by respondents avoiding survey questions. This raises the possibility of information loss and biased inference, so missing values need to be replaced with appropriate values. In this paper, we compare several methods for imputing missing values: mean, linear regression, random forest, K-nearest neighbor, and the deep-learning-based autoencoder and denoising autoencoder. Each imputation method is explained, and the methods are compared using continuous simulation data and real data. The comparison results confirm that in most cases the random forest and denoising autoencoder imputation methods perform better than the others.

Comparison of GEE Estimation Methods for Repeated Binary Data with Time-Varying Covariates on Different Missing Mechanisms (시간-종속적 공변량이 포함된 이분형 반복측정자료의 GEE를 이용한 분석에서 결측 체계에 따른 회귀계수 추정방법 비교)

  • Park, Boram;Jung, Inkyung
    • The Korean Journal of Applied Statistics / v.26 no.5 / pp.697-712 / 2013
  • When analyzing repeated binary data, the generalized estimating equations (GEE) approach produces consistent estimates for regression parameters even if an incorrect working correlation matrix is used. However, in finite samples the coefficients of time-varying covariates change more across working correlation structures than those of time-invariant covariates. In addition, the GEE approach may give biased estimates under missing at random (MAR). Weighted estimating equations and multiple imputation methods have been proposed to reduce biases in parameter estimates under MAR. This article studies whether the two methods produce robust estimates across various working correlation structures for longitudinal binary data with time-varying covariates under different missing mechanisms. Through simulation, we observe that time-varying covariates have greater differences in parameter estimates across different working correlation structures than time-invariant covariates. The multiple imputation method produces more robust estimates under any working correlation structure and smaller biases compared to the other two methods.

Robust multiple imputation method for missings with boundary and outliers (한계와 이상치가 있는 결측치의 로버스트 다중대체 방법)

  • Park, Yousung;Oh, Do Young;Kwon, Tae Yeon
    • The Korean Journal of Applied Statistics / v.32 no.6 / pp.889-898 / 2019
  • Imputing missing values for survey variables with item nonresponse becomes complicated if outliers and logical boundary conditions between survey items cannot be ignored. If a variable with missing values has outliers and boundaries, values imputed by existing regression-based imputation methods are likely to be biased and to violate the boundary conditions. In this paper, we approach these difficulties by combining various robust regression models and multiple imputation methods. Through a simulation study on various scenarios of outliers and boundaries, we find and discuss the optimal combination of robust regression and multiple imputation method.

A two-sample test with interval censored competing risk data using multiple imputation (다중대체방법을 이용한 구간 중도 경쟁 위험 모형에서의 이표본 검정)

  • Kim, Yuwon;Kim, Yang-Jin
    • The Korean Journal of Applied Statistics / v.30 no.2 / pp.233-241 / 2017
  • Interval censored data frequently occur in observational studies where subjects are followed periodically. In this paper, we suggest a test statistic to compare the cumulative incidence functions (CIF) of two groups with interval censored failure time data in the presence of competing risks. Gray (1988) suggested a test statistic for right censored data that motivated the well-known Fine and Gray subdistribution hazard model. A multiple imputation technique is used to adapt Gray's test statistic to interval censored data. The power and size of the suggested method are investigated through diverse simulation schemes. The main merit of the suggested method is that it is simple to implement with existing software for right censored data. The method is illustrated by analyzing Bangkok's HIV cohort dataset.

Analysis of the cause-specific proportional hazards model with missing covariates (누락된 공변량을 가진 원인별 비례위험모형의 분석)

  • Minjung Lee
    • The Korean Journal of Applied Statistics / v.37 no.2 / pp.225-237 / 2024
  • In the analysis of competing risks data, some covariates may not be fully observed for some subjects. In such cases, excluding subjects with missing covariate values from the analysis may result in biased estimates and loss of efficiency. In this paper, we study multiple imputation and the augmented inverse probability weighting method for regression parameter estimation in the cause-specific proportional hazards model with missing covariates. The performance of the estimators obtained from the two methods is evaluated by simulation studies, which show that both perform well. The methods are applied to breast cancer data with missing tumor size values, obtained from the Prostate, Lung, Colorectal, and Ovarian Cancer Screen Trial Study, to investigate significant risk factors for death from breast cancer and from other causes. Under the cause-specific proportional hazards model, the methods show that race, marital status, stage, grade, and tumor size are significant risk factors for breast cancer mortality, and stage has the greatest effect on increasing the risk of breast cancer death. Age at diagnosis and tumor size have significant effects on increasing the risk of other-cause death.

Multiple imputation and synthetic data (다중대체와 재현자료 작성)

  • Kim, Joungyoun;Park, Min-Jeong
    • The Korean Journal of Applied Statistics / v.32 no.1 / pp.83-97 / 2019
  • As society develops, the dissemination of microdata has increased to respond to the diverse analytical needs of users. Analysis of microdata for policy making, academic purposes, and so on is highly desirable in terms of value creation. However, providing microdata that remain useful carries a risk of exposing personal information. Several methods have been considered to protect personal information while preserving the usefulness of the data. One such method is to generate and use synthetic data. This paper aims to build an understanding of synthetic data by exploring the related methodologies and precautions. To this end, we first explain multiple imputation, the Bayesian predictive model, and the Bayesian bootstrap, which are the basic foundations for synthetic data. We then link these concepts to the construction of fully and partially synthetic data. To illustrate the creation of synthetic data, we review a real longitudinal synthetic data example based on sequential regression multivariate imputation.

Wormhole Detection using Multipath in sensor network (센서네트워크 환경에서 다중 경로를 이용한 웜홀 검출)

  • Kim, In-Tae;Han, Seung-Jin;Lee, Jung-Hyun
    • KSCI Review / v.15 no.1 / pp.77-81 / 2007
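  • Attacks on sensor network routing are carried out in much the same way as in ad hoc networks because of the wireless network environment. However, the security mechanisms that counter them cannot be applied as they are, because sensor nodes have more limited resources, so new research is needed. This paper proposes a method that uses multiple paths to avoid and detect a routing attack known as a wormhole. In a multipath environment, the per-hop delay of the primary path is compared with that of the alternate path to avoid and detect wormhole paths, and a blacklist method is used to reduce detection error. The proposed wormhole detection mechanism is simulated in the ns-2 environment, and its performance is measured by comparing the detection rates of wormhole and normal nodes.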
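The per-hop delay comparison described in this abstract might look like the sketch below. The abstract gives neither a decision rule nor the blacklist mechanics, so the ratio threshold, the function names, and the use of a plain set as the blacklist are all assumptions made for illustration.

```python
def per_hop_delay(total_delay_ms, hop_count):
    """Average delay per hop along a path."""
    if hop_count <= 0:
        raise ValueError("hop_count must be positive")
    return total_delay_ms / hop_count


def suspected_wormhole_path(primary, alternate, ratio_threshold=2.0):
    """Compare two (total_delay_ms, hop_count) paths.

    Returns "primary" or "alternate" when one path's per-hop delay exceeds
    the other's by more than ratio_threshold (an assumed rule), else None.
    """
    d_primary = per_hop_delay(*primary)
    d_alternate = per_hop_delay(*alternate)
    if d_primary > ratio_threshold * d_alternate:
        return "primary"
    if d_alternate > ratio_threshold * d_primary:
        return "alternate"
    return None


# Hypothetical measurements: the primary path reports few hops but a long
# delay, which is the pattern a wormhole tunnel would tend to produce.
blacklist = set()
suspect = suspected_wormhole_path(primary=(90.0, 3), alternate=(60.0, 6))
if suspect is not None:
    blacklist.add(suspect)  # placeholder record; the paper's blacklist may differ
```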

Monte Carlo Random Permutation Tests for Incompletely Ranked Data (불완전 순위 자료를 위한 몬테칼로 임의순열 검정)

  • Huh, Myung-Hoe;Choi, Won
    • The Korean Journal of Applied Statistics / v.14 no.1 / pp.191-199 / 2001
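  • This note studies a method for testing the null hypothesis that there is no difference in preference among objects, using incompletely ranked data obtained when n judges evaluate k objects. We propose a way to multiply impute the missing values in the given data and a Monte Carlo random permutation test that combines the results through an average p-value.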
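One way to read this abstract is sketched below: complete each judge's partial ordering at random, run a Monte Carlo permutation test on each completed data set, and average the p-values. The abstract does not specify the imputation scheme or the test statistic, so both are assumptions here: unranked objects are inserted at random positions that preserve the observed order, and the statistic is the sum of squared rank sums.

```python
import random


def impute_ranking(partial_order, objects, rng):
    """Complete a best-to-worst ordering by inserting unranked objects at random."""
    order = list(partial_order)
    for obj in objects:
        if obj not in order:
            order.insert(rng.randrange(len(order) + 1), obj)
    return order


def rank_sum_statistic(orderings, objects):
    """Sum of squared rank sums; larger values mean more disagreement with the null."""
    sums = {obj: 0 for obj in objects}
    for order in orderings:
        for rank, obj in enumerate(order, start=1):
            sums[obj] += rank
    return sum(s * s for s in sums.values())


def permutation_p_value(orderings, objects, rng, n_perm=200):
    """Monte Carlo p-value: each judge's ordering is shuffled independently."""
    observed = rank_sum_statistic(orderings, objects)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(order, len(order)) for order in orderings]
        if rank_sum_statistic(shuffled, objects) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def averaged_p_value(partial_orders, objects, m=5, n_perm=200, seed=0):
    """Average the permutation p-values over m randomly completed data sets."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(m):
        completed = [impute_ranking(p, objects, rng) for p in partial_orders]
        p_values.append(permutation_p_value(completed, objects, rng, n_perm))
    return sum(p_values) / m


# Hypothetical data: four objects, six judges, two of whom ranked only a subset.
objects = ["A", "B", "C", "D"]
partial_orders = [
    ["A", "B", "C", "D"], ["A", "B", "C", "D"], ["A", "B", "D"],
    ["A", "C", "D"], ["A", "B", "C", "D"], ["B", "A", "C", "D"],
]
p_avg = averaged_p_value(partial_orders, objects)
```

Because the total of all rank sums is fixed, the sum of their squares grows as the rank sums spread apart, so it orders data sets the same way a Friedman-type statistic would.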
