• Title/Abstract/Keywords: Multiple Imputation

61 search results (processing time 0.022 s)

Jackknife Variance Estimation under Imputation for Nonrandom Nonresponse with Follow-ups

  • Park, Jinwoo
    • Journal of the Korean Statistical Society / Vol. 29, No. 4 / pp. 385-394 / 2000
  • Jackknife variance estimation based on adjusted imputed values is provided for the case where nonresponse is nonrandom and follow-up data are available for a subsample of nonrespondents. Both hot-deck and ratio imputation are considered as imputation methods. The performance of the proposed variance estimator under a nonrandom response mechanism is investigated through numerical simulation.
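As a rough illustration of jackknife variance estimation under ratio imputation, the following Python sketch re-imputes the missing values each time a unit is deleted, so that every replicate reflects the imputation step. It is a simplified delete-one scheme that assumes a simple random sample. It does not implement the paper's adjustment for nonrandom nonresponse or its use of follow-up data, and the function names `ratio_impute` and `jackknife_variance_of_mean` are hypothetical.

```python
def ratio_impute(x, y, drop=None):
    """Fill missing y values (None) with R * x, where R = sum(y) / sum(x)
    over the respondents, optionally excluding the unit at index `drop`."""
    resp = [i for i in range(len(y)) if y[i] is not None and i != drop]
    ratio = sum(y[i] for i in resp) / sum(x[i] for i in resp)
    return [y[i] if y[i] is not None else ratio * x[i] for i in range(len(y))]


def jackknife_variance_of_mean(x, y):
    """Delete-one jackknife variance of the imputed sample mean.

    For each deleted unit j, the ratio is recomputed without j and the
    missing values are re-imputed before the replicate mean is taken.
    """
    n = len(y)
    replicates = []
    for j in range(n):
        completed = ratio_impute(x, y, drop=j)
        kept = [completed[i] for i in range(n) if i != j]
        replicates.append(sum(kept) / (n - 1))
    center = sum(replicates) / n
    return (n - 1) / n * sum((r - center) ** 2 for r in replicates)


# Hypothetical data: x is an auxiliary variable, y has one nonrespondent.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, None, 8.0]
variance = jackknife_variance_of_mean(x, y)
```

With no missing values, this reduces to the usual jackknife variance of a sample mean, which equals the sample variance divided by n.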


Multiple Imputation Reducing Outlier Effect using Weight Adjustment Methods

  • 김진영;신기일
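    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 26, No. 4 / pp. 635-647 / 2013
  • Multiple imputation is the most commonly used method when missing values occur in sample surveys. Its performance depends on several factors and is particularly sensitive to outliers. In this study, we investigate a method that improves the performance of multiple imputation by using weight adjustment to reduce the influence of outliers. The final weights obtained from the weight adjustment were used in the multiple imputation, and PROC MI in SAS was used to carry out the imputation. The superiority of the proposed method was confirmed through simulation studies and an analysis of real data from the Monthly Labor Statistics survey.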

Robust multiple imputation method for missings with boundary and outliers

  • 박유성;오도영;권태연
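    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 32, No. 6 / pp. 889-898 / 2019
  • In surveys with item nonresponse (item missing), imputing the missing values becomes very complicated when the variable containing missing values has outliers and is subject to meaningful logical boundary conditions imposed by other survey items. When outliers are present in a variable whose missing values are bounded, conventional regression-based imputation methods may produce imputed values that are biased or that violate the bounds. This paper therefore seeks a solution for regression-based imputation of data with outliers and logical boundary conditions by combining various robust regression models with multiple imputation methods. To this end, we use simulation to search for the best combination of these methods under various scenarios and discuss the results.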

Application of NORM to the Multiple Imputation for Multivariate Missing Data

  • 김현정;문승호;신재경
    • Journal of the Korean Data and Information Science Society / Vol. 13, No. 2 / pp. 105-113 / 2002
  • The statistical analysis of incomplete data sometimes requires handling of incomplete observations. To this end, each case with missing values generally should be deleted, leaving only the non-missing cases for analysis. The EM algorithm (Dempster et al., 1977), which involves prediction and estimation steps, is one general method among others. In this article, we use the free software NORM, which was developed for multiple imputation and uses the DA (Data Augmentation) algorithm in its imputation, and evaluate its efficiency through a numerical example.


Multiple imputation for competing risks survival data via pseudo-observations

  • Han, Seungbong;Andrei, Adin-Cristian;Tsui, Kam-Wah
    • Communications for Statistical Applications and Methods / Vol. 25, No. 4 / pp. 385-396 / 2018
  • Competing risks are commonly encountered in biomedical research. Regression models for competing risks data can be developed based on data routinely collected in hospitals or general practices. However, these data sets usually contain missing covariate values. To overcome this problem, multiple imputation is often used to fit regression models under a MAR assumption. Here, we introduce a multivariate imputation by chained equations algorithm to deal with competing risks survival data. Using pseudo-observations, we make use of the available outcome information by accommodating the competing risk structure. Lastly, we illustrate the practical advantages of our approach using simulations and two data examples, one from coronary artery disease data and one from hepatocellular carcinoma data.

Handling Incomplete Data Problem in Collaborative Filtering System

  • Noh, Hyun-ju;Kwak, Min-jung;Han, In-goo
    • Korea Academia-Industrial cooperation Society: Conference Proceedings (한국산학기술학회:학술대회논문집) / 2003 Proceeding / pp. 105-110 / 2003
  • Collaborative filtering is one of the methodologies that are most widely used for recommendation system. It is based on a data matrix of each customer's preferences of products. There could be a lot of missing values in such preference data matrix. This incomplete data is one of the reasons to deteriorate the accuracy of recommendation system. Multiple imputation method imputes m values for each missing value. It overcomes flaws of single imputation approaches through considering the uncertainty of missing values. The objective of this paper is to suggest multiple imputation-based collaborative filtering approach for recommendation system to improve the accuracy in prediction performance. The experimental works show that the proposed approach provides better performance than the traditional Collaborative filtering approach, especially in case that there are a lot of missing values in dataset used for recommendation system.


Iterative integrated imputation for missing data and pathway models with applications to breast cancer subtypes

  • Linder, Henry;Zhang, Yuping
    • Communications for Statistical Applications and Methods / Vol. 26, No. 4 / pp. 411-430 / 2019
  • Tumor development is driven by complex combinations of biological elements. Recent advances suggest that molecularly distinct subtypes of breast cancers may respond differently to pathway-targeted therapies. Thus, it is important to dissect pathway disturbances by integrating multiple molecular profiles, such as genetic, genomic and epigenomic data. However, missing data are often present in the -omic profiles of interest. Motivated by genomic data integration and imputation, we present a new statistical framework for pathway significance analysis. Specifically, we develop a new strategy for imputation of missing data in large-scale genomic studies, which adapts low-rank, structured matrix completion. Our iterative strategy enables us to impute missing data in complex configurations across multiple data platforms. In turn, we perform large-scale pathway analysis integrating gene expression, copy number, and methylation data. The advantages of the proposed statistical framework are demonstrated through simulations and real applications to breast cancer subtypes. We demonstrate superior power to identify pathway disturbances, compared with other imputation strategies. We also identify differential pathway activity across different breast tumor subtypes.

Handling Incomplete Data Problem in Collaborative Filtering System

  • Noh, Hyun-Ju;Kwak, Min-Jung;Han, In-Goo
    • Journal of Intelligence and Information Systems (지능정보연구) / Vol. 9, No. 2 / pp. 51-63 / 2003
  • Collaborative filtering is one of the methodologies that are most widely used for recommendation system. It is based on a data matrix of each customer's preferences of products. There could be a lot of missing values in such preference data matrix. This incomplete data is one of the reasons to deteriorate the accuracy of recommendation system. There are several treatments to deal with the incomplete data problem such as case deletion and single imputation. Those approaches are simple and easy to implement but they may provide biased results. Multiple imputation method imputes m values for each missing value. It overcomes flaws of single imputation approaches through considering the uncertainty of missing values. The objective of this paper is to suggest multiple imputation-based collaborative filtering approach for recommendation system to improve the accuracy in prediction performance. The experimental works show that the proposed approach provides better performance than the traditional Collaborative filtering approach, especially in case that there are a lot of missing values in dataset used for recommendation system.
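The idea that multiple imputation fills each missing value m times and carries the resulting uncertainty into the final estimate can be sketched in Python as follows. This is a minimal illustration, not the paper's collaborative filtering procedure: it estimates a simple mean, draws imputations from a normal distribution fitted to the observed values (a simplification that ignores parameter uncertainty), and pools the m results with Rubin's rules. The function names `pool_rubin` and `multiply_impute_mean` are hypothetical.

```python
import random
import statistics


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # mean within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between     # total variance of the pooled estimate
    return q_bar, total


def multiply_impute_mean(values, m=5, seed=0):
    """Estimate the mean of `values` (None marks a missing entry) from m imputed copies.

    Assumption for illustration only: missing entries are drawn from a normal
    distribution with the mean and standard deviation of the observed entries.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    mu = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    estimates, variances = [], []
    for _ in range(m):
        completed = [v if v is not None else rng.gauss(mu, sd) for v in values]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    return pool_rubin(estimates, variances)


# Hypothetical ratings with two missing entries.
ratings = [4.0, None, 3.0, 5.0, None, 2.0]
pooled_mean, pooled_variance = multiply_impute_mean(ratings, m=5, seed=1)
```

The between-imputation term is what a single imputation cannot provide: it grows when the m completed data sets disagree, which widens the pooled variance accordingly.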


Identification of Differentially Expressed Genes Using Tests Based on Multiple Imputations

  • Kim, Sang Cheol;Yu, Donghyeon
    • Quantitative Bio-Science / Vol. 36, No. 1 / pp. 23-31 / 2017
  • Datasets from DNA microarray experiments, which are in the form of large matrices of expression levels of genes, often have missing values. However, existing statistical methods, including principal component analysis (PCA) and Hotelling's t-test, are not directly applicable to datasets with missing values because they generally assume that the observed dataset is complete. Many methods have been proposed in previous literature to impute the missing values in the observed data. Troyanskaya et al. [1] study the k-nearest neighbor (kNN) imputation, Kim et al. [2] propose the local least squares (LLS) method, and Rubin [3] proposes multiple imputation (MI) for missing values. To identify differentially expressed genes, we propose a new testing procedure for the case where missing values exist in the observed data. The proposed procedure uses Stouffer's z-scores and combines the test results of individual imputed samples, which are dependent on each other. We numerically show that the proposed test procedure based on MI performs better than the existing test procedures based on single imputation (SI) by comparing their ROC curves. We apply the proposed method to the analysis of a public microarray dataset.
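Combining per-imputation test results with Stouffer's z-score can be sketched in Python as below. The classic method divides the sum of k z-scores by the square root of k, which is valid for independent tests. Because z-scores from imputed copies of the same data are dependent, the sketch also accepts a common pairwise correlation `rho`, using the fact that the variance of the sum is then k + k(k - 1)rho. This correction is an assumption made here for illustration and is not necessarily the paper's procedure. The function name `stouffer` is hypothetical.

```python
import math
from statistics import NormalDist


def stouffer(z_scores, rho=0.0):
    """Combine z-scores into one z-score and a two-sided p-value.

    rho is an assumed common correlation between the z-scores;
    rho = 0.0 gives the classic Stouffer method for independent tests.
    """
    k = len(z_scores)
    var_sum = k + k * (k - 1) * rho           # variance of the sum of the z-scores
    z = sum(z_scores) / math.sqrt(var_sum)
    p = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided normal p-value
    return z, p


# Hypothetical z-scores for one gene from m = 4 imputed data sets.
z_independent, p_independent = stouffer([1.0, 1.0, 1.0, 1.0])
z_dependent, p_dependent = stouffer([1.0, 1.0, 1.0, 1.0], rho=1.0)
```

Treating dependent z-scores as independent overstates the evidence: in the example, perfectly correlated scores carry no more information than a single score, so the combined z-score falls back to that of one test.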

Imputation of Multiple Missing Values by Normal Mixture Model under Markov Random Field: Application to Imputation of Pixel Values of Color Image

  • 김승구
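    • Communications for Statistical Applications and Methods / Vol. 16, No. 6 / pp. 925-936 / 2009
  • Imputation of missing values by the EM algorithm under the assumption of independent data is well known. In applications dealing with spatial data, however, the problem should be handled under a Markov random field (MRF), which extends the independence assumption. This paper therefore provides an EM algorithm for imputing multiple missing values in multivariate data under the Markov random field model. We show that, under a few realistic assumptions, the technique ultimately amounts to imputation by a mixture model. We then apply the technique to the imputation of missing pixel values in a color image composed of three variates, identify its usefulness and its problems, and discuss ways to remedy those problems.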