• 제목/요약/키워드: Multiple Imputation

검색결과 61건 처리시간 0.022초

MergeReference: A Tool for Merging Reference Panels for HLA Imputation

  • Cook, Seungho;Han, Buhm
    • Genomics & Informatics
    • /
    • 제15권3호
    • /
    • pp.108-111
    • /
    • 2017
  • Recently developed computational methods allow the imputation of human leukocyte antigen (HLA) genes using intergenic single nucleotide polymorphism markers. To improve the imputation accuracy in HLA imputation, it is essential to increase the sample size and the diversity of alleles in the reference panel. Our software, MergeReference, helps achieve this goal by providing a streamlined pipeline for combining multiple reference panels into one.

Breast Cancer and Modifiable Lifestyle Factors in Argentinean Women: Addressing Missing Data in a Case-Control Study

  • Coquet, Julia Becaria;Tumas, Natalia;Osella, Alberto Ruben;Tanzi, Matteo;Franco, Isabella;Diaz, Maria Del Pilar
    • Asian Pacific Journal of Cancer Prevention
    • /
    • 제17권10호
    • /
    • pp.4567-4575
    • /
    • 2016
  • A number of studies have evidenced the effect of modifiable lifestyle factors such as diet, breastfeeding and nutritional status on breast cancer risk. However, none have addressed the missing data problem in nutritional epidemiologic research in South America. Missing data is a frequent problem in breast cancer studies and epidemiological settings in general. Estimates of effect obtained from these studies may be biased, if no appropriate method for handling missing data is applied. We performed Multiple Imputation for missing values on covariates in a breast cancer case-control study of $C{\acute{o}}rdoba$ (Argentina) to optimize risk estimates. Data was obtained from a breast cancer case control study from 2008 to 2015 (318 cases, 526 controls). Complete case analysis and multiple imputation using chained equations were the methods applied to estimate the effects of a Traditional dietary pattern and other recognized factors associated with breast cancer. Physical activity and socioeconomic status were imputed. Logistic regression models were performed. When complete case analysis was performed only 31% of women were considered. Although a positive association of Traditional dietary pattern and breast cancer was observed from both approaches (complete case analysis OR=1.3, 95%CI=1.0-1.7; multiple imputation OR=1.4, 95%CI=1.2-1.7), effects of other covariates, like BMI and breastfeeding, were only identified when multiple imputation was considered. A Traditional dietary pattern, BMI and breastfeeding are associated with the occurrence of breast cancer in this Argentinean population when multiple imputation is appropriately performed. Multiple Imputation is suggested in Latin America's epidemiologic studies to optimize effect estimates in the future.

A case study of competing risk analysis in the presence of missing data

  • Limei Zhou;Peter C. Austin;Husam Abdel-Qadir
    • Communications for Statistical Applications and Methods
    • /
    • 제30권1호
    • /
    • pp.1-19
    • /
    • 2023
  • Observational data with missing or incomplete data are common in biomedical research. Multiple imputation is an effective approach to handle missing data with the ability to decrease bias while increasing statistical power and efficiency. In recent years propensity score (PS) matching has been increasingly used in observational studies to estimate treatment effect as it can reduce confounding due to measured baseline covariates. In this paper, we describe in detail approaches to competing risk analysis in the setting of incomplete observational data when using PS matching. First, we used multiple imputation to impute several missing variables simultaneously, then conducted propensity-score matching to match statin-exposed patients with those unexposed. Afterwards, we assessed the effect of statin exposure on the risk of heart failure-related hospitalizations or emergency visits by estimating both relative and absolute effects. Collectively, we provided a general methodological framework to assess treatment effect in incomplete observational data. In addition, we presented a practical approach to produce overall cumulative incidence function (CIF) based on estimates from multiple imputed and PS-matched samples.

Estimation of Log-Odds Ratios for Incomplete $2{\times}2$ Tables with Covariates using FEFI

  • Kang, Shin-Soo;Bae, Je-Min
    • Journal of the Korean Data and Information Science Society
    • /
    • 제18권1호
    • /
    • pp.185-194
    • /
    • 2007
  • The information of covariates are available to do fully efficient fractional imputation(FEFI). The new method, FEFI with logistic regression is proposed to construct complete contingency tables. Jackknife method is used to get a standard errors of log-odds ratio from the completed table by the new method. Simulation results, when covariates have more information about categorical variables, reveal that the new method provides more efficient estimates of log-odds ratio than either multiple imputation(MI) based on data augmentation or complete case analysis.

  • PDF

Sparse Data Cleaning using Multiple Imputations

  • Jun, Sung-Hae;Lee, Seung-Joo;Oh, Kyung-Whan
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • 제4권1호
    • /
    • pp.119-124
    • /
    • 2004
  • Real data as web log file tend to be incomplete. But we have to find useful knowledge from these for optimal decision. In web log data, many useful things which are hyperlink information and web usages of connected users may be found. The size of web data is too huge to use for effective knowledge discovery. To make matters worse, they are very sparse. We overcome this sparse problem using Markov Chain Monte Carlo method as multiple imputations. This missing value imputation changes spare web data to complete. Our study may be a useful tool for discovering knowledge from data set with sparseness. The more sparseness of data in increased, the better performance of MCMC imputation is good. We verified our work by experiments using UCI machine learning repository data.

A Naive Multiple Imputation Method for Ignorable Nonresponse

  • Lee, Seung-Chun
    • Communications for Statistical Applications and Methods
    • /
    • 제11권2호
    • /
    • pp.399-411
    • /
    • 2004
  • A common method of handling nonresponse in sample survey is to delete the cases, which may result in a substantial loss of cases. Thus in certain situation, it is of interest to create a complete set of sample values. In this case, a popular approach is to impute the missing values in the sample by the mean or the median of responders. The difficulty with this method which just replaces each missing value with a single imputed value is that inferences based on the completed dataset underestimate the precision of the inferential procedure. Various suggestions have been made to overcome the difficulty but they might not be appropriate for public-use files where the user has only limited information for about the reasons for nonresponse. In this note, a multiple imputation method is considered to create complete dataset which might be used for all possible inferential procedures without misleading or underestimating the precision.

Multiple imputation inference for stratified random sample with nonignorable nonresponse

  • 신민웅;이상은;이성철;이주영
    • 한국통계학회:학술대회논문집
    • /
    • 한국통계학회 2001년도 추계학술발표회 논문집
    • /
    • pp.191-194
    • /
    • 2001
  • In general, the imputation problems which are caused from survey nonresponse have been studied for being based on ignorable cases. However the model based approach can be applied to survey with nonresponse suspected of being nonignorable. Here in this study, we will make the nonresponse for nonignorable into ignorable cell using adjustment cell approach, then we can applied the ignorable nonresponse method. For data sets of each nonresponse cells are simulated from normal distribution.

  • PDF

다중대체방법을 이용한 구간 중도 경쟁 위험 모형에서의 이표본 검정 (A two-sample test with interval censored competing risk data using multiple imputation)

  • 김유원;김양진
    • 응용통계연구
    • /
    • 제30권2호
    • /
    • pp.233-241
    • /
    • 2017
  • 구간 중도 절단 자료는 관측 연구에서 종종 발생되는 생존 자료의 한 유형으로 관심 있는 사건 발생 시간을 정확하게 관측할 수 없는 대신에 이를 포함한 두 관측 시점으로 구성된다. 본 연구의 목적은 경쟁 위험이 구간 중도 절단 자료에서 발생될 경우, 두 그룹의 누적 발생 함수를 비교하기 위한 검정 통계량을 제시하는 것이다. 특히 본 연구에서는 다중 대체 방법을 통해 생성된 자료를 이용하여 검정력과 유의 수준을 구하고자 한다. 모의실험을 통해 제안한 방법이 다양한 경우에서 적절한 결과를 보이는지 검토하였으며 실제 자료 분석의 예로 남녀 그룹의 HIV 발생 함수의 차이를 비교하기 위해 제안한 방법을 적용하였다.

다중대체와 재현자료 작성 (Multiple imputation and synthetic data)

  • 김정연;박민정
    • 응용통계연구
    • /
    • 제32권1호
    • /
    • pp.83-97
    • /
    • 2019
  • 사회가 발전함에 따라 이용자의 다양한 분석 요구에 대응하기 위해 개인 단위로 구성된 마이크로데이터 제공이 증가했다. 나아가 센서스, 행정자료와 같은 전수자료를 마이크로데이터 형태로 제공받아 연구하고자 하는 요구 역시 커지고 있다. 정책결정, 학술목적 등을 위한 마이크로데이터 분석은 가치 창출 측면에서 대단히 바람직하다. 하지만 자료 유용성이 확보된 마이크로데이터 제공은 개인정보가 노출될 가능성이라는 위험을 가질 수 밖에 없다. 이에, 자료의 유용성을 확보하면서 개인정보보호를 보장할 수 있는 여러 방법들이 고려되어 왔다. 이러한 방법 중 하나로 재현자료(synthetic data)를 생성해서 활용하는 방법이 연구되어 왔다. 본 논문은 재현자료 생성과 관련된 방법론 및 주의사항을 소개하여, 재현자료의 이해를 도모하고자 한다. 이를 위해 재현자료 작성에 필수적인 다중대체, 베이지안 예측 모형 및 베이지안 붓스트랩 등의 개념들을 먼저 설명하고, 완전 재현자료 및 부분 재현자료에 대해 살펴본다. 특히, 재현자료 작성을 심도 깊이 이해하기 위해 순차회귀 다중대체(sequential regression multivariate imputation)를 이용해 경시적(longitudinal) 자료를 재현자료로 작성하는 구체적 사례를 살펴본다.

이상점 영향력 축소를 통한 무응답 대체법 (A Multiple Imputation for Reducing Outlier Effect)

  • 김만겸;신기일
    • 응용통계연구
    • /
    • 제27권7호
    • /
    • pp.1229-1241
    • /
    • 2014
  • 이상점과 무응답이 동시에 존재하는 경우에는 무응답만 있는 경우에 비해 무응답 대체의 성능이 떨어지게 된다. 이러한 경우에는 먼저 이상점을 탐지하고, 탐지된 이상점의 영향력을 축소한 후 무응답 대체를 실시하여야 한다. 본 논문에서는 이상점의 영향력을 축소하여 무응답 대체법의 성능을 향상시키는 방법을 연구하였다. 이를 위해 She and Owen (2011)이 제안한 이상점 탐지법을 살펴보았고, 탐지된 이상점의 영향력을 줄이기 위한 방법으로 흔히 사용되는 가중치 조정법과 이상점 대체법을 살펴보았다. 또한 이상점 처리 방법을 적용한 무응답 대체법을 살펴보았으며 모의실험과 사례분석을 통하여 이상점 영향력 축소 효과를 살펴보았다.