• 제목/요약/키워드: missing data

검색결과 1,260건 처리시간 0.029초

Imputation Procedures in Exponential Regression Analysis in the presence of missing values

  • 박영술
    • 한국데이터정보과학회:학술대회논문집
    • /
    • 한국데이터정보과학회 2003년도 춘계학술대회
    • /
    • pp.135-144
    • /
    • 2003
  • A data set having missing observations is often completed by using imputed values. In this paper, performances and accuracy of five imputation procedures are evaluated when missing values exist only on the response variable in the exponential regression model. Our simulation results show that adjusted exponential regression imputation procedure can be well used to compensate for missing data, in particular, compared to other imputation procedures. An illustrative example using real data is provided.

  • PDF

A case study of competing risk analysis in the presence of missing data

  • Limei Zhou;Peter C. Austin;Husam Abdel-Qadir
    • Communications for Statistical Applications and Methods
    • /
    • 제30권1호
    • /
    • pp.1-19
    • /
    • 2023
  • Observational data with missing or incomplete data are common in biomedical research. Multiple imputation is an effective approach to handle missing data with the ability to decrease bias while increasing statistical power and efficiency. In recent years propensity score (PS) matching has been increasingly used in observational studies to estimate treatment effect as it can reduce confounding due to measured baseline covariates. In this paper, we describe in detail approaches to competing risk analysis in the setting of incomplete observational data when using PS matching. First, we used multiple imputation to impute several missing variables simultaneously, then conducted propensity-score matching to match statin-exposed patients with those unexposed. Afterwards, we assessed the effect of statin exposure on the risk of heart failure-related hospitalizations or emergency visits by estimating both relative and absolute effects. Collectively, we provided a general methodological framework to assess treatment effect in incomplete observational data. In addition, we presented a practical approach to produce overall cumulative incidence function (CIF) based on estimates from multiple imputed and PS-matched samples.

A Probabilistic Tensor Factorization approach for Missing Data Inference in Mobile Crowd-Sensing

  • Akter, Shathee;Yoon, Seokhoon
    • International Journal of Internet, Broadcasting and Communication
    • /
    • 제13권3호
    • /
    • pp.63-72
    • /
    • 2021
  • Mobile crowd-sensing (MCS) is a promising sensing paradigm that leverages mobile users with smart devices to perform large-scale sensing tasks in order to provide services to specific applications in various domains. However, MCS sensing tasks may not always be successfully completed or timely completed for various reasons, such as accidentally leaving the tasks incomplete by the users, asynchronous transmission, or connection errors. This results in missing sensing data at specific locations and times, which can degrade the performance of the applications and lead to serious casualties. Therefore, in this paper, we propose a missing data inference approach, called missing data approximation with probabilistic tensor factorization (MDI-PTF), to approximate the missing values as closely as possible to the actual values while taking asynchronous data transmission time and different sensing locations of the mobile users into account. The proposed method first normalizes the data to limit the range of the possible values. Next, a probabilistic model of tensor factorization is formulated, and finally, the data are approximated using the gradient descent method. The performance of the proposed algorithm is verified by conducting simulations under various situations using different datasets.

Estimation in the exponential distribution under progressive Type I interval censoring with semi-missing data

  • Shin, Hyejung;Lee, Kwangho
    • Journal of the Korean Data and Information Science Society
    • /
    • 제23권6호
    • /
    • pp.1271-1277
    • /
    • 2012
  • In this paper, we propose an estimation method of the parameter in an exponential distribution based on a progressive Type I interval censored sample with semi-missing observation. The maximum likelihood estimator (MLE) of the parameter in the exponential distribution cannot be obtained explicitly because the intervals are not equal in length under the progressive Type I interval censored sample with semi-missing data. To obtain the MLE of the parameter for the sampling scheme, we propose a method by which progressive Type I interval censored sample with semi-missing data is converted to the progressive Type II interval censored sample. Consequently, the estimation procedures in the progressive Type II interval censored sample can be applied and we obtain the MLE of the parameter and survival function. It will be shown that the obtained estimators have good performance in terms of the mean square error (MSE) and mean integrated square error (MISE).

A comparison of imputation methods using machine learning models

  • Heajung Suh;Jongwoo Song
    • Communications for Statistical Applications and Methods
    • /
    • 제30권3호
    • /
    • pp.331-341
    • /
    • 2023
  • Handling missing values in data analysis is essential in constructing a good prediction model. The easiest way to handle missing values is to use complete case data, but this can lead to information loss within the data and invalid conclusions in data analysis. Imputation is a technique that replaces missing data with alternative values obtained from information in a dataset. Conventional imputation methods include K-nearest-neighbor imputation and multiple imputations. Recent methods include missForest, missRanger, and mixgb ,all which use machine learning algorithms. This paper compares the imputation techniques for datasets with mixed datatypes in various situations, such as data size, missing ratios, and missing mechanisms. To evaluate the performance of each method in mixed datasets, we propose a new imputation performance measure (IPM) that is a unified measurement applicable to numerical and categorical variables. We believe this metric can help find the best imputation method. Finally, we summarize the comparison results with imputation performances and computational times.

Nonstationary Time Series and Missing Data

  • Shin, Dong-Wan;Lee, Oe-Sook
    • 응용통계연구
    • /
    • 제23권1호
    • /
    • pp.73-79
    • /
    • 2010
  • Missing values for unit root processes are imputed by the most recent observations. Treating the imputed observations as if they are complete ones, semiparametric unit root tests are extended to missing value situations. Also, an invariance principle for the partial sum process of the imputed observations is established under some mild conditions, which shows that the extended tests have the same limiting null distributions as those based on complete observations. The proposed tests are illustrated by analyzing an unequally spaced real data set.

Partitioning likelihood method in the analysis of non-monotone missing data

  • Kim Jae-Kwang
    • 한국통계학회:학술대회논문집
    • /
    • 한국통계학회 2004년도 학술발표논문집
    • /
    • pp.1-8
    • /
    • 2004
  • We address the problem of parameter estimation in multivariate distributions under ignorable non-monotone missing data. The factoring likelihood method for monotone missing data, termed by Robin (1974), is extended to a more general case of non-monotone missing data. The proposed method is algebraically equivalent to the Newton-Raphson method for the observed likelihood, but avoids the burden of computing the first and the second partial derivatives of the observed likelihood Instead, the maximum likelihood estimates and their information matrices for each partition of the data set are computed separately and combined naturally using the generalized least squares method. A numerical example is also presented to illustrate the method.

  • PDF

Application of Multiple Imputation Method in Analyzing Data with Missing Continuous Covariates

  • Ghasemizadeh Tamar, S.;Ganjali, M.
    • 응용통계연구
    • /
    • 제21권4호
    • /
    • pp.659-664
    • /
    • 2008
  • Missing continuous covariates are pervasive in the use of generalized linear models for medical data. Multiple imputation is the most common and easy-to-do method of dealing with missing covariate data. However, there are always serious warnings in using this method. There should be concern to make imputed values more proper. In this paper, proper imputation from posterior predictive distribution is developed for implementing with arbitrary priors. We use empirical distribution of the posterior for approximating the posterior predictive distribution, to sample from it. This method is preferable in comparison with a presented imputation method of us which uses a full model to impute missing values using available software. The proposed methods are implemented on glucocorticoid data.

Imputation Method Using Local Linear Regression Based on Bidirectional k-nearest-components

  • Yonggeol, Lee
    • Journal of information and communication convergence engineering
    • /
    • 제21권1호
    • /
    • pp.62-67
    • /
    • 2023
  • This paper proposes an imputation method using a bidirectional k-nearest components search based local linear regression method. The bidirectional k-nearest-components search method selects components in the dynamic range from the missing points. Unlike the existing methods, which use a fixed-size window, the proposed method can flexibly select adjacent components in an imputation problem. The weight values assigned to the components around the missing points are calculated using local linear regression. The local linear regression method is free from the rank problem in a matrix of dependent variables. In addition, it can calculate the weight values that reflect the data flow in a specific environment, such as a blackout. The original missing values were estimated from a linear combination of the components and their weights. Finally, the estimated value imputes the missing values. In the experimental results, the proposed method outperformed the existing methods when the error between the original data and imputation data was measured using MAE and RMSE.

한강 하구부에서 결측된 탁도 자료의 보완 (Filling Analysis for Missing Turbidity Data in Han River Estuary)

  • 백경오;조홍연;이삼희
    • 한국수자원학회논문집
    • /
    • 제39권4호
    • /
    • pp.289-298
    • /
    • 2006
  • 한강 하구부의 3개 지점에서 수중 계류방식으로 약 5개월에 걸쳐 탁도를 관측하였다. 이 과정에서 관측기기의 한계로 인해 탁도 자료의 결측치가 발생하였고, 이를 효율적으로 보완하기 위해 본 연구에서는 새로운 결측치 보완기법을 개발하였다. 개발된 기법은 시계열 자료가 단일주기와 상이한 진폭을 갖는다는 가정하에, 각 사이클의 면적비율을 통해 결측치를 보완하는 방법이다. 이 기법을 결측되지 않은 정상적인 자료로 검증해 보면, 첨두치가 약간 과소 산정되는 경향이 있으나 총 면적은 보완 전, 후에 거의 차이가 없었다. 따라서 새로운 기법을 바탕으로 한강 하구부에서 관측된 탁도자료의 결측치를 합리적으로 보완할 수 있었다.