• Title/Summary/Keyword: Missing data

Imputation Procedures in Exponential Regression Analysis in the presence of missing values

  • Park, Young-Sool
    • Korean Data and Information Science Society: Conference Proceedings
    • /
    • 2003.05a
    • /
    • pp.135-144
    • /
    • 2003
  • A data set with missing observations is often completed using imputed values. In this paper, the performance and accuracy of five imputation procedures are evaluated when missing values occur only in the response variable of the exponential regression model. Our simulation results show that the adjusted exponential regression imputation procedure compensates for missing data well, in particular when compared with the other imputation procedures. An illustrative example using real data is provided.

A case study of competing risk analysis in the presence of missing data

  • Limei Zhou;Peter C. Austin;Husam Abdel-Qadir
    • Communications for Statistical Applications and Methods
    • /
    • v.30 no.1
    • /
    • pp.1-19
    • /
    • 2023
  • Observational data with missing or incomplete values are common in biomedical research. Multiple imputation is an effective approach to handling missing data, with the ability to decrease bias while increasing statistical power and efficiency. In recent years, propensity score (PS) matching has been increasingly used in observational studies to estimate treatment effects, as it can reduce confounding due to measured baseline covariates. In this paper, we describe in detail approaches to competing risk analysis in the setting of incomplete observational data when using PS matching. First, we used multiple imputation to impute several missing variables simultaneously, then conducted PS matching to match statin-exposed patients with unexposed patients. Afterwards, we assessed the effect of statin exposure on the risk of heart failure-related hospitalizations or emergency visits by estimating both relative and absolute effects. Collectively, we provide a general methodological framework for assessing treatment effects in incomplete observational data. In addition, we present a practical approach to producing an overall cumulative incidence function (CIF) based on estimates from multiply imputed and PS-matched samples.
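
The pooling step across imputed and matched samples can be sketched as follows. The abstract does not say which combination rule the authors used; this sketch assumes Rubin's rules, the usual choice for multiple imputation. The function name `pool_rubin` and the numbers are hypothetical and serve only as illustration.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(variances)           # within-imputation variance
    b = variance(estimates)           # between-imputation variance (divides by m - 1)
    total = u_bar + (1 + 1 / m) * b   # total variance of the pooled estimate
    return q_bar, math.sqrt(total)

# Hypothetical log hazard ratios and squared standard errors
# from m = 3 imputed, PS-matched samples (made-up values).
log_hr = [0.10, 0.20, 0.30]
se2 = [0.04, 0.04, 0.04]
pooled, se = pool_rubin(log_hr, se2)
```

The between-imputation term is what separates this from simply averaging standard errors: it carries the extra uncertainty caused by the missing values themselves.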

A Probabilistic Tensor Factorization approach for Missing Data Inference in Mobile Crowd-Sensing

  • Akter, Shathee;Yoon, Seokhoon
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.13 no.3
    • /
    • pp.63-72
    • /
    • 2021
  • Mobile crowd-sensing (MCS) is a promising sensing paradigm that leverages mobile users with smart devices to perform large-scale sensing tasks in order to provide services to specific applications in various domains. However, MCS sensing tasks may not always be completed successfully or on time for various reasons, such as users accidentally leaving tasks incomplete, asynchronous transmission, or connection errors. This results in missing sensing data at specific locations and times, which can degrade the performance of the applications and lead to serious casualties. Therefore, in this paper, we propose a missing data inference approach, called missing data approximation with probabilistic tensor factorization (MDI-PTF), to approximate the missing values as closely as possible to the actual values while taking the asynchronous data transmission times and different sensing locations of the mobile users into account. The proposed method first normalizes the data to limit the range of the possible values. Next, a probabilistic model of tensor factorization is formulated, and finally, the data are approximated using the gradient descent method. The performance of the proposed algorithm is verified by conducting simulations under various situations using different datasets.
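
The three steps named in the abstract (normalize, formulate a factorization model, fit by gradient descent) can be illustrated with a minimal sketch. This is not the MDI-PTF algorithm itself, whose model details the abstract does not give. It assumes a 3-way array, a CP (rank-R) factorization, and Gaussian noise with Gaussian priors on the factors, under which the MAP estimate reduces to regularized least squares. The function `cp_impute` and all its parameters are hypothetical.

```python
import numpy as np

def cp_impute(x, mask, rank=2, lr=0.01, reg=0.01, steps=3000, seed=0):
    """Fill entries of a 3-way array where mask is False, using a CP model
    fitted to the observed entries by plain gradient descent."""
    lo, hi = x[mask].min(), x[mask].max()
    scale = (hi - lo) if hi > lo else 1.0
    xn = np.where(mask, (x - lo) / scale, 0.0)   # min-max normalize observed entries
    rng = np.random.default_rng(seed)
    a, b, c = (rng.uniform(0.1, 0.6, size=(n, rank)) for n in x.shape)
    for _ in range(steps):
        pred = np.einsum("ir,jr,kr->ijk", a, b, c)
        err = np.where(mask, pred - xn, 0.0)     # residuals on observed entries only
        # Gradients of 0.5 * sum(err**2) + 0.5 * reg * (|a|^2 + |b|^2 + |c|^2)
        ga = np.einsum("ijk,jr,kr->ir", err, b, c) + reg * a
        gb = np.einsum("ijk,ir,kr->jr", err, a, c) + reg * b
        gc = np.einsum("ijk,ir,jr->kr", err, a, b) + reg * c
        a, b, c = a - lr * ga, b - lr * gb, c - lr * gc
    est = np.einsum("ir,jr,kr->ijk", a, b, c) * scale + lo   # back to original units
    return np.where(mask, x, est)                # keep observed values untouched
```

Normalizing first keeps the factor magnitudes near one, which is what makes a single fixed step size workable here. For larger arrays the step size would need retuning.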

Estimation in the exponential distribution under progressive Type I interval censoring with semi-missing data

  • Shin, Hyejung;Lee, Kwangho
    • Journal of the Korean Data and Information Science Society
    • /
    • v.23 no.6
    • /
    • pp.1271-1277
    • /
    • 2012
  • In this paper, we propose a method for estimating the parameter of an exponential distribution based on a progressive Type I interval censored sample with semi-missing observations. The maximum likelihood estimator (MLE) of the parameter in the exponential distribution cannot be obtained explicitly because the intervals are not equal in length under the progressive Type I interval censored sample with semi-missing data. To obtain the MLE of the parameter for this sampling scheme, we propose a method by which the progressive Type I interval censored sample with semi-missing data is converted to a progressive Type II interval censored sample. Consequently, the estimation procedures for the progressive Type II interval censored sample can be applied, and we obtain the MLE of the parameter and the survival function. It will be shown that the obtained estimators perform well in terms of the mean square error (MSE) and mean integrated square error (MISE).
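
To see why unequal interval lengths block a closed-form MLE, it helps to write out the standard likelihood for progressive Type I interval censoring without the semi-missing component, which the abstract does not spell out. The notation here is an assumption: rate $\lambda$, inspection times $0 = t_0 < t_1 < \dots < t_m$, $d_i$ failures observed in $(t_{i-1}, t_i]$, and $r_i$ units withdrawn at $t_i$.

```latex
L(\lambda) \propto \prod_{i=1}^{m}
  \left( e^{-\lambda t_{i-1}} - e^{-\lambda t_i} \right)^{d_i}
  e^{-\lambda r_i t_i}

\frac{\partial \log L}{\partial \lambda}
  = \sum_{i=1}^{m} d_i \,
    \frac{t_i e^{-\lambda t_i} - t_{i-1} e^{-\lambda t_{i-1}}}
         {e^{-\lambda t_{i-1}} - e^{-\lambda t_i}}
  - \sum_{i=1}^{m} r_i t_i = 0

% Equal widths, t_i = i\tau: the score equation solves explicitly.
\hat{\lambda} = \frac{1}{\tau}
  \log\!\left( 1 + \frac{\sum_{i=1}^{m} d_i}
                        {\sum_{i=1}^{m} \left[ (i-1) d_i + i \, r_i \right]} \right)
```

With general $t_i$ the score equation has no such simplification and must be solved numerically.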

A comparison of imputation methods using machine learning models

  • Heajung Suh;Jongwoo Song
    • Communications for Statistical Applications and Methods
    • /
    • v.30 no.3
    • /
    • pp.331-341
    • /
    • 2023
  • Handling missing values in data analysis is essential in constructing a good prediction model. The easiest way to handle missing values is to use complete case data, but this can lead to information loss within the data and invalid conclusions in data analysis. Imputation is a technique that replaces missing data with alternative values obtained from information in a dataset. Conventional imputation methods include K-nearest-neighbor imputation and multiple imputation. Recent methods include missForest, missRanger, and mixgb, all of which use machine learning algorithms. This paper compares the imputation techniques for datasets with mixed data types in various situations, such as data size, missing ratios, and missing mechanisms. To evaluate the performance of each method on mixed datasets, we propose a new imputation performance measure (IPM), a unified measurement applicable to numerical and categorical variables. We believe this metric can help find the best imputation method. Finally, we summarize the comparison results in terms of imputation performance and computational time.

Nonstationary Time Series and Missing Data

  • Shin, Dong-Wan;Lee, Oe-Sook
    • The Korean Journal of Applied Statistics
    • /
    • v.23 no.1
    • /
    • pp.73-79
    • /
    • 2010
  • Missing values for unit root processes are imputed by the most recent observations. Treating the imputed observations as if they are complete ones, semiparametric unit root tests are extended to missing value situations. Also, an invariance principle for the partial sum process of the imputed observations is established under some mild conditions, which shows that the extended tests have the same limiting null distributions as those based on complete observations. The proposed tests are illustrated by analyzing an unequally spaced real data set.
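
The imputation rule in the first sentence, replacing each missing value with the most recent observation, is commonly known as last observation carried forward. A minimal sketch follows; the function name `locf` and the sample series are hypothetical, and the unit root tests themselves are not reproduced here.

```python
import math

def locf(series):
    """Replace each missing value (None or NaN) with the most recent observed value."""
    filled, last = [], None
    for value in series:
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        if not missing:
            last = value
        filled.append(last)   # stays None until the first observation arrives
    return filled

walk = [1.2, None, None, 0.7, float("nan"), 1.9]
print(locf(walk))   # [1.2, 1.2, 1.2, 0.7, 0.7, 1.9]
```

Leading gaps are left as `None`, since there is no earlier observation to carry forward.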

Partitioning likelihood method in the analysis of non-monotone missing data

  • Kim Jae-Kwang
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2004.11a
    • /
    • pp.1-8
    • /
    • 2004
  • We address the problem of parameter estimation in multivariate distributions under ignorable non-monotone missing data. The factoring likelihood method for monotone missing data, so termed by Rubin (1974), is extended to the more general case of non-monotone missing data. The proposed method is algebraically equivalent to the Newton-Raphson method for the observed likelihood, but avoids the burden of computing the first and second partial derivatives of the observed likelihood. Instead, the maximum likelihood estimates and their information matrices for each partition of the data set are computed separately and combined naturally using the generalized least squares method. A numerical example is also presented to illustrate the method.

Application of Multiple Imputation Method in Analyzing Data with Missing Continuous Covariates

  • Ghasemizadeh Tamar, S.;Ganjali, M.
    • The Korean Journal of Applied Statistics
    • /
    • v.21 no.4
    • /
    • pp.659-664
    • /
    • 2008
  • Missing continuous covariates are pervasive in the use of generalized linear models for medical data. Multiple imputation is the most common and easiest method of dealing with missing covariate data. However, the method comes with serious caveats, and care is needed to make the imputed values proper. In this paper, proper imputation from the posterior predictive distribution is developed for use with arbitrary priors. We approximate the posterior predictive distribution by the empirical distribution of the posterior and sample from it. This method is preferable to an imputation method we have presented that uses a full model to impute missing values with available software. The proposed methods are applied to glucocorticoid data.

Imputation Method Using Local Linear Regression Based on Bidirectional k-nearest-components

  • Yonggeol, Lee
    • Journal of information and communication convergence engineering
    • /
    • v.21 no.1
    • /
    • pp.62-67
    • /
    • 2023
  • This paper proposes an imputation method that uses local linear regression based on a bidirectional k-nearest-components search. The bidirectional k-nearest-components search selects components in a dynamic range from the missing points. Unlike existing methods, which use a fixed-size window, the proposed method can flexibly select adjacent components in an imputation problem. The weight values assigned to the components around the missing points are calculated using local linear regression. The local linear regression method is free from the rank problem in a matrix of dependent variables. In addition, it can calculate weight values that reflect the data flow in a specific environment, such as a blackout. The original missing values are estimated from a linear combination of the components and their weights. Finally, the estimated values are used to impute the missing values. In the experimental results, the proposed method outperformed the existing methods when the error between the original data and the imputed data was measured using MAE and RMSE.

Filling Analysis for Missing Turbidity Data in Han River Estuary (한강 하구부에서 결측된 탁도 자료의 보완)

  • Baek, Kyong-Oh;Cho, Hong-Yeon;Lee, Sam-Hee
    • Journal of Korea Water Resources Association
    • /
    • v.39 no.4 s.165
    • /
    • pp.289-298
    • /
    • 2006
  • Turbidity was measured over five months at three sites in the Han River estuary. During this process, missing data occurred due to the gauge limitation of the turbidity sensor. A new filling method for the missing turbidity data was developed in this study. Under the assumption that the time series has a unique period and different amplitudes, the new method fills the missing data based on the area ratio of each cycle. The new method was verified using a data set with no missing data. There was little difference between the gross area of the original data and that of the data revised by the new method, although peak values were underestimated. As a result, the missing turbidity data observed at the Han River estuary could be appropriately filled using the new filling method.