• Title/Summary/Keyword: missing data imputation

Search Result 143, Processing Time 0.022 seconds

Missing Data Imputation Using Permanent Traffic Counts on National Highways (일반국토 상시 교통량자료를 이용한 교통량 결측자료 추정)

  • Ha, Jeong-A;Park, Jae-Hwa;Kim, Seong-Hyeon
    • Journal of Korean Society of Transportation
    • /
    • v.25 no.1 s.94
    • /
    • pp.121-132
    • /
    • 2007
  • Up to now Permanent traffic volumes have been counted by Automatic Vehicle Classification (AVC) on National Highways. When counted data have missing items or errors, the data must be revised to stay statistically reliable This study was carried out to estimate correct data based on outoregression and seasonal AutoRegressive Integrated Moving Average (ARIMA). As a result of verification through seasonal ARIMA, the longer the missed period is, the greater the error. Autoregression results in better verification results than seasonal ARIMA. Traffic data is affected by the present state mote than past patterns. However. autoregression can be applied only to the cases where data include similar neighborhood patterns and even in this case. the data cannot be corrected when data are missing due to low qualify or errors Therefore, these data shoo)d be corrected using past patterns and seasonal ARIMA when the missing data occurs in short periods.

Predicting Personal Credit Rating with Incomplete Data Sets Using Frequency Matrix technique (Frequency Matrix 기법을 이용한 결측치 자료로부터의 개인신용예측)

  • Bae, Jae-Kwon;Kim, Jin-Hwa;Hwang, Kook-Jae
    • Journal of Information Technology Applications and Management
    • /
    • v.13 no.4
    • /
    • pp.273-290
    • /
    • 2006
  • This study suggests a frequency matrix technique to predict personal credit rate more efficiently using incomplete data sets. At first this study test on multiple discriminant analysis and logistic regression analysis for predicting personal credit rate with incomplete data sets. Missing values are predicted with mean imputation method and regression imputation method here. An artificial neural network and frequency matrix technique are also tested on their performance in predicting personal credit rating. A data set of 8,234 customers in 2004 on personal credit information of Bank A are collected for the test. The performance of frequency matrix technique is compared with that of other methods. The results from the experiments show that the performance of frequency matrix technique is superior to that of all other models such as MDA-mean, Logit-mean, MDA-regression, Logit-regression, and artificial neural networks.

  • PDF

Analysis of Data Imputation in Recommender Systems (추천 시스템에서의 데이터 임퓨테이션 분석)

  • Lee, Youngnam;Kim, Sang-Wook
    • Journal of KIISE
    • /
    • v.44 no.12
    • /
    • pp.1333-1337
    • /
    • 2017
  • Recommender systems (RS) that predict a set of items a target user is likely to prefer have been extensively studied in academia and have been aggressively implemented by many companies such as Google, Netflix, eBay, and Amazon. Data imputation alleviates the data sparsity problem occurring in recommender systems by inferring missing ratings and adding them to the original data. In this paper, we point out the drawbacks of existing approaches and make suggestions for data imputation techniques. We also justify our suggestions through extensive experiments.

Comparison of GEE Estimation Methods for Repeated Binary Data with Time-Varying Covariates on Different Missing Mechanisms (시간-종속적 공변량이 포함된 이분형 반복측정자료의 GEE를 이용한 분석에서 결측 체계에 따른 회귀계수 추정방법 비교)

  • Park, Boram;Jung, Inkyung
    • The Korean Journal of Applied Statistics
    • /
    • v.26 no.5
    • /
    • pp.697-712
    • /
    • 2013
  • When analyzing repeated binary data, the generalized estimating equations(GEE) approach produces consistent estimates for regression parameters even if an incorrect working correlation matrix is used. However, time-varying covariates experience larger changes in coefficients than time-invariant covariates across various working correlation structures for finite samples. In addition, the GEE approach may give biased estimates under missing at random(MAR). Weighted estimating equations and multiple imputation methods have been proposed to reduce biases in parameter estimates under MAR. This article studies if the two methods produce robust estimates across various working correlation structures for longitudinal binary data with time-varying covariates under different missing mechanisms. Through simulation, we observe that time-varying covariates have greater differences in parameter estimates across different working correlation structures than time-invariant covariates. The multiple imputation method produces more robust estimates under any working correlation structure and smaller biases compared to the other two methods.

Structural health monitoring data reconstruction of a concrete cable-stayed bridge based on wavelet multi-resolution analysis and support vector machine

  • Ye, X.W.;Su, Y.H.;Xi, P.S.;Liu, H.
    • Computers and Concrete
    • /
    • v.20 no.5
    • /
    • pp.555-562
    • /
    • 2017
  • The accuracy and integrity of stress data acquired by bridge heath monitoring system is of significant importance for bridge safety assessment. However, the missing and abnormal data are inevitably existed in a realistic monitoring system. This paper presents a data reconstruction approach for bridge heath monitoring based on the wavelet multi-resolution analysis and support vector machine (SVM). The proposed method has been applied for data imputation based on the recorded data by the structural health monitoring (SHM) system instrumented on a prestressed concrete cable-stayed bridge. The effectiveness and accuracy of the proposed wavelet-based SVM prediction method is examined by comparing with the traditional autoregression moving average (ARMA) method and SVM prediction method without wavelet multi-resolution analysis in accordance with the prediction errors. The data reconstruction analysis based on 5-day and 1-day continuous stress history data with obvious preternatural signals is performed to examine the effect of sample size on the accuracy of data reconstruction. The results indicate that the proposed data reconstruction approach based on wavelet multi-resolution analysis and SVM is an effective tool for missing data imputation or preternatural signal replacement, which can serve as a solid foundation for the purpose of accurately evaluating the safety of bridge structures.

Approximate moments of a variance estimate with imputed conditional means

  • Kang Woo Ram;Shin Min Woong;Lee Sang Eum
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2001.11a
    • /
    • pp.179-184
    • /
    • 2001
  • Schafer and Shenker(2000) mentioned the one of analytic imputation technique involving conditional means. We derive an approximate moments of a variance estimate with imputed conditional means.

  • PDF

A Multiple Imputation for Reducing Outlier Effect (이상점 영향력 축소를 통한 무응답 대체법)

  • Kim, Man-Gyeom;Shin, Key-Il
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.7
    • /
    • pp.1229-1241
    • /
    • 2014
  • Most of sampling surveys have outliers and non-response missing values simultaneously. In that case, due to the effect of outliers, the result of imputation is not good enough to meet a given precision. To overcome this situation, outlier treatment should be conducted before imputation. In this paper in order for reducing the effect of outlier, we study outlier imputation methods and outlier weight adjustment methods. For the outlier detection, the method suggested by She and Owen (2011) is used. A small simulation study is conducted and for real data analysis, Monthly Labor Statistic and Briquette Consumption Survey Data are used.

Analysis of the cause-specific proportional hazards model with missing covariates (누락된 공변량을 가진 원인별 비례위험모형의 분석)

  • Minjung Lee
    • The Korean Journal of Applied Statistics
    • /
    • v.37 no.2
    • /
    • pp.225-237
    • /
    • 2024
  • In the analysis of competing risks data, some of covariates may not be fully observed for some subjects. In such cases, excluding subjects with missing covariate values from the analysis may result in biased estimates and loss of efficiency. In this paper, we studied multiple imputation and the augmented inverse probability weighting method for regression parameter estimation in the cause-specific proportional hazards model with missing covariates. The performance of estimators obtained from multiple imputation and the augmented inverse probability weighting method is evaluated by simulation studies, which show that those methods perform well. Multiple imputation and the augmented inverse probability weighting method were applied to investigate significant risk factors for the risk of death from breast cancer and from other causes for breast cancer data with missing values for tumor size obtained from the Prostate, Lung, Colorectal, and Ovarian Cancer Screen Trial Study. Under the cause-specific proportional hazards model, the methods show that race, marital status, stage, grade, and tumor size are significant risk factors for breast cancer mortality, and stage has the greatest effect on increasing the risk of breast cancer death. Age at diagnosis and tumor size have significant effects on increasing the risk of other-cause death.

Enhancement of durability of tall buildings by using deep-learning-based predictions of wind-induced pressure

  • K.R. Sri Preethaa;N. Yuvaraj;Gitanjali Wadhwa;Sujeen Song;Se-Woon Choi;Bubryur Kim
    • Wind and Structures
    • /
    • v.36 no.4
    • /
    • pp.237-247
    • /
    • 2023
  • The emergence of high-rise buildings has necessitated frequent structural health monitoring and maintenance for safety reasons. Wind causes damage and structural changes on tall structures; thus, safe structures should be designed. The pressure developed on tall buildings has been utilized in previous research studies to assess the impacts of wind on structures. The wind tunnel test is a primary research method commonly used to quantify the aerodynamic characteristics of high-rise buildings. Wind pressure is measured by placing pressure sensor taps at different locations on tall buildings, and the collected data are used for analysis. However, sensors may malfunction and produce erroneous data; these data losses make it difficult to analyze aerodynamic properties. Therefore, it is essential to generate missing data relative to the original data obtained from neighboring pressure sensor taps at various intervals. This study proposes a deep learning-based, deep convolutional generative adversarial network (DCGAN) to restore missing data associated with faulty pressure sensors installed on high-rise buildings. The performance of the proposed DCGAN is validated by using a standard imputation model known as the generative adversarial imputation network (GAIN). The average mean-square error (AMSE) and average R-squared (ARSE) are used as performance metrics. The calculated ARSE values by DCGAN on the building model's front, backside, left, and right sides are 0.970, 0.972, 0.984 and 0.978, respectively. The AMSE produced by DCGAN on four sides of the building model is 0.008, 0.010, 0.015 and 0.014. The average standard deviation of the actual measures of the pressure sensors on four sides of the model were 0.1738, 0.1758, 0.2234 and 0.2278. The average standard deviation of the pressure values generated by the proposed DCGAN imputation model was closer to that of the measured actual with values of 0.1736,0.1746,0.2191, and 0.2239 on four sides, respectively. In comparison, the standard deviation of the values predicted by GAIN are 0.1726,0.1735,0.2161, and 0.2209, which is far from actual values. The results demonstrate that DCGAN model fits better for data imputation than the GAIN model with improved accuracy and fewer error rates. Additionally, the DCGAN is utilized to estimate the wind pressure in regions of buildings where no pressure sensor taps are available; the model yielded greater prediction accuracy than GAIN.

Estimating a Binomial Proportion with Bayes Estimated Imputed Conditional Means

  • Shin, Min-Woong;Lee, Sang-Eun
    • Communications for Statistical Applications and Methods
    • /
    • v.9 no.1
    • /
    • pp.63-73
    • /
    • 2002
  • The one of analytic imputation technique involving conditional means was mentioned by Schafer and Schenker(2000). And their derivations are based on asymptotic expansions of point estimator and their associated variance estimator, and the result of imputation can be thought of as first-order approximations to the estimators. Specially in this paper, we are presenting the method of estimating a Binomial proportion with Bayesian approach of imputed conditional means. That is, instead of using maximum likelihood(ML) estimator to estimate a Binomial proportion, in general, we use the Bayesian estimators and will show the result of estimated Imputed conditional means.