• Title/Summary/Keyword: Imputation Method

Search Result 134, Processing Time 0.022 seconds

Comparison of binary data imputation methods in clinical trials (임상시험에서 이분형 결측치 처리방법의 비교연구)

  • An, Koosung;Kim, Dongjae
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.3
    • /
    • pp.539-547
    • /
    • 2016
  • We discussed how to handle missing binary data clinical trials. Patterns of occurring missing data are discussed and introduce missing binary data imputation methods that include the modified method. A simulation is performed by modifying actual data for each method. The condition of this simulation is controlled by a response rate and a missing value rate. We list the simulation results for each method and discussed them at the end of this paper.

Application of NORM to the Multiple Imputation for Multivariate Missing Data

  • Kim, Hyun-Jeong;Moon, Sung-Ho;Shin, Jae-Kyoung
    • Journal of the Korean Data and Information Science Society
    • /
    • v.13 no.2
    • /
    • pp.105-113
    • /
    • 2002
  • The statistical analysis of incomplete data sometimes requires handling of incomplete observations. Towards this end, each case with some missing values generally should be deleted, namely, resulting in only use of non-missing cases. EM algorithm(Dempster et al., 1977) which involves prediction and estimation steps is a general method among others. In this article, we use the free software NORM developed for multiple imputation, which uses DA(Data Augmentation) algorithm in its imputation, and evaluate its efficiency through a numerical example.

  • PDF

Handling Incomplete Data Problem in Collaborative Filtering System

  • Noh, Hyun-ju;Kwak, Min-jung;Han, In-goo
    • Proceedings of the KAIS Fall Conference
    • /
    • 2003.11a
    • /
    • pp.105-110
    • /
    • 2003
  • Collaborative filtering is one of the methodologies that are most widely used for recommendation system. It is based on a data matrix of each customer's preferences of products. There could be a lot of missing values in such preference. data matrix. This incomplete data is one of the reasons to deteriorate the accuracy of recommendation system. Multiple imputation method imputes m values for each missing value. It overcomes flaws of single imputation approaches through considering the uncertainty of missing values.. The objective of this paper is to suggest multiple imputation-based collaborative filtering approach for recommendation system to improve the accuracy in prediction performance. The experimental works show that the proposed approach provides better performance than the traditional Collaborative filtering approach, especially in case that there are a lot of missing values in dataset used for recommendation system.

  • PDF

Handling Incomplete Data Problem in Collaborative Filtering System

  • Noh, Hyun-Ju;Kwak, Min-Jung;Han, In-Goo
    • Journal of Intelligence and Information Systems
    • /
    • v.9 no.2
    • /
    • pp.51-63
    • /
    • 2003
  • Collaborative filtering is one of the methodologies that are most widely used for recommendation system. It is based on a data matrix of each customer's preferences of products. There could be a lot of missing values in such preference data matrix. This incomplete data is one of the reasons to deteriorate the accuracy of recommendation system. There are several treatments to deal with the incomplete data problem such as case deletion and single imputation. Those approaches are simple and easy to implement but they may provide biased results. Multiple imputation method imputes m values for each missing value. It overcomes flaws of single imputation approaches through considering the uncertainty of missing values. The objective of this paper is to suggest multiple imputation-based collaborative filtering approach for recommendation system to improve the accuracy in prediction performance. The experimental works show that the proposed approach provides better performance than the traditional Collaborative filtering approach, especially in case that there are a lot of missing values in dataset used for recommendation system.

  • PDF

A Multiple Imputation for Reducing Outlier Effect (이상점 영향력 축소를 통한 무응답 대체법)

  • Kim, Man-Gyeom;Shin, Key-Il
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.7
    • /
    • pp.1229-1241
    • /
    • 2014
  • Most of sampling surveys have outliers and non-response missing values simultaneously. In that case, due to the effect of outliers, the result of imputation is not good enough to meet a given precision. To overcome this situation, outlier treatment should be conducted before imputation. In this paper in order for reducing the effect of outlier, we study outlier imputation methods and outlier weight adjustment methods. For the outlier detection, the method suggested by She and Owen (2011) is used. A small simulation study is conducted and for real data analysis, Monthly Labor Statistic and Briquette Consumption Survey Data are used.

A comparison of imputation methods for the consecutive missing temperature data (연속적 결측이 존재하는 기온 자료에 대한 결측복원 기법의 비교)

  • Kim, Hee-Kyung;Kang, In-Kyeong;Lee, Jae-Won;Lee, Yung-Seop
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.3
    • /
    • pp.549-557
    • /
    • 2016
  • Consecutive missing values are likely to occur in long climate data due to system error or defective equipment. Furthermore, it is difficult to impute missing values. However, these complicated problems can be overcame by imputing missing values with reference time series. Reference time series must be composed of similar time series to time series that include missing values. We performed a simulation to compare three missing imputation methods (the adjusted normal ratio method, the regression method and the IDW method) to complete the missing values of time series. A comparison of the three missing imputation methods for the daily mean temperatures at 14 climatological stations indicated that the IDW method was better thanx others at south seaside stations. We also found the regression method was better than others at most stations (except south seaside stations).

Predicting Personal Credit Rating with Incomplete Data Sets Using Frequency Matrix technique (Frequency Matrix 기법을 이용한 결측치 자료로부터의 개인신용예측)

  • Bae, Jae-Kwon;Kim, Jin-Hwa;Hwang, Kook-Jae
    • Journal of Information Technology Applications and Management
    • /
    • v.13 no.4
    • /
    • pp.273-290
    • /
    • 2006
  • This study suggests a frequency matrix technique to predict personal credit rate more efficiently using incomplete data sets. At first this study test on multiple discriminant analysis and logistic regression analysis for predicting personal credit rate with incomplete data sets. Missing values are predicted with mean imputation method and regression imputation method here. An artificial neural network and frequency matrix technique are also tested on their performance in predicting personal credit rating. A data set of 8,234 customers in 2004 on personal credit information of Bank A are collected for the test. The performance of frequency matrix technique is compared with that of other methods. The results from the experiments show that the performance of frequency matrix technique is superior to that of all other models such as MDA-mean, Logit-mean, MDA-regression, Logit-regression, and artificial neural networks.

  • PDF

Modified BLS Weight Adjustment (수정된 BLS 가중치보정법)

  • Park, Jung-Joon;Cho, Ki-Jong;Lee, Sang-Eun;Shin, Key-Il
    • Communications for Statistical Applications and Methods
    • /
    • v.18 no.3
    • /
    • pp.367-376
    • /
    • 2011
  • BLS weight adjustment is a widely used method for business surveys with non-responses and outliers. Recent surveys show that the non-response weight adjustment of the BLS method is the same as the ratio imputation method. In this paper, we suggested a modified BLS weight adjustment method by imputing missing values instead of using weight adjustment for non-response. Monthly labor survey data is used for a small Monte-Carlo simulation and we conclude that the suggested method is superior to the original BLS weight adjustment method.

On the Use of Sequential Adaptive Nearest Neighbors for Missing Value Imputation (순차 적응 최근접 이웃을 활용한 결측값 대치법)

  • Park, So-Hyun;Bang, Sung-Wan;Jhun, Myoung-Shic
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.6
    • /
    • pp.1249-1257
    • /
    • 2011
  • In this paper, we propose a Sequential Adaptive Nearest Neighbor(SANN) imputation method that combines the Adaptive Nearest Neighbor(ANN) method and the Sequential k-Nearest Neighbor(SKNN) method. When choosing the nearest neighbors of missing observations, the proposed SANN method takes the local feature of the missing observations into account as well as reutilizes the imputed observations in a sequential manner. By using a Monte Carlo study and a real data example, we demonstrate the characteristics of the SANN method and its potential performance.

Imputation method for missing data based on clustering and measure of property (군집화 및 특성도를 이용한 결측치 대체 방법)

  • Kim, Sunghyun;Kim, Dongjae
    • The Korean Journal of Applied Statistics
    • /
    • v.31 no.1
    • /
    • pp.29-40
    • /
    • 2018
  • There are various reasons for missing values when collecting data. Missing values have some influence on the analysis and results; consequently, various methods of processing missing values have been studied to solve the problem. It is thought that the later point of view may be affected by the initial time point value in the repeated measurement data. However, in the existing method, there was no method for the imputation of missing values using this concept. Therefore, we proposed a new missing value imputation method in this study using clustering in initial time point of the repeated measurement data and the measure of property proposed by Kim and Kim (The Korean Communications in Statistics, 30, 463-473, 2017). We also applied the Monte Carlo simulations to compare the performance of the established method and suggested methods in repeated measurement data.