• Title/Summary/Keyword: missing data

Search Result 1,265, Processing Time 0.024 seconds

Treatment of Missing Data by Decomposition and Voting with Ordinal Data

  • Chun, Young-M.;Son, Hong-K.;Chung, Sung-S.
    • Journal of the Korean Data and Information Science Society
    • /
    • v.18 no.3
    • /
    • pp.585-598
    • /
    • 2007
  • It is so difficult to get complete data when we conduct a questionaire in actuality. And we get inefficient results if we analyze statistical tests with ignoring missing values. Therefore, we use imputation methods which evaluate quality of data. This study proposes a imputation method by decomposition and voting with ordinal data. First, data are sorted by each variable. After that, imputation methods are used by each decomposition level. And the last step is selection of values with voting. The proposed method is evaluated by accuracy and RMSE. In conclusion, missing values are related to each variable, median imputation method using decomposition and voting is powerful.

  • PDF

Estimating the Total Precipitation Amount with Simulated Precipitation for Ungauged Stations in Jeju Island (미계측 관측 강수 자료 생성을 통한 제주도 지역의 수문총량 추정)

  • Kim, Nam-Won;Um, Myoung-Jin;Chung, Il-Moon;Heo, Jun-Haeng
    • Journal of Korea Water Resources Association
    • /
    • v.45 no.9
    • /
    • pp.875-885
    • /
    • 2012
  • In this study, the total precipitation amount in Jeju Island was estimated with the simulated precipitation for ungauged stations missing precipitation data using the spatial precipitation analysis. The missing data were generated through the modified multiple linear regression in this study, and the analysis of spatial precipitation was conducted with the PRISM(Parameter-elevation Regression on Independent Slope Model). The generated data with modified multiple linear regression model have similar pattern with original data. Thus, the model in this study shows good applicability to estimate the missing data. The difference of annual average precipitation between Case 1 (original data) and Case 2 (modified data) appears very small ratio which is about 1.5%. However, the difference of annual average precipitation according to elevation shows the large ratio up to 37.4%. As the results, the method of estimating missing data in this study would be useful to calculate the total precipitation amount at the low station density area and the places with the high spatial variation of precipitation.

Neighboring Elemental Image Exemplar Based Inpainting for Computational Integral Imaging Reconstruction with Partial Occlusion

  • Ko, Bumseok;Lee, Byung-Gook;Lee, Sukho
    • Journal of the Optical Society of Korea
    • /
    • v.19 no.4
    • /
    • pp.390-396
    • /
    • 2015
  • We propose a partial occlusion removal method for computational integral imaging reconstruction (CIIR) based on the usage of the exemplar based inpainting technique. The proposed method is an improved version of the original linear inpainting based CIIR (LI-CIIR), which uses the inpainting technique to fill in the data missing region. The LI-CIIR shows good results for images which contain objects with smooth surfaces. However, if the object has a textured surface, the result of the LI-CIIR deteriorates, since the linear inpainting cannot recover the textured data in the data missing region well. In this work, we utilize the exemplar based inpainting to fill in the textured data in the data missing region. We call the proposed method the neighboring elemental image exemplar based inpainting (NEI-exemplar inpainting) method, since it uses sources from neighboring elemental images to fill in the data missing region. Furthermore, we also propose an automatic occluding region extraction method based on the use of the mutual constraint using depth estimation (MC-DE) and the level set based bimodal segmentation. Experimental results show the validity of the proposed system.

Handling Incomplete Data Problem in Collaborative Filtering System

  • Noh, Hyun-Ju;Kwak, Min-Jung;Han, In-Goo
    • Journal of Intelligence and Information Systems
    • /
    • v.9 no.2
    • /
    • pp.51-63
    • /
    • 2003
  • Collaborative filtering is one of the methodologies that are most widely used for recommendation system. It is based on a data matrix of each customer's preferences of products. There could be a lot of missing values in such preference data matrix. This incomplete data is one of the reasons to deteriorate the accuracy of recommendation system. There are several treatments to deal with the incomplete data problem such as case deletion and single imputation. Those approaches are simple and easy to implement but they may provide biased results. Multiple imputation method imputes m values for each missing value. It overcomes flaws of single imputation approaches through considering the uncertainty of missing values. The objective of this paper is to suggest multiple imputation-based collaborative filtering approach for recommendation system to improve the accuracy in prediction performance. The experimental works show that the proposed approach provides better performance than the traditional Collaborative filtering approach, especially in case that there are a lot of missing values in dataset used for recommendation system.

  • PDF

Some Considerations on the Problems of PSA(Pulse Sequence Analysis) as a Partial Discharge Analysis Method (부분방전 해석 방법으로 PSA(Pulse Sequence Analysis)의 문제점에 대한 고찰)

  • Kim, Jeong-Tae;Lee, Ho-Keun
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference
    • /
    • 2004.11a
    • /
    • pp.327-330
    • /
    • 2004
  • Because of its effectiveness for the PD(partial discharge) pattern recognition, PSA(Pulse Sequence Analysis) has been considered as a new analytic method instead of conventional PRPDA(Phase Resolved Partial Discharge Analysis). However, PSA has a big problem that can misanalyze patterns in case of data missing resulting from poor sensitivity because it analyses the correlation between sequential pulses, which leads to hesitate to apply it to on-site. Therefore, in this paper, the problems of PSA such as data missing and noise adding cases were investigated. For the purpose, PD data obtained from various defects including noise adding data were used and analysed, The result showed that both cases can cause fatal errors in recognizing PD patterns. In case of the data missing, the error depends on the kinds of defect and the degree of degradation. Also, it could be noticed that the error due to adding noises was larger than that due to some data missing.

  • PDF

Some Considerations on the On-site Applicability of PSA(Pulse Sequence Analysis) as a Partial Discharge Analysis Method (부분방전 해석 방법으로 PSA(Pulse Sequence Analysis)의 현장 적용성에 대한 고찰)

  • Kim, Jeong-Tae;Lee, Ho-Keun
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers
    • /
    • v.18 no.5
    • /
    • pp.484-489
    • /
    • 2005
  • Because of its effectiveness for the PD(Partial Discharge) pattern recognition, PSA(Pulse Sequence Analysis) has been considered as a new analytic method instead of conventional PRPDA(Phase Resolved Partial Discharge Analysis). However, it is generally thought that PSA has some possibility to misjudge patterns in case of data-missing resulting from poor sensitivity because it analyses the correlation between sequential pulses, which leads to hesitate to apply it to on-site. Therefore, in this paper, the problems of PSA such as data-missing and noise-adding cases were investigated. for the purpose, PD data obtained from various defects including noise-adding data were used and analyzed. As a result, it was shown that both cases could cause fatal errors in recognizing PD patterns. In case of the data missing, the error was dependant on the kinds of defect and the degree of degradation Also, it could be noticed that the error due to adding noises was larger than that due to some data missing.

The Comparison of Imputation Methods in Space Time Series Data with Missing Values (공간시계열모형의 결측치 추정방법 비교)

  • Lee, Sung-Duck;Kim, Duck-Ki
    • Communications for Statistical Applications and Methods
    • /
    • v.17 no.2
    • /
    • pp.263-273
    • /
    • 2010
  • Missing values in time series can be treated as unknown parameters and estimated by maximum likelihood or as random variables and predicted by the conditional expectation of the unknown values given the data. The purpose of this study is to impute missing values which are regarded as the maximum likelihood estimator and random variable in incomplete data and to compare with two methods using ARMA and STAR model. For illustration, the Mumps data reported from the national capital region monthly over the years 2001~2009 are used, and estimate precision of missing values and forecast precision of future data are compared with two methods.

Asymptotic Test for Dimensionality in Probabilistic Principal Component Analysis with Missing Values

  • Park, Chong-sun
    • Communications for Statistical Applications and Methods
    • /
    • v.11 no.1
    • /
    • pp.49-58
    • /
    • 2004
  • In this talk we proposed an asymptotic test for dimensionality in the latent variable model for probabilistic principal component analysis with missing values at random. Proposed algorithm is a sequential likelihood ratio test for an appropriate Normal latent variable model for the principal component analysis. Modified EM-algorithm is used to find MLE for the model parameters. Results from simulations and real data sets give us promising evidences that the proposed method is useful in finding necessary number of components in the principal component analysis with missing values at random.

A Classifier Capable of Handling Incomplete Data Set (불완전한 데이터를 처리할수 있는 분류기)

  • Lee, Jong-Chan;Lee, Won-Don
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.1
    • /
    • pp.53-62
    • /
    • 2010
  • This paper introduces a classification algorithm which can be applied to a learning problem with incomplete data sets, missing variable values or a class value. This algorithm uses a data expansion method which utilizes weighted values and probability techniques. It operates by extending a classifier which are considered to be in the optimal projection plane based on Fisher's formula. To do this, some equations are derived from the procedure to be applied to the data expansion. To evaluate the performance of the proposed algorithm, results of different measurements are iteratively compared by choosing one variable in the data set and then modifying the rate of missing and non-missing values in this selected variable. And objective evaluation of data sets can be achieved by comparing, the result of a data set with non-missing variable with that of C4.5 which is a known knowledge acquisition tool in machine learning.

Variance estimation for distribution rate in stratified cluster sampling with missing values

  • Heo, Sunyeong
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.2
    • /
    • pp.443-449
    • /
    • 2017
  • Estimation of population proportion like the distribution rate of LED TV and the prevalence of a disease are often estimated based on survey sample data. Population proportion is generally considered as a special form of population mean. In complex sampling like stratified multistage sampling with unequal probability sampling, the denominator of mean may be random variable and it is estimated like ratio estimator. In this research, we examined the estimation of distribution rate based on stratified multistage sampling, and determined some numerical outcomes using stratified random sample data with about 25% of missing observations. In the data used for this research, the survey weight was determined by deterministic way. So, the weights are not random variable, and the population distribution rate and its variance estimator can be estimated like population mean estimation. When the weights are not random variable, if one estimates the variance of proportion estimator using ratio method, then the variances may be inflated. Therefore, in estimating variance for population proportion, we need to examine the structure of data and survey design before making any decision for estimation methods.