• Title/Summary/Keyword: Missing data

Bootstrap confidence intervals for classification error rate in circular models when a block of observations is missing

  • Chung, Hie-Choon;Han, Chien-Pai
    • Journal of the Korean Data and Information Science Society / v.20 no.4 / pp.757-764 / 2009
  • In discriminant analysis, we consider a special pattern that contains a block of missing observations. We assume that the two populations are equally likely and that the costs of misclassification are equal. In this situation, we consider bootstrap confidence intervals for the error rate in circular models, both when the covariance matrices are equal and when they are not.
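
To illustrate the general idea of a bootstrap confidence interval for an error rate, here is a minimal Python sketch of a percentile bootstrap. It is not the procedure from the paper: it ignores the missing block and the circular model entirely, and the function name, the 0/1 input format, and the example data are assumptions made for illustration only.

```python
import random

def bootstrap_error_rate_ci(errors, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a misclassification rate.

    errors: list of 0/1 flags, 1 = misclassified (hypothetical input format).
    """
    rng = random.Random(seed)
    n = len(errors)
    rates = []
    for _ in range(n_boot):
        # Resample the cases with replacement and recompute the error rate.
        sample = [errors[rng.randrange(n)] for _ in range(n)]
        rates.append(sum(sample) / n)
    rates.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled rates.
    lo = rates[int((alpha / 2) * n_boot)]
    hi = rates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Illustrative data only: 12 misclassified cases out of 100.
errors = [1] * 12 + [0] * 88
low, high = bootstrap_error_rate_ci(errors)
print(f"observed rate = {sum(errors) / len(errors):.2f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The percentile interval is the simplest bootstrap variant; other variants (for example bias-corrected intervals) follow the same resampling loop and differ only in how the endpoints are read off.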

Simultaneous Approach to Fuzzy Clustering and Quantification of Categorical Data with Missing Values

  • Honda, Katsuhiro;Nakamura, Yoshihito;Ichihashi, Hidetomo
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.36-39 / 2003
  • This paper proposes a simultaneous application of homogeneity analysis and fuzzy clustering to incomplete data. Taking into account the similarity between the loss of homogeneity in homogeneity analysis and the least squares criterion in principal component analysis, the new objective function is defined in a formulation similar to that of linear fuzzy clustering with missing values. A numerical experiment shows the characteristic properties of the proposed method.

Treatment of Missing Data by Decomposition and Voting with Ordinal Data

  • Chun, Young-M.;Son, Hong-K.;Chung, Sung-S.
    • Journal of the Korean Data and Information Science Society / v.18 no.3 / pp.585-598 / 2007
  • In practice, it is difficult to obtain complete data when conducting a questionnaire survey, and statistical tests that ignore missing values give inefficient results. Therefore, we use imputation methods, which are evaluated by data quality. This study proposes an imputation method for ordinal data based on decomposition and voting. First, the data are sorted by each variable. Next, imputation methods are applied at each decomposition level. Finally, values are selected by voting. The proposed method is evaluated by accuracy and RMSE. In conclusion, when missing values are related to each variable, the median imputation method using decomposition and voting is powerful.

Estimating the Total Precipitation Amount with Simulated Precipitation for Ungauged Stations in Jeju Island (미계측 관측 강수 자료 생성을 통한 제주도 지역의 수문총량 추정)

  • Kim, Nam-Won;Um, Myoung-Jin;Chung, Il-Moon;Heo, Jun-Haeng
    • Journal of Korea Water Resources Association / v.45 no.9 / pp.875-885 / 2012
  • In this study, the total precipitation amount on Jeju Island was estimated using spatial precipitation analysis, with simulated precipitation filling in for ungauged stations that lack precipitation data. The missing data were generated by a modified multiple linear regression, and the spatial precipitation analysis was conducted with PRISM (Parameter-elevation Regression on Independent Slope Model). The data generated by the modified multiple linear regression model follow a pattern similar to the original data, so the model shows good applicability for estimating missing data. The difference in annual average precipitation between Case 1 (original data) and Case 2 (modified data) is very small, about 1.5%. However, the difference in annual average precipitation by elevation is large, up to 37.4%. As a result, the method of estimating missing data in this study should be useful for calculating the total precipitation amount in areas with low station density and in places with high spatial variation in precipitation.

Neighboring Elemental Image Exemplar Based Inpainting for Computational Integral Imaging Reconstruction with Partial Occlusion

  • Ko, Bumseok;Lee, Byung-Gook;Lee, Sukho
    • Journal of the Optical Society of Korea / v.19 no.4 / pp.390-396 / 2015
  • We propose a partial occlusion removal method for computational integral imaging reconstruction (CIIR) based on the exemplar-based inpainting technique. The proposed method is an improved version of the original linear inpainting based CIIR (LI-CIIR), which uses inpainting to fill in the missing-data region. The LI-CIIR shows good results for images that contain objects with smooth surfaces. However, if the object has a textured surface, the result of the LI-CIIR deteriorates, since linear inpainting cannot recover the textured data in the missing-data region well. In this work, we use exemplar-based inpainting to fill in the textured data in the missing-data region. We call the proposed method the neighboring elemental image exemplar based inpainting (NEI-exemplar inpainting) method, since it uses sources from neighboring elemental images to fill in the missing-data region. Furthermore, we propose an automatic occluding region extraction method based on the mutual constraint using depth estimation (MC-DE) and level set based bimodal segmentation. Experimental results show the validity of the proposed system.

Handling Incomplete Data Problem in Collaborative Filtering System

  • Noh, Hyun-Ju;Kwak, Min-Jung;Han, In-Goo
    • Journal of Intelligence and Information Systems / v.9 no.2 / pp.51-63 / 2003
  • Collaborative filtering is one of the most widely used methodologies for recommendation systems. It is based on a data matrix of each customer's preferences for products. Such a preference matrix can contain many missing values, and this incomplete data is one reason the accuracy of recommendation systems deteriorates. Several treatments exist for the incomplete data problem, such as case deletion and single imputation. These approaches are simple and easy to implement, but they may give biased results. The multiple imputation method imputes m values for each missing value, overcoming the flaws of single imputation by accounting for the uncertainty of the missing values. The objective of this paper is to suggest a multiple imputation-based collaborative filtering approach for recommendation systems that improves prediction accuracy. The experiments show that the proposed approach performs better than the traditional collaborative filtering approach, especially when the dataset used for recommendation contains many missing values.
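
The idea of imputing m values per missing entry and then combining the results can be sketched as follows. This is an illustrative Python sketch, not the paper's collaborative filtering method: it imputes by drawing at random from the observed values (a simplification of proper multiple imputation), estimates only a mean rating, and pools the m estimates with Rubin's rules. The function name and the example ratings are assumptions.

```python
import random
import statistics

def multiple_imputation_mean(values, m=5, seed=0):
    """Pool the mean over m randomly imputed copies of `values`.

    values: list of numbers with None marking missing entries
    (hypothetical input format).
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    estimates, within_vars = [], []
    for _ in range(m):
        # Build one completed dataset by drawing each missing value
        # from the observed values (simplified imputation model).
        completed = [v if v is not None else rng.choice(observed) for v in values]
        estimates.append(statistics.mean(completed))
        # Sampling variance of the mean within this completed dataset.
        within_vars.append(statistics.variance(completed) / len(completed))
    pooled = statistics.mean(estimates)
    within = statistics.mean(within_vars)
    between = statistics.variance(estimates)
    # Rubin's rules: total variance = within + (1 + 1/m) * between.
    total_var = within + (1 + 1 / m) * between
    return pooled, total_var

# Illustrative ratings for one product; None marks a missing rating.
ratings = [5, 3, None, 4, None, 2, 5, None, 4, 3]
pooled_mean, pooled_var = multiple_imputation_mean(ratings)
print(f"pooled mean = {pooled_mean:.2f}, total variance = {pooled_var:.3f}")
```

The between-imputation term is what single imputation leaves out: it reflects how much the estimate changes depending on which values were filled in.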

Some Considerations on the Problems of PSA(Pulse Sequence Analysis) as a Partial Discharge Analysis Method (부분방전 해석 방법으로 PSA(Pulse Sequence Analysis)의 문제점에 대한 고찰)

  • Kim, Jeong-Tae;Lee, Ho-Keun
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference / 2004.11a / pp.327-330 / 2004
  • Because of its effectiveness for PD (partial discharge) pattern recognition, PSA (Pulse Sequence Analysis) has been considered as a new analytic method in place of the conventional PRPDA (Phase Resolved Partial Discharge Analysis). However, because PSA analyzes the correlation between sequential pulses, it has a serious problem: it can misanalyze patterns when data are missing as a result of poor sensitivity, which makes practitioners hesitant to apply it on-site. Therefore, in this paper, the problems of PSA in the data-missing and noise-adding cases were investigated. For this purpose, PD data obtained from various defects, including noise-added data, were used and analyzed. The results showed that both cases can cause fatal errors in recognizing PD patterns. In the data-missing case, the error depends on the kind of defect and the degree of degradation. It was also noticed that the error due to added noise was larger than that due to some missing data.

Some Considerations on the On-site Applicability of PSA(Pulse Sequence Analysis) as a Partial Discharge Analysis Method (부분방전 해석 방법으로 PSA(Pulse Sequence Analysis)의 현장 적용성에 대한 고찰)

  • Kim, Jeong-Tae;Lee, Ho-Keun
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers / v.18 no.5 / pp.484-489 / 2005
  • Because of its effectiveness for PD (Partial Discharge) pattern recognition, PSA (Pulse Sequence Analysis) has been considered as a new analytic method in place of the conventional PRPDA (Phase Resolved Partial Discharge Analysis). However, it is generally thought that PSA may misjudge patterns when data are missing as a result of poor sensitivity, because it analyzes the correlation between sequential pulses, and this makes practitioners hesitant to apply it on-site. Therefore, in this paper, the problems of PSA in the data-missing and noise-adding cases were investigated. For this purpose, PD data obtained from various defects, including noise-added data, were used and analyzed. As a result, it was shown that both cases could cause fatal errors in recognizing PD patterns. In the data-missing case, the error was dependent on the kind of defect and the degree of degradation. It was also noticed that the error due to added noise was larger than that due to some missing data.

The Comparison of Imputation Methods in Space Time Series Data with Missing Values (공간시계열모형의 결측치 추정방법 비교)

  • Lee, Sung-Duck;Kim, Duck-Ki
    • Communications for Statistical Applications and Methods / v.17 no.2 / pp.263-273 / 2010
  • Missing values in time series can be treated either as unknown parameters and estimated by maximum likelihood, or as random variables and predicted by the conditional expectation of the unknown values given the data. The purpose of this study is to impute missing values in incomplete data under both views, as maximum likelihood estimates and as random variables, and to compare the two methods using ARMA and STAR models. For illustration, monthly mumps data reported from the national capital region over the years 2001-2009 are used, and the two methods are compared on the estimation precision of the missing values and the forecast precision of future data.

Asymptotic Test for Dimensionality in Probabilistic Principal Component Analysis with Missing Values

  • Park, Chong-sun
    • Communications for Statistical Applications and Methods / v.11 no.1 / pp.49-58 / 2004
  • In this talk, we propose an asymptotic test for dimensionality in the latent variable model for probabilistic principal component analysis with values missing at random. The proposed algorithm is a sequential likelihood ratio test for an appropriate normal latent variable model for principal component analysis. A modified EM algorithm is used to find the MLE of the model parameters. Results from simulations and real data sets give promising evidence that the proposed method is useful for finding the necessary number of components in principal component analysis with values missing at random.