• Title/Summary/Keyword: Missing data


STRONG CONSISTENCY FOR AR MODEL WITH MISSING DATA

  • Lee, Myung-Sook
    • Journal of the Korean Mathematical Society / v.41 no.6 / pp.1071-1086 / 2004
  • This paper is concerned with the strong consistency of the estimators of the autocovariance function and the spectral density function for the autoregressive process when only an amplitude-modulated process with missing data is observed. The results give a simple and practical sufficient condition for the strong consistency of those estimators. Finally, some examples illustrate the application of the main result.
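
As a rough illustration of the setting in the abstract above, the sketch below estimates an autocovariance from an amplitude-modulated series y_t = a_t * x_t, where a_t is 1 when x_t is observed and 0 when it is missing. It uses a ratio estimator that is common in this literature (the sum of products of observed pairs divided by the number of observed pairs). Whether the paper studies exactly this form is an assumption, as are the function name `modulated_autocovariance` and the zero-mean assumption.

```python
def modulated_autocovariance(x, observed, lag):
    """Estimate the lag-`lag` autocovariance of a zero-mean series when only
    y_t = a_t * x_t is available (a_t = 1 if observed, 0 if missing).

    Assumption: this ratio form is one common choice, not necessarily the
    estimator analysed in the paper.
    """
    n = len(x)
    numerator = 0.0  # sum of y_t * y_{t+lag} over pairs that are both observed
    pairs = 0        # sum of a_t * a_{t+lag}, i.e. the number of usable pairs
    for t in range(n - lag):
        if observed[t] and observed[t + lag]:
            numerator += x[t] * x[t + lag]
            pairs += 1
    if pairs == 0:
        raise ValueError("no pair of observations is available at this lag")
    return numerator / pairs


# Hypothetical data: the third value is missing, so its entry in `x` is ignored.
series = [1.0, -1.0, 2.0, 0.0, -2.0]
mask = [1, 1, 0, 1, 1]
lag0 = modulated_autocovariance(series, mask, 0)
lag1 = modulated_autocovariance(series, mask, 1)
```

Dividing by the number of observed pairs rather than by the series length keeps the estimate from shrinking toward zero as the share of missing values grows.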

A Modified Grey-Based k-NN Approach for Treatment of Missing Value

  • Chun, Young-M.;Lee, Joon-W.;Chung, Sung-S.
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.421-436 / 2006
  • In 2004, Huang proposed a grey-based nearest neighbor approach to accurately predict missing attribute values. Our study proposes a way to decide the number of nearest neighbors using not only Deng's grey relational grade (GRG) but also Wen's grey relational grade. In addition, our study uses a weighted mean rather than an arithmetic (unweighted) mean, with the GRG serving as the weight when missing values are imputed. There are four different methods: DU, DW, WU, and WW. The WW method (Wen's GRG with a weighted mean) performs best among them. Huang showed that his method was much better than the mean imputation and multiple imputation methods. The performance of our method is far superior to that of Huang's.

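
The grey-based nearest neighbor idea in the abstract above can be sketched as follows. The sketch uses Deng's grey relational grade with the usual distinguishing coefficient of 0.5, selects the k candidates with the highest grade, and imputes with a grade-weighted mean (roughly the "Deng's GRG with weighted mean" combination). Wen's grade and the paper's rule for choosing the number of neighbors are not shown. The function names, the fixed `k`, and the assumption that attributes are already on comparable scales are illustrative only.

```python
def deng_grg(target, candidates, zeta=0.5):
    """Deng's grey relational grade of each candidate row with respect to
    `target`, computed over the attributes where `target` is not None."""
    known = [k for k, v in enumerate(target) if v is not None]
    deltas = [[abs(target[k] - row[k]) for k in known] for row in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate matches the target exactly on the known attributes.
        return [1.0] * len(candidates)
    return [
        sum((d_min + zeta * d_max) / (d + zeta * d_max) for d in row) / len(row)
        for row in deltas
    ]


def impute_weighted(target, candidates, missing_index, k=2, zeta=0.5):
    """Impute target[missing_index] as the GRG-weighted mean of that attribute
    over the k candidates with the highest grade (k is a fixed, assumed value)."""
    grades = deng_grg(target, candidates, zeta)
    nearest = sorted(range(len(candidates)), key=lambda i: grades[i], reverse=True)[:k]
    total = sum(grades[i] for i in nearest)
    return sum(grades[i] * candidates[i][missing_index] for i in nearest) / total


# Hypothetical data: the third attribute of `incomplete` is missing.
incomplete = [1.0, 2.0, None]
complete_rows = [[1.0, 2.0, 10.0], [1.0, 2.0, 20.0], [5.0, 6.0, 100.0]]
imputed_k2 = impute_weighted(incomplete, complete_rows, missing_index=2, k=2)
imputed_k3 = impute_weighted(incomplete, complete_rows, missing_index=2, k=3)
```

With `k=2` only the two rows that match the known attributes contribute. With `k=3` the distant third row is included but is down-weighted by its lower grade.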

Discriminant Analysis under a Patterned Missing Values

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society / v.18 no.1 / pp.13-25 / 1989
  • This paper suggests a classification rule with unequal covariance matrices when patterned incomplete data are involved in the discriminant analysis. This is an extension of Geisser's (1966) result to the case of missing observations. For the classification rule, we introduce an algorithm that contains a data augmentation step and a Monte Carlo integration step, and show that the algorithm yields a consistent estimator of the true classification probability. The proposed method is compared to the complete observation vector method through a Monte Carlo study. The results show that the suggested method, in general, performs better than the complete observation vector method, which excludes from the analysis those observation vectors with one or more missing values. The results also verify the consistency of the algorithm.


Correction Technique of Missing Load Data Using ARIMA Model and Piecewise Cubic Interpolation (ARIMA 모형과 Piecewise Cubic interpolation을 이용한 누락된 수요실적자료의 보정기법)

  • Lee, J.Y.;Lee, C.J.;Park, J.B.;Shin, J.R.;Kim, S.S.
    • Proceedings of the KIEE Conference / 2003.07a / pp.83-85 / 2003
  • This paper presents a technique for correcting missing load data. The ARIMA (Autoregressive Integrated Moving Average) model and piecewise cubic interpolation are applied to seek the missing parameters. The new model has been tested under a variety of conditions and is shown to produce excellent results. It helps operators design the load duration curve.

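
To make the interpolation half of the abstract above concrete, the sketch below fills gaps in a series with a piecewise cubic Hermite interpolant through the observed points. It does not include the ARIMA component, and the slope rule (finite differences between neighboring observed points) is an assumption, since the abstract does not say which piecewise cubic scheme the authors used. The function name `fill_gaps_cubic` is hypothetical.

```python
import bisect


def fill_gaps_cubic(times, values):
    """Fill None entries in `values` by piecewise cubic Hermite interpolation.

    Assumptions: `times` is sorted in increasing order, and slopes at observed
    points come from finite differences of neighboring observed points.
    Gaps outside the observed range are left as None.
    """
    known = [(t, v) for t, v in zip(times, values) if v is not None]
    if len(known) < 2:
        raise ValueError("need at least two observed points")
    ts = [t for t, _ in known]
    vs = [v for _, v in known]
    n = len(ts)
    slopes = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        slopes.append((vs[hi] - vs[lo]) / (ts[hi] - ts[lo]))
    filled = list(values)
    for idx, (t, v) in enumerate(zip(times, values)):
        if v is not None:
            continue
        j = bisect.bisect_right(ts, t) - 1  # observed point just left of t
        if j < 0 or j >= n - 1:
            continue  # no observed point on one side: do not extrapolate
        h = ts[j + 1] - ts[j]
        s = (t - ts[j]) / h
        # Cubic Hermite basis functions on the unit interval.
        h00 = 2 * s**3 - 3 * s**2 + 1
        h10 = s**3 - 2 * s**2 + s
        h01 = -2 * s**3 + 3 * s**2
        h11 = s**3 - s**2
        filled[idx] = (h00 * vs[j] + h10 * h * slopes[j]
                       + h01 * vs[j + 1] + h11 * h * slopes[j + 1])
    return filled


# Hypothetical hourly load readings with one missing value.
hours = [0, 1, 2, 3, 4]
load = [0.0, 2.0, None, 6.0, 8.0]
repaired = fill_gaps_cubic(hours, load)
```

Leaving gaps at the ends of the series unfilled is a deliberate choice in this sketch. Extrapolating a cubic beyond the observed range can produce implausible values, which is one reason a forecasting model such as ARIMA may be paired with interpolation.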

The Development of Genetic Fuzzy System for Estimating Link Traveling Speed (주행속도 추정을 위한 Genetic Fuzzy System의 개발)

  • Youn, Yeo-Hun;Lee, Hong-Chul;Kim, Yong-Sik
    • Journal of Korean Institute of Industrial Engineers / v.29 no.1 / pp.32-40 / 2003
  • In this study, we develop a Genetic Fuzzy System (GFS) to estimate link traveling speed. Using a genetic algorithm, we obtain fuzzy rules and membership functions that more accurately reflect the correlation between traffic data and speed. Because some links lack traffic data (missing links), we add Case-Based Reasoning (CBR) to the GFS to support speed estimation for them. The case base stores the fuzzy rules and membership functions as its instances. As cases accumulate, the case base comes to offer appropriate cases for missing links. Experiments show that the proposed GFS estimates link traveling speed more accurately than existing methods.

Missing Values Estimation for Time Course Gene Expression Data Using the Sequential Partial Least Squares Regression Fitting (순차적 부분최소제곱 회귀적합에 의한 시간경로 유전자 발현 자료의 결측치 추정)

  • Kim, Kyung-Sook;Oh, Mi-Ra;Baek, Jang-Sun;Son, Young-Sook
    • The Korean Journal of Applied Statistics / v.21 no.2 / pp.275-290 / 2008
  • Microarray gene expression data are very large and the observation process is complex, so missing values occur frequently. In this paper we propose the sequential partial least squares (SPLS) regression fitting method to estimate missing values in time course gene expression data, which are correlated across time points. The SPLS method combines a sequential technique with the partial least squares (PLS) regression fitting method. The usefulness of the proposed method is evaluated through a simulation study on three yeast time course data sets.

Variational Mode Decomposition with Missing Data (결측치가 있는 자료에서의 변동모드분해법)

  • Choi, Guebin;Oh, Hee-Seok;Lee, Youngjo;Kim, Donghoh;Yu, Kyungsang
    • The Korean Journal of Applied Statistics / v.28 no.2 / pp.159-174 / 2015
  • Dragomiretskiy and Zosso (2014) developed a new decomposition method, termed variational mode decomposition (VMD), which is efficient for tone detection and separation of signals. However, VMD may be inefficient in the presence of missing data since it is based on a fast Fourier transform (FFT) algorithm. To overcome this problem, we propose a new approach based on a novel combination of VMD and the hierarchical (or h-) likelihood method. The h-likelihood provides an effective imputation methodology for missing data when VMD decomposes the signal into several meaningful modes. A simulation study and real data analysis demonstrate that the proposed method can produce substantially effective results.

Recovering Incomplete Data using Tucker Model for Tensor with Low-n-rank

  • Thieu, Thao Nguyen;Yang, Hyung-Jeong;Vu, Tien Duong;Kim, Sun-Hee
    • International Journal of Contents / v.12 no.3 / pp.22-28 / 2016
  • Tensors with missing or incomplete values are a ubiquitous problem in various fields such as biomedical signal processing, image processing, and social network analysis. In this paper, we consider how to reconstruct a dataset with missing values in tensor form, a process called tensor completion. We apply Tucker factorization to solve tensor completion, formulated as an optimization problem. The objective function is built from the components of the Tucker model after decomposition. The weighted least squares metric contains only the known values of the tensor, which has low rank in its modes. A first-order optimization method, namely Nonlinear Conjugate Gradient, is applied to solve the optimization problem. We demonstrate the effectiveness of the proposed method on EEG signals with about 70% missing entries, in comparison with other algorithms. The relative error is used to compare the original tensor with the process output.

Filling of Incomplete Rainfall Data Using Fuzzy-Genetic Algorithm (퍼지-유전자 알고리즘을 이용한 결측 강우량의 보정)

  • Kim, Do Jin;Jang, Dae Won;Seoh, Byung Ha;Kim, Hung Soo
    • Journal of Wetlands Research / v.7 no.4 / pp.97-107 / 2005
  • As distributed models are developed and widely used, accurate rainfall measurement and a denser rainfall observation network are required to reflect various spatial properties. In reality, however, it is not easy to obtain accurate data from a dense network. Generally, rainfall gages are not properly distributed in space, and even with a proper gage network it is not easy to reflect the variation of rainfall in space and time. Rainfall data are also often missing at gage stations for various reasons. Because the distribution of mean areal rainfall is estimated from point rainfalls, and rainfall is continuous in time, the missing rainfall data must be filled before the spatial distribution of rainfall can be represented. This study uses the Fuzzy-Genetic algorithm as an interpolation method for filling missing rainfall data. We compare the Fuzzy-Genetic algorithm with the arithmetic average method, the inverse distance method, the normal ratio method, and the ratio of distance and elevation method, which have been widely used. The previous methods showed an accuracy of 70 to 80%, whereas the Fuzzy-Genetic algorithm showed 90%. In particular, based on the sensitivity analysis, we suggest values of the power in the equation for filling missing data according to distance and elevation.

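
Three of the baseline methods named in the abstract above have simple textbook forms, sketched below for a single target gage with a missing reading. These are the standard formulas, not the paper's Fuzzy-Genetic algorithm, and the ratio of distance and elevation method is omitted because its exact form is not given in the abstract. The function names and the default `power=2.0` are assumptions; the abstract indicates that the power is a tunable quantity.

```python
def arithmetic_average(neighbor_rain):
    """Mean of the rainfall observed at the neighboring gages."""
    return sum(neighbor_rain) / len(neighbor_rain)


def inverse_distance(neighbor_rain, distances, power=2.0):
    """Inverse distance weighting: closer gages get larger weights 1 / d**power."""
    weights = [1.0 / d ** power for d in distances]
    weighted_sum = sum(w * p for w, p in zip(weights, neighbor_rain))
    return weighted_sum / sum(weights)


def normal_ratio(neighbor_rain, neighbor_normals, target_normal):
    """Normal ratio method: scale each neighbor's rainfall by the ratio of the
    target gage's normal (long-term) rainfall to that neighbor's normal."""
    scaled = [target_normal / nn * p for nn, p in zip(neighbor_normals, neighbor_rain)]
    return sum(scaled) / len(scaled)


# Hypothetical readings (mm) at three neighboring gages.
rain = [10.0, 20.0, 30.0]
dist_km = [1.0, 2.0, 2.0]
normals = [1000.0, 1000.0, 2000.0]

est_mean = arithmetic_average(rain)
est_idw = inverse_distance(rain, dist_km)
est_ratio = normal_ratio(rain, normals, target_normal=1000.0)
```

Raising `power` concentrates the weight on the nearest gage, while `power=0` reduces the inverse distance method to the arithmetic average.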

Application of DINEOF to Reconstruct the Missing Data from GOCI Chlorophyll-a (GOCI Chlorophyll-a 결측 자료의 복원을 위한 DINEOF 방법 적용)

  • Hwang, Do-Hyun;Jung, Hahn Chul;Ahn, Jae-Hyun;Choi, Jong-Kuk
    • Korean Journal of Remote Sensing / v.37 no.6_1 / pp.1507-1515 / 2021
  • Estimating chlorophyll-a through ocean color remote sensing makes it possible to understand the global distribution of phytoplankton and primary production. However, ocean color data observed from satellites contain missing data due to clouds or weather conditions. In this study, the missing data of the GOCI (Geostationary Ocean Color Imager) chlorophyll-a product were reconstructed using DINEOF (Data INterpolation Empirical Orthogonal Functions). DINEOF reconstructs missing data based on spatio-temporal data, and the accuracy was cross-verified by removing part of a GOCI chlorophyll-a image and comparing it with the reconstructed image. In the study area, the optimal EOF (Empirical Orthogonal Functions) mode for DINEOF was in the range 10-13. The temporally and spatially reconstructed data reflected the increasing chlorophyll-a concentration in the afternoon, and the noise of outliers was filtered. DINEOF is therefore expected to be useful for reconstructing missing images, and the reconstructed data can serve as basic data for monitoring the ocean environment.