• Title/Summary/Keyword: Missing value

Comparison of Missing Imputation Methods in Fine Dust Data (미세먼지 자료에서의 결측치 대체 방법 비교)

  • Kim, YeonJin;Park, HeonJin
    • The Journal of Bigdata / v.4 no.2 / pp.105-114 / 2019
  • Missing-value replacement is a major issue in data analysis: if missing values are ignored and the analysis proceeds anyway, the estimates can be biased and the results incorrect. In this paper, we look for an appropriate imputation method for missing values in weather data. To this end, we ran simulations under various scenarios comparing existing R-based methods such as MICE and MissForest with time-series-based models. Comparing the results variable by variable, the Kalman filter with an auto.arima model in the imputeTS package and the MissForest model performed well on the weather data.
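
The Kalman approach mentioned above fills gaps by fitting a state-space model and smoothing through the missing points. As a minimal sketch of that idea only (not the imputeTS implementation), the Python below uses the simplest state-space model, a local-level model (random walk plus noise), with hypothetical fixed noise variances `q` and `r` instead of parameters estimated by auto.arima, and fills each gap with the Rauch-Tung-Striebel smoothed level. The function name is hypothetical.

```python
def kalman_local_level_impute(series, q=1.0, r=1.0):
    """Impute None entries with a local-level Kalman smoother.

    Assumed model (illustrative, not fitted): level_t = level_{t-1} + noise(q),
    y_t = level_t + noise(r).
    """
    n = len(series)
    first = next(v for v in series if v is not None)
    a_pred, p_pred = [0.0] * n, [0.0] * n
    a_filt, p_filt = [0.0] * n, [0.0] * n
    a, p = first, 1e7  # near-diffuse prior on the initial level
    for t, y in enumerate(series):
        if t > 0:
            p = p + q  # prediction step: random walk keeps the level, adds variance
        a_pred[t], p_pred[t] = a, p
        if y is not None:  # update step only when y_t is observed
            k = p / (p + r)
            a = a + k * (y - a)
            p = (1 - k) * p
        a_filt[t], p_filt[t] = a, p
    # Rauch-Tung-Striebel backward pass (transition coefficient is 1)
    a_smooth = a_filt[:]
    for t in range(n - 2, -1, -1):
        c = p_filt[t] / p_pred[t + 1]
        a_smooth[t] = a_filt[t] + c * (a_smooth[t + 1] - a_pred[t + 1])
    # keep observed values, replace gaps with the smoothed level
    return [y if y is not None else a_smooth[t] for t, y in enumerate(series)]
```

With `q` and `r` both 1.0, a single gap between 1.0 and 3.0 is filled with a value very close to 2.0. In practice the variances would be estimated from the data, which is what a fitted model such as auto.arima provides.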

Using Missing Values in the Model Tree to Change Performance for Predicting Cholesterol Levels (모델트리의 결측치 처리 방법에 따른 콜레스테롤수치 예측의 성능 변화)

  • Jung, Yong Gyu;Won, Jae Kang;Sihn, Sung Chul
    • Journal of Service Research and Studies / v.2 no.2 / pp.35-43 / 2012
  • Data mining is of interest across many fields rather than one specific area, and it is used heavily in a wide range of applications: in decision-making, in analyzing data and correlations to uncover hidden relationships, and in finding actionable information and making predictions. However, some data sets contain many missing values in their variables, and some have only a small number of records. In this paper, missing values are handled with different treatments in the model tree algorithm, which is applied to predict cholesterol levels. Prediction performance is measured experimentally for each treatment, and an efficient way of handling the missing data is presented.

HANDLING MISSING VALUES IN FUZZY c-MEANS

  • Miyamoto, Sadaaki;Takata, Osamu;Umayahara, Kazutaka
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1998.06a / pp.139-142 / 1998
  • Missing values in data for fuzzy c-means clustering are discussed. Two basic methods of fuzzy c-means, namely the standard fuzzy c-means and the entropy method, are considered, and three options for handling missing values are proposed: the first defines a new distance between data with missing values, the second alters a weight in the new distance, and the third fills the missing values with appropriate numbers. Experimental results are shown.
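
The abstract does not spell out the paper's new distance. One common way to measure distance for incomplete data, assumed here rather than taken from the paper, is the partial distance: sum squared differences over the observed coordinates only and rescale by p / (number observed). The sketch plugs it into the standard fuzzy c-means and entropy-method membership formulas; `m` (fuzzifier) and `lam` (entropy parameter) are given illustrative defaults, and all function names are hypothetical.

```python
from math import exp

def partial_sq_distance(x, v):
    """Squared distance between x (None = missing) and center v,
    using observed coordinates only, rescaled by p / n_observed."""
    p = len(x)
    obs = [(xi, vi) for xi, vi in zip(x, v) if xi is not None]
    if not obs:
        raise ValueError("x has no observed coordinates")
    return (p / len(obs)) * sum((xi - vi) ** 2 for xi, vi in obs)

def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means memberships u_i = 1 / sum_j (D_i / D_j)^(1/(m-1))."""
    d = [partial_sq_distance(x, v) for v in centers]
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:  # x coincides with one or more centers: share membership among them
        return [1.0 / len(zeros) if di == 0.0 else 0.0 for di in d]
    return [1.0 / sum((di / dj) ** (1.0 / (m - 1.0)) for dj in d) for di in d]

def entropy_memberships(x, centers, lam=1.0):
    """Entropy-method memberships u_i proportional to exp(-D_i / lam)."""
    d = [partial_sq_distance(x, v) for v in centers]
    dmin = min(d)  # shift for numerical stability; it cancels in the ratio
    e = [exp(-(di - dmin) / lam) for di in d]
    s = sum(e)
    return [ei / s for ei in e]
```

For x = [1.0, None] and centers [0.0, 5.0] and [3.0, -2.0], only the first coordinate is compared, giving distances 2 and 8 and fuzzy c-means memberships 0.8 and 0.2.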

A Study on the Sensing System Construction of a Missing Roadbed (철도 노반유실검지시스템 구축에 관한 연구)

  • Kim, Ki-Young;Kang, Kyung-Sik
    • Proceedings of the Safety Management and Science Conference / 2009.11a / pp.461-470 / 2009
  • Railroads offer the benefit of mass transportation of passengers and cargo, but a single accident can cause huge losses of life and property. In particular, typhoons and localized torrential downpours, which usually occur in summer, have washed out the roadbed supporting the track an average of 38.29 times per year over the last seven years. If a train passed over a section whose roadbed had been washed out, a major accident with many casualties could result. Yet safety is not assured, because washouts are currently detected solely by visual inspection by the personnel in charge. This study therefore proposes a real-time roadbed-washout sensing and train operation system that reduces the risk of railroad accidents by controlling train operation when a washout is detected in real time.

Statistical Methods for Multivariate Missing Data in Health Survey Research (보건조사연구에서 다변량결측치가 내포된 자료를 효율적으로 분석하기 위한 통계학적 방법)

  • Kim, Dong-Kee;Park, Eun-Cheol;Sohn, Myong-Sei;Kim, Han-Joong;Park, Hyung-Uk;Ahn, Chae-Hyung;Lim, Jong-Gun;Song, Ki-Jun
    • Journal of Preventive Medicine and Public Health / v.31 no.4 s.63 / pp.875-884 / 1998
  • Missing observations are common in medical research and health survey research, and several statistical methods have been proposed to handle them. The EM (Expectation-Maximization) algorithm is one efficient way to handle missing data, based on sufficient statistics. In this paper, we developed statistical models and methods for survey data with multivariate missing observations, adopting the EM algorithm to handle them. We assume that the multivariate observations follow a multivariate normal distribution whose mean vector and covariance matrix are of primary interest. We applied the proposed method to data from a physician survey on the Resource-Based Relative Value Scale (RBRVS). In addition to the EM algorithm, we applied complete-case analysis, which uses only fully observed cases, and available-case analysis, which uses all available information. Residual and normal probability plots were used to assess the normality assumption. The residual sum of squares from the EM algorithm was smaller than those from the complete-case and available-case analyses.
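
To make the E- and M-steps concrete, here is a minimal sketch for the simplest multivariate case: a bivariate normal (X, Y) in which X is always observed and some Y values are missing. It illustrates EM for a multivariate normal model and is not the authors' code; the function name and convergence settings are hypothetical. The E-step replaces each missing Y with its conditional expectations E[Y | X] and E[Y^2 | X]; the M-step recomputes the mean vector and covariance matrix from these completed sufficient statistics.

```python
def em_bivariate_normal(x, y, n_iter=200, tol=1e-10):
    """EM estimates (mx, my, sxx, sxy, syy) for a bivariate normal where
    x is fully observed and missing y values are None."""
    n = len(x)
    obs = [yi for yi in y if yi is not None]
    # start from available-case estimates, zero covariance
    mx = sum(x) / n
    my = sum(obs) / len(obs)
    sxx = sum((xi - mx) ** 2 for xi in x) / n
    syy = sum((yi - my) ** 2 for yi in obs) / len(obs)
    sxy = 0.0
    for _ in range(n_iter):
        # E-step: expected y and y^2 for each row given current parameters
        beta = sxy / sxx
        resid_var = syy - beta * sxy  # Var(Y | X)
        ey, ey2 = [], []
        for xi, yi in zip(x, y):
            if yi is None:
                cond_mean = my + beta * (xi - mx)
                ey.append(cond_mean)
                ey2.append(cond_mean ** 2 + resid_var)
            else:
                ey.append(yi)
                ey2.append(yi * yi)
        # M-step: maximum likelihood estimates from completed statistics
        new_mx = sum(x) / n
        new_my = sum(ey) / n
        new_sxx = sum(xi * xi for xi in x) / n - new_mx ** 2
        new_sxy = sum(xi * e for xi, e in zip(x, ey)) / n - new_mx * new_my
        new_syy = sum(ey2) / n - new_my ** 2
        delta = abs(new_my - my) + abs(new_sxy - sxy) + abs(new_syy - syy)
        mx, my, sxx, sxy, syy = new_mx, new_my, new_sxx, new_sxy, new_syy
        if delta < tol:
            break
    return mx, my, sxx, sxy, syy
```

For this monotone missing pattern, the EM estimate of the mean of Y should agree with the closed-form maximum likelihood estimate, which adjusts the complete-case mean of Y by the complete-case regression slope times the shift in the mean of X.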

The Research for Prediction of Missing Value in Collaborative Filtering (협력적 여과(Collaborative Filtering)에서 결측치(Missing Value) 예측에 관한 연구)

  • 황철현;박영길;박용준
    • Proceedings of the Korea Intelligent Information System Society Conference / 2000.11a / pp.333-337 / 2000
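  • Collaborative filtering, which has attracted attention as an essential element of successful websites, is applied and operated on many sites because it reduces information overload and increases customer loyalty. This paper proposes a missing-value prediction scheme that uses associations between items, in order to prevent the loss of accuracy caused by the lack of information in the early stage of applying collaborative filtering.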
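
The abstract says only that associations between items are used. A common way to do that, assumed here and not necessarily the authors' scheme, is item-based collaborative filtering: measure the cosine similarity between two items over the users who rated both, then predict a user's missing rating as the similarity-weighted average of that user's ratings on other items. The function names and rating data are hypothetical.

```python
from math import sqrt

def item_similarity(ratings, a, b):
    """Cosine similarity between items a and b over users who rated both."""
    common = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][a] * ratings[u][b] for u in common)
    norm_a = sqrt(sum(ratings[u][a] ** 2 for u in common))
    norm_b = sqrt(sum(ratings[u][b] ** 2 for u in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_missing(ratings, user, item):
    """Similarity-weighted average of the user's ratings on other items;
    returns None when no positively similar rated item exists."""
    num = den = 0.0
    for other, r in ratings[user].items():
        if other == item:
            continue
        s = item_similarity(ratings, item, other)
        if s > 0:
            num += s * r
            den += s
    return num / den if den else None

# usage: u3 has not rated item A
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2, "C": 5},
    "u3": {"B": 3, "C": 4},
}
print(predict_missing(ratings, "u3", "A"))
```

The prediction falls between u3's ratings of B (3) and C (4), leaning toward B because B is slightly more similar to A over the co-rating users. Early on, when few users co-rate items, such similarities rest on little data, which is the sparsity problem the abstract targets.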

On the Use of Weighted k-Nearest Neighbors for Missing Value Imputation (Weighted k-Nearest Neighbors를 이용한 결측치 대치)

  • Lim, Chanhui;Kim, Dongjae
    • The Korean Journal of Applied Statistics / v.28 no.1 / pp.23-31 / 2015
  • The k-nearest neighbor (KNN) method is commonly used as a simple imputation method for missing values in statistical analysis. When one of the k nearest neighbors is an extreme value or outlier, however, the KNN method can introduce bias. In this paper, we propose a weighted k-nearest neighbors (WKNN) imputation method that compensates for this weakness of KNN. A Monte Carlo simulation study on a real data set is also conducted to compare the WKNN and KNN methods.
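
The abstract does not give the WKNN weights. The sketch below assumes inverse-distance weights, one common choice, so that a far-away neighbor (for example an outlier that still lands among the k nearest) contributes less than in plain KNN. Distances use only the columns observed in the target row; the function name and data are hypothetical.

```python
from math import sqrt

def knn_impute(rows, target, col, k=3, weighted=False):
    """Impute rows[target][col] from the k nearest rows that have col and
    the target's observed features; None marks a missing value."""
    t = rows[target]
    feats = [j for j, v in enumerate(t) if j != col and v is not None]
    cands = []
    for i, r in enumerate(rows):
        if i == target or r[col] is None or any(r[j] is None for j in feats):
            continue
        d = sqrt(sum((r[j] - t[j]) ** 2 for j in feats))
        cands.append((d, r[col]))
    cands.sort(key=lambda pair: pair[0])
    nearest = cands[:k]
    if not nearest:
        raise ValueError("no usable neighbors")
    if not weighted:
        # plain KNN: unweighted mean of the neighbors' values
        return sum(v for _, v in nearest) / len(nearest)
    # assumed WKNN weighting: inverse distance, eps avoids division by zero
    w = [1.0 / (d + 1e-9) for d, _ in nearest]
    return sum(wi * v for wi, (_, v) in zip(w, nearest)) / sum(w)

rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [10.0, 100.0], [2.1, None]]
print(knn_impute(rows, 4, 1, k=4))
print(knn_impute(rows, 4, 1, k=4, weighted=True))
```

With k = 4 the distant row (10.0, 100.0) pulls the plain KNN estimate up to 40.0, while the inverse-distance weighted estimate stays close to 21.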

Imputation Method Using Local Linear Regression Based on Bidirectional k-nearest-components

  • Lee, Yonggeol
    • Journal of Information and Communication Convergence Engineering / v.21 no.1 / pp.62-67 / 2023
  • This paper proposes an imputation method that applies local linear regression to components selected by a bidirectional k-nearest-components search. The search selects components within a dynamic range around the missing points; unlike existing methods, which use a fixed-size window, it can flexibly select adjacent components for each imputation problem. The weights assigned to the components around the missing points are calculated by local linear regression, which is free from the rank problem in the matrix of dependent variables and can compute weights that reflect the data trend in specific situations such as a blackout. The missing values are estimated as a linear combination of the components and their weights, and these estimates are then imputed. In experiments, the proposed method outperformed existing methods when the error between the original and imputed data was measured with MAE and RMSE.
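
As a rough sketch of the two ingredients named above, the code below collects up to k observed points on each side of a missing index, skipping other missing points so that the window adapts to the gap, and fits a kernel-weighted local linear regression whose intercept at the missing index is the imputed value. The Epanechnikov-style kernel, the bandwidth rule and the function name are assumptions for illustration, not the paper's exact formulation.

```python
def local_linear_impute(series, t, k=2):
    """Estimate series[t] from up to k observed points on each side of t
    using kernel-weighted local linear regression (None = missing)."""
    left = [i for i in range(t - 1, -1, -1) if series[i] is not None][:k]
    right = [i for i in range(t + 1, len(series)) if series[i] is not None][:k]
    idx = left + right
    if not idx:
        raise ValueError("no observed neighbors")
    h = max(abs(i - t) for i in idx) + 1  # assumed bandwidth: keeps weights > 0
    w = [1 - ((i - t) / h) ** 2 for i in idx]  # Epanechnikov-style weights
    # weighted least squares for y = a + b * (i - t); the estimate at t is a
    sw = sum(w)
    su = sum(wi * (i - t) for wi, i in zip(w, idx))
    suu = sum(wi * (i - t) ** 2 for wi, i in zip(w, idx))
    sy = sum(wi * series[i] for wi, i in zip(w, idx))
    suy = sum(wi * (i - t) * series[i] for wi, i in zip(w, idx))
    det = sw * suu - su * su
    if abs(det) < 1e-12:  # a single neighbor: fall back to the weighted mean
        return sy / sw
    return (suu * sy - su * suy) / det
```

Because the estimate is a weighted linear combination of the neighboring values, it reproduces a straight-line trend exactly, even when the neighbors are asymmetric around the gap, for example `local_linear_impute([2.0, 4.0, None, None, 10.0, 12.0], 2)` returns 6.0 up to rounding.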

A Research for Imputation Method of Photovoltaic Power Missing Data to Apply Time Series Models (태양광 발전량 데이터의 시계열 모델 적용을 위한 결측치 보간 방법 연구)

  • Jeong, Ha-Young;Hong, Seok-Hoon;Jeon, Jae-Sung;Lim, Su-Chang;Kim, Jong-Chan;Park, Chul-Young
    • Journal of Korea Multimedia Society / v.24 no.9 / pp.1251-1260 / 2021
  • This paper discusses imputing missing data with a simple moving average (SMA) and a Kalman filter, and compares the values each method predicts. Time series analysis is the standard way to handle time series data in the photovoltaic field. A photovoltaic system, however, records data irregularly, whenever the power value changes, so the data must be converted to a consistent format to obtain accurate results, and resampling to equal intervals produces missing values. These were imputed using SMA and a Kalman filter. The Kalman filter tracks the observed data better than SMA: the SMA result is a stepped line, whereas the Kalman filter result is a smooth line. The MAPE of the SMA prediction is 0.00737%, and that of the Kalman prediction is 0.00078%. However, the time complexity of SMA is O(N), whereas that of the Kalman filter is O(D^2) for a D-dimensional object. We therefore suggest choosing the method according to the available computational power.
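
SMA imputation and the MAPE metric quoted above can be sketched as follows. Each gap is filled with the mean of the last `window` observed values, so every point in a gap receives the same value, which is one way the stepped SMA line described above can arise. The window length and function names are hypothetical, and MAPE assumes no actual value is zero.

```python
def sma_impute(series, window=3):
    """Fill each None with the mean of the last `window` observed values."""
    out = []
    history = []  # observed (non-missing) values seen so far
    for v in series:
        if v is None:
            if not history:
                out.append(None)  # nothing observed yet to average
            else:
                recent = history[-window:]
                out.append(sum(recent) / len(recent))
        else:
            history.append(v)
            out.append(v)
    return out

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

For example, `sma_impute([1, 2, 3, None, None, 6])` fills both gaps with 2.0. Each imputed point costs at most `window` additions, which is consistent with SMA's linear overall cost.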