• Title/Abstract/Keywords: Missing data

Search results: 1,278 items (processing time: 0.026 s)

Missing Pattern of the Tidal Elevation Data in Korean Coasts

  • 조홍연;고동휘;정신택
    • 한국해안·해양공학회논문집 / Vol. 23, No. 6 / pp. 496-501 / 2011
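  • This study analyzes and presents the missing-data patterns of tidal elevation data along the Korean coast. Missing intervals are plotted using a missing-data indicator matrix so that the overall missing pattern can be seen at a glance, and the temporal and spatial missing ratios are also analyzed and presented. Overall, the missing ratio of the tidal data is low, but missing values tend to be concentrated at particular tide stations. In addition, an analysis of the autocorrelation function of the intervals between successive missing-data occurrences indicates that missing values in the tidal data occur randomly.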
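
The indicator-matrix idea in this abstract can be illustrated with a minimal sketch. The toy `records` table, the use of `None` as the missing marker, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
# Hypothetical toy record: rows are time steps, columns are tide stations.
# None marks a missing observation (an assumed convention).
records = [
    [1.2, 0.9, None],
    [1.4, None, None],
    [1.1, 1.0, 0.7],
    [1.3, 1.1, None],
]


def indicator_matrix(data):
    """Return a 0/1 matrix with 1 wherever an observation is missing."""
    return [[1 if v is None else 0 for v in row] for row in data]


def missing_ratio_by_station(ind):
    """Spatial view: fraction of missing time steps for each station (column)."""
    n_times = len(ind)
    return [sum(col) / n_times for col in zip(*ind)]


def missing_ratio_by_time(ind):
    """Temporal view: fraction of stations missing at each time step (row)."""
    return [sum(row) / len(row) for row in ind]


R = indicator_matrix(records)
station_ratio = missing_ratio_by_station(R)
time_ratio = missing_ratio_by_time(R)
```

In this toy table the third station carries most of the missing values, which mirrors the kind of station-level concentration the abstract reports.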

Analysis of Incomplete Data with Nonignorable Missing Values

  • 김현정
    • Journal of the Korean Data and Information Science Society / Vol. 13, No. 2 / pp. 167-174 / 2002
  • With nonignorable missing data, a model of the missing-data mechanism must be assumed for each situation. As an example, this article considers income amounts from a survey of individuals and assumes that the larger the value, the higher the probability that it is missing. Estimates are obtained by maximization with the EM (Expectation-Maximization) algorithm, based on this missing-data mechanism, for exponentially distributed data. The method converged within a few iterations from arbitrary initial values. We varied the missing-data probability and the size of the artificial data set to assess the accuracy of the estimates, and we then discuss the properties of the estimates.

Application of SOLAS to the Multiple Imputation for Missing Data

  • Moon, Sung-Ho;Kim, Hyun-Jeong;Shin, Jae-Kyoung
    • Journal of the Korean Data and Information Science Society / Vol. 14, No. 3 / pp. 579-590 / 2003
  • Incomplete data, i.e., data with missing values, require some treatment of the missing values before analysis. A common approach is to delete the cases with missing values, but various other methods have been developed. Among them are the EM algorithm and regression-based methods, which estimate the missing values and impute the missing elements with the estimates. In this paper, we introduce SOLAS, multiple imputation software that generates multiple data sets and imputes with them.

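
As a rough illustration of what "generating multiple data sets" involves, here is a minimal sketch of a multiple imputation workflow for estimating a mean. It fills gaps with random draws from the observed values (a simple hot-deck draw, chosen here for brevity) and pools the results with Rubin's rules. It does not reproduce SOLAS or any of its imputation models; the function name `pool_mean` and the toy data are hypothetical.

```python
import random
import statistics


def pool_mean(values, m=5, seed=0):
    """Estimate a mean from data with gaps (None) via m imputed data sets.

    Assumes at least two observed values. Each gap is filled by a random
    draw from the observed values, a simplification that understates
    uncertainty compared with model-based multiple imputation.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    n = len(values)
    estimates, variances = [], []
    for _ in range(m):
        completed = [rng.choice(observed) if v is None else v for v in values]
        estimates.append(statistics.mean(completed))
        # Sampling variance of the mean within this completed data set.
        variances.append(statistics.variance(completed) / n)
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)
    between = statistics.variance(estimates) if m > 1 else 0.0
    # Rubin's rules: total variance = within + (1 + 1/m) * between.
    total_var = within + (1 + 1 / m) * between
    return pooled, total_var


pooled, total_var = pool_mean([1.0, 2.0, 3.0, None], m=5)
```

The between-imputation term is what distinguishes this from single imputation: it carries the extra uncertainty caused by not knowing the missing values.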

Imputation of Missing Data Based on Hot Deck Method Using K-nn

  • 권순창
    • 한국IT서비스학회지 / Vol. 13, No. 4 / pp. 359-375 / 2014
  • Researchers cannot avoid missing data when collecting data, because some respondents, deliberately or not, leave questions unanswered in studies and experiments. Missing data not only inflate and distort standard deviations but also make parameter estimation harder and undermine the reliability of research results. Although hot deck imputation is widely used, researchers have paid it little attention because it handles missing data in ambiguous ways. Hot deck can be complemented by k-nn, a machine learning method that forms donor groups closest to the characteristics of the cases with missing data. This study therefore imputes missing data with a hot deck method based on k-nn. The method was compared with listwise deletion and with mean, mode, linear regression, and SVM imputation on nominal and ratio data types, and values reasonably close to the originals were obtained. Simulations with different numbers of neighbors and different distance measures were carried out, and k-nn achieved better performance. The study thus rediscovers hot deck imputation, which had failed to attract researchers' attention, and should help in selecting non-parametric methods that are less affected by the structure and causes of missing data.
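
A minimal sketch of hot deck imputation with k-nn donor selection follows. It assumes numeric data, Euclidean distance on the columns the recipient actually observed, and a random pick among the k nearest complete rows; the function name `knn_hot_deck` and these choices are illustrative assumptions, not the paper's exact procedure.

```python
import math
import random


def knn_hot_deck(rows, k=2, seed=0):
    """Fill None entries with values copied from one of the k nearest donors.

    Donors are the complete rows; at least one complete row is assumed.
    """
    rng = random.Random(seed)
    donors = [r for r in rows if None not in r]
    completed = []
    for row in rows:
        if None not in row:
            completed.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        target = [row[j] for j in observed]
        # Rank donors by distance on the recipient's observed columns only.
        nearest = sorted(
            donors,
            key=lambda d: math.dist(target, [d[j] for j in observed]),
        )[:k]
        donor = rng.choice(nearest)  # hot deck: copy a real observed value
        completed.append(
            [donor[j] if v is None else v for j, v in enumerate(row)]
        )
    return completed


rows = [[1.0, 10.0], [2.0, 20.0], [9.0, 90.0], [1.1, None]]
filled = knn_hot_deck(rows, k=1)
```

Because the imputed value is always copied from an actual donor, the result never leaves the range of observed data, which is one reason hot deck is attractive for nominal variables.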

A Novel on Auto Imputation and Analysis Prediction Model of Data Missing Scope based on Machine Learning

  • 정세훈;이한성;김준영;심춘보
    • 한국멀티미디어학회논문지 / Vol. 25, No. 2 / pp. 257-268 / 2022
  • When raw data contain missing values, ignoring them and proceeding with the analysis lowers accuracy because the sample size shrinks. Compared with simply removing missing values, imputing them and analyzing patterns and significant values can compensate for the loss of analysis quality and accuracy caused by bias. In this study, we use machine learning techniques to study irregular data patterns and methods for handling missing data. We propose replacing a missing value with data from a similar past point in time, found by identifying the situation at the time the data went missing. Unlike previous studies, the proposed correction technique is a new algorithm that uses DNN and KNN-MLE techniques. In the performance evaluation, the ANAE measure improved by about 0.041 to 0.321 compared with the existing missing-section correction algorithm.

Missing Value Imputation based on Locally Linear Reconstruction for Improving Classification Performance

  • 강필성
    • 대한산업공학회지 / Vol. 38, No. 4 / pp. 276-284 / 2012
  • Classification algorithms generally assume that the data are complete. However, missing values are common in real data sets for various reasons. In this paper, we propose using locally linear reconstruction (LLR) for missing value imputation to improve classification performance when missing values exist. We first investigate how much missing values degrade classification performance at various missing ratios. Then, we compare the proposed LLR imputation with three well-known single imputation methods over three different classifiers using eight data sets. The experimental results showed that (1) every imputation method, even the very simple ones, helped to improve classification accuracy; (2) among the imputation methods, the proposed LLR imputation was the most effective over all missing ratios; and (3) when the missing ratio was relatively high, LLR was outstanding, and its classification accuracy was as high as that obtained from the complete data set.
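
The abstract does not spell out the LLR formulation, so the following is only a sketch of the general idea under stated assumptions: pick the k nearest complete rows, find sum-to-one weights that best reconstruct the target on its observed features (solved here LLE-style with a small ridge term), and apply the same weights to the neighbors' values on the missing features. The names `solve` and `llr_impute`, the ridge parameter `reg`, and the weight constraint are all assumptions.

```python
import math


def solve(a, b):
    """Solve a small dense system a*w = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def llr_impute(x, complete_rows, k=2, reg=1e-3):
    """Impute the None entries of x from its k nearest complete rows."""
    obs = [j for j, v in enumerate(x) if v is not None]
    xo = [x[j] for j in obs]
    nbrs = sorted(
        complete_rows,
        key=lambda r: math.dist(xo, [r[j] for j in obs]),
    )[:k]
    # Gram matrix of differences between x and each neighbor (observed part).
    diffs = [[xo[t] - r[j] for t, j in enumerate(obs)] for r in nbrs]
    gram = [[sum(p * q for p, q in zip(di, dj)) for dj in diffs] for di in diffs]
    trace = sum(gram[i][i] for i in range(len(nbrs)))
    ridge = reg * trace if trace > 0 else reg  # keeps the system solvable
    for i in range(len(nbrs)):
        gram[i][i] += ridge
    w = solve(gram, [1.0] * len(nbrs))
    total = sum(w)
    w = [wi / total for wi in w]  # enforce sum-to-one weights
    return [
        sum(wi * r[j] for wi, r in zip(w, nbrs)) if v is None else v
        for j, v in enumerate(x)
    ]


imputed = llr_impute([2.0, None], [[1.0, 10.0], [3.0, 30.0], [10.0, 100.0]])
```

Unlike plain k-nn averaging, the weights here depend on how well each neighbor helps reconstruct the observed part of the target, so closer or better-aligned neighbors contribute more.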

Deep learning-based recovery method for missing structural temperature data using LSTM network

  • Liu, Hao;Ding, You-Liang;Zhao, Han-Wei;Wang, Man-Ya;Geng, Fang-Fang
    • Structural Monitoring and Maintenance / Vol. 7, No. 2 / pp. 109-124 / 2020
  • Thanks to the massive monitoring data collected by structural health monitoring (SHM) systems, researchers can grasp the complex environmental effects and the structural state during operation. However, monitoring data are often missing because of sensor faults and other causes, so methods for recovering missing monitoring data are needed. Taking the structural temperature monitoring data of the Nanjing Dashengguan Yangtze River Bridge as an example, this paper proposes a recovery method for missing structural temperature data based on the long short-term memory (LSTM) network. First, the temperature predictions of the LSTM network, the support vector machine (SVM), and the wavelet neural network (WNN) are compared to verify the accuracy advantage of the LSTM network in predicting time series data such as structural temperature. Second, the application of the LSTM network to the recovery of missing structural temperature data is discussed in detail. The results show that the LSTM network can effectively recover the missing structural temperature data; that incorporating more intact sensor data as input further improves the recovery; and that selecting as input the sensor data most highly correlated with the data to be recovered achieves higher accuracy.

Comparison of missing data methods in clustered survival data using Bayesian adaptive B-Spline estimation

  • Yoo, Hanna;Lee, Jae Won
    • Communications for Statistical Applications and Methods / Vol. 25, No. 2 / pp. 159-172 / 2018
  • In many epidemiological studies, missing values in the outcome arise due to censoring. Such censoring is what distinguishes survival analysis from other analytical methods. There are many methods that deal with censored data in survival analysis. However, few studies have dealt with missing covariates in survival data, and such studies are rarer still when the data are clustered. In this paper, we conducted a simulation study to compare several missing data methods for clustered, multi-structured data with missing covariates. We modeled the unknown baseline hazard and frailty with a Bayesian B-spline to obtain smoother and more accurate estimates, and we used prior information to improve accuracy further. We assumed the missing mechanism to be MAR. We compared the performance of five different missing data techniques through simulation studies. We also present results from a multi-center study of Korean IBD patients with Crohn's disease (Lee et al., Journal of the Korean Society of Coloproctology, 28, 188-194, 2012).

arrayImpute: Software for Exploratory Analysis and Imputation of Missing Values for Microarray Data

  • Lee, Eun-Kyung;Yoon, Dan-Kyu;Park, Tae-Sung
    • Genomics & Informatics / Vol. 5, No. 3 / pp. 129-132 / 2007
  • arrayImpute is software for exploratory analysis of missing data and imputation of missing values in microarray data. It also provides a comparative analysis of the imputed values obtained from various imputation methods, which allows users to choose an appropriate imputation method for microarray data. It is built on R and provides a user-friendly graphical interface, so users can easily use arrayImpute to explore and estimate missing data and to compare imputation methods for further analysis.

Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values

  • Kim, Su-Young
    • 응용통계연구 / Vol. 24, No. 2 / pp. 315-321 / 2011
  • Gene expression microarray data often include multiple missing values. Most gene expression analyses (including gene clustering analysis), however, require a complete data matrix as input. In ordinary clustering methods, a single missing value forces one to discard all the data for a gene even if the rest of the data for that gene are intact. The quality of the analysis may decrease seriously as the missing rate increases. On the other hand, imputing missing values may introduce artifacts that reduce the reliability of the analysis. To resolve this tension in microarray clustering analysis, this paper compared the accuracy of clustering with and without imputation on several microarray data sets with different missing rates. This paper also tested the clustering efficiency of several imputation methods, including our proposed algorithm. The results showed that, for imperfect microarray data, it is worthwhile to check the clustering result in this alternative way, without any imputed data.
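
The abstract does not describe the proposed algorithm, so the sketch below shows only one generic way to cluster without imputing: a partial distance computed over the coordinates that both profiles observed, rescaled to the full dimension, and used to assign a gene to its nearest centroid. The function names and the rescaling choice are assumptions for illustration.

```python
import math


def partial_distance(a, b):
    """Euclidean distance over jointly observed entries, rescaled.

    None marks a missing value. The squared sum is scaled by
    (total dimensions / shared dimensions) so that pairs with few shared
    coordinates are not favored. Returns infinity if nothing is shared.
    """
    pairs = [(p, q) for p, q in zip(a, b) if p is not None and q is not None]
    if not pairs:
        return math.inf
    squared = sum((p - q) ** 2 for p, q in pairs)
    return math.sqrt(squared * len(a) / len(pairs))


def nearest_centroid(profile, centroids):
    """Index of the centroid closest to a (possibly incomplete) profile."""
    return min(
        range(len(centroids)),
        key=lambda i: partial_distance(profile, centroids[i]),
    )


gene = [1.0, None, 3.0]  # hypothetical expression profile with one gap
label = nearest_centroid(gene, [[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]])
```

With a distance like this, a gene with one missing measurement can still be assigned to a cluster instead of being dropped entirely.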