• Title/Summary/Keyword: Missing data


Symptom Pattern Classification using Neural Networks in the Ubiquitous Healthcare Environment with Missing Values (손실 값을 갖는 유비쿼터스 헬스케어 환경에서 신경망을 이용한 에이전트 기반 증상 패턴 분류)

  • Salvo, Michael Angelo G.;Lee, Jae-Wan;Lee, Mal-Rey
    • Journal of Internet Computing and Services / v.11 no.2 / pp.129-142 / 2010
  • The ubiquitous healthcare environment is one of the systems that benefit from wireless sensor networks. One challenge with wireless sensor networks, however, is their high loss rate when transmitting data. Data from the biosensors may not reach the base stations, which can result in missing values. This paper proposes the Health Monitor Agent (HMA) to gather data from the base stations, predict missing values, classify symptom patterns into medical conditions, and take appropriate action in case of emergency. The agent is applied in the ubiquitous healthcare environment and uses data from the biosensors and from the patient’s medical history as symptom patterns to recognize medical conditions. In the event of missing data, the HMA uses a predictive algorithm to fill in missing values in the symptom patterns before classification. Simulation results show that the predictive algorithm used by the HMA makes classification of the symptom patterns more accurate than other methods.

A Study on Shape Variability in Canonical Correlation Biplot with Missing Values (결측값이 있는 정준상관 행렬도의 형상변동 연구)

  • Hong, Hyun-Uk;Choi, Yong-Seok;Shin, Sang-Min;Ka, Chang-Wan
    • The Korean Journal of Applied Statistics / v.23 no.5 / pp.955-966 / 2010
  • The canonical correlation biplot is a useful tool for graphically describing a data matrix in terms of the association between two sets of variables, for detecting patterns, and for displaying results found by more formal methods of analysis. However, when some values are missing from the data, most biplots are not directly applicable. To solve this problem, we estimate the missing data using the median, mean, EM algorithm, and MCMC imputation methods at different missing rates. Even after the missing values are imputed, the resulting biplots take different shapes depending on the imputation method and the missing rate. Therefore, we use the RMS (root mean square) proposed by Shin et al. (2007) and the PS (Procrustes statistic) to measure and compare the shape variability between the original biplots and the estimated biplots.

Imputation of Missing SST Observation Data Using Multivariate Bidirectional RNN (다변수 Bidirectional RNN을 이용한 표층수온 결측 데이터 보간)

  • Shin, YongTak;Kim, Dong-Hoon;Kim, Hyeon-Jae;Lim, Chaewook;Woo, Seung-Buhm
    • Journal of Korean Society of Coastal and Ocean Engineers / v.34 no.4 / pp.109-118 / 2022
  • Missing sections in the vertex sea surface temperature observation data were imputed using a Bidirectional Recurrent Neural Network (BiRNN). Among artificial intelligence techniques, Recurrent Neural Networks (RNNs), which are commonly used for time series data, estimate in only one direction, either forward in time or backward toward the missing position, so their estimation performance is poor for long missing sections. In this study, by contrast, estimation performance can be improved even for long missing sections by estimating in both directions, from before and from after the missing section. In addition, by using all available data around the observation point (sea surface temperature, temperature, wind field, atmospheric pressure, humidity), imputation performance was further improved by estimating the imputed values from the correlations among these variables. For performance verification, a statistical model, Multivariate Imputation by Chained Equations (MICE), a machine learning-based Random Forest model, and an RNN model using Long Short-Term Memory (LSTM) were compared. For imputation of a long missing section of 7 days, the average accuracy of the BiRNN/statistical models is 70.8%/61.2%, respectively, and the average error is 0.28 degrees/0.44 degrees, respectively, so the BiRNN model performs better than the other models. Because it applies a temporal decay factor representing the missing pattern, the BiRNN technique is judged to have better imputation performance than the existing methods as the missing section becomes longer.

Using Missing Values in the Model Tree to Change Performance for Predict Cholesterol Levels (모델트리의 결측치 처리 방법에 따른 콜레스테롤수치 예측의 성능 변화)

  • Jung, Yong Gyu;Won, Jae Kang;Sihn, Sung Chul
    • Journal of Service Research and Studies / v.2 no.2 / pp.35-43 / 2012
  • Data mining is of interest in every field, not only in specific areas, and it has applications in many of them. It is used in decision-making, in analyzing data and correlations to uncover hidden relations, and in finding actionable information and making predictions. However, some data sets contain many missing values in their variables and do not have a large number of records. In this paper, missing values are handled in accordance with the model tree algorithm, which is applied to predict cholesterol values. For the performance analysis, experiments are conducted for each missing-value treatment. Through this, an efficient alternative for handling missing data is presented.


Likelihood Ratio Criterion for Testing Sphericity from a Multivariate Normal Sample with 2-step Monotone Missing Data Pattern

  • Choi, Byung-Jin
    • Communications for Statistical Applications and Methods / v.12 no.2 / pp.473-481 / 2005
  • The problem of testing the sphericity structure of the covariance matrix of a multivariate normal distribution is introduced for a sample with a 2-step monotone missing data pattern. The maximum likelihood method is used to estimate the parameters from the sample. Using these estimates, the likelihood ratio criterion for testing sphericity is derived.

Development and Application of Imputation Technique Based on NPR for Missing Traffic Data (NPR기반 누락 교통자료 추정기법 개발 및 적용)

  • Jang, Hyeon-Ho;Han, Dong-Hui;Lee, Tae-Gyeong;Lee, Yeong-In;Won, Je-Mu
    • Journal of Korean Society of Transportation / v.28 no.3 / pp.61-74 / 2010
  • ITS (intelligent transportation systems) collect real-time traffic data and accumulate vast historical data, but this enormous volume of historical data has not been managed or used efficiently. With the introduction of data management systems such as ADMS (Archived Data Management System), the potential of large historical data has grown dramatically. However, traffic data in any data management system inherently include missing values, and missing data have been a major obstacle to applying these data because they occasionally render an entire dataset useless. For these reasons, imputation techniques play a key role in data management systems. To address these limitations, this paper presents a promising imputation technique that can be built into data management systems and robustly generates estimates for missing values in historical data. The developed model, based on the NPR (Non-Parametric Regression) approach, exploits various traffic data patterns in historical data and is designed for practical requirements such as minimizing the number of parameters, computational speed, the imputation of various types of missing data, and multiple imputation. The model was tested under various missing data types. The results showed that the model outperforms previously reported approaches in prediction accuracy and meets the computational speed required for deployment in traffic data management systems.

A Missing Value Replacement Method for Agricultural Meteorological Data Using Bayesian Spatio-Temporal Model (농업기상 결측치 보정을 위한 통계적 시공간모형)

  • Park, Dain;Yoon, Sanghoo
    • Journal of Environmental Science International / v.27 no.7 / pp.499-507 / 2018
  • Agricultural meteorological information is an important resource that affects farmers' income, food security, and agricultural conditions. Thus, such data are used in various fields that are responsible for planning, enforcing, and evaluating agricultural policies. The meteorological information obtained from automatic weather observation systems operated by rural development agencies contains missing values owing to temporary mechanical or communication failures. Missing values are known to reduce the reliability and validity of a model. In this study, a hierarchical Bayesian spatio-temporal model is used to suggest replacements for missing values because the meteorological information is spatio-temporally correlated. The prior distribution is very important in the Bayesian approach. However, the trace plot showed that the spatial decay parameter did not converge. A suitable spatial decay parameter was therefore estimated on the basis of the root-mean-square error (RMSE), computed from the difference between the predicted and observed values. The latitude, longitude, and altitude were considered as covariates. The estimated spatial decay parameters were 0.041 and 0.039 for the spatio-temporal model with latitude and longitude and for the model with latitude, longitude, and altitude, respectively. The posterior distributions were stable after the spatial decay parameter was fixed. The RMSE, mean absolute error (MAE), mean absolute percentage error (MAPE), and bias were calculated for model validation. Finally, the missing values were generated using the independent Gaussian process model.
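The four validation measures named in the abstract above can be sketched in a few lines. This is a minimal illustration using common textbook definitions, not the paper's code: the function name `validation_metrics`, the sign convention for bias (predicted minus observed), and the sample numbers are all assumptions.

```python
import math

def validation_metrics(observed, predicted):
    """Return RMSE, MAE, MAPE (percent) and bias for paired values.

    Hypothetical helper: bias is taken as mean(predicted - observed),
    which is an assumed sign convention.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
        "bias": sum(errors) / n,
    }

# Made-up values, only to show the call.
metrics = validation_metrics([1.0, 2.0, 4.0], [2.0, 2.0, 2.0])
print(metrics)
```

RMSE penalizes large errors more heavily than MAE, while bias shows whether the predictions run systematically high or low, so the four values are usually read together.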

Identification of Differentially Expressed Genes Using Tests Based on Multiple Imputations

  • Kim, Sang Cheol;Yu, Donghyeon
    • Quantitative Bio-Science / v.36 no.1 / pp.23-31 / 2017
  • Datasets from DNA microarray experiments, which take the form of large matrices of gene expression levels, often have missing values. However, existing statistical methods, including principal component analysis (PCA) and Hotelling's t-test, are not directly applicable to datasets with missing values because they generally assume that the observed dataset is complete. Many methods have been proposed in the literature to impute the missing values in the observed data. Troyanskaya et al. [1] study k-nearest neighbor (kNN) imputation, Kim et al. [2] propose the local least squares (LLS) method, and Rubin [3] proposes multiple imputation (MI) for missing values. To identify differentially expressed genes, we propose a new testing procedure for observed data that contain missing values. The proposed procedure uses Stouffer's z-scores and combines the test results of the individual imputed samples, which are dependent on each other. We show numerically that the proposed test procedure based on MI performs better than the existing test procedures based on single imputation (SI) by comparing their ROC curves. We apply the proposed method to the analysis of a public microarray dataset.
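For orientation, the classical Stouffer combination that the abstract above builds on can be sketched as follows. This is the textbook form, which assumes independent z-scores. The abstract states that the imputed samples are dependent and does not give the paper's adjustment, so this is not the proposed procedure. The function names and the sample z-scores are hypothetical.

```python
import math
from statistics import NormalDist

def stouffer_z(z_scores, weights=None):
    """Combine z-scores as sum(w * z) / sqrt(sum(w^2)).

    Assumption: the z-scores are independent. Dependence among them,
    as in the multiple-imputation setting, needs a variance correction
    that is not shown here.
    """
    if not z_scores:
        raise ValueError("need at least one z-score")
    if weights is None:
        weights = [1.0] * len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

def two_sided_p(z):
    """Two-sided p-value of a standard normal statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Four made-up z-scores, e.g. one per imputed data set.
combined = stouffer_z([1.0, 1.0, 1.0, 1.0])
print(combined, two_sided_p(combined))
```

With equal weights the combined statistic is the sum of the k z-scores divided by the square root of k, which is why several modest but consistent z-scores can produce a larger combined value.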

A Proposal of an Interpolation Method of Missing Wind Velocity Data in Writing a Typical Weather Data (표준기상데이터 작성 시 누락된 풍속 데이터의 보간 방법 제안)

  • Park, So-Woo;Kim, Joo-wook;Song, Doo-sam
    • Journal of the Korean Solar Energy Society / v.37 no.6 / pp.79-91 / 2017
  • Meteorological data at 1-hour intervals are required to write typical weather data for building energy simulation. However, many meteorological data are missing, so an interpolation method to recover them is required. Many meteorological variables are reconstructed by linear interpolation because their changes are not significant. Wind velocity, however, fluctuates with time and location, so linear interpolation is not appropriate for wind velocity data. In this study, three interpolation methods were analyzed with the characteristics of wind velocity in mind: using surrounding wind velocity data, Inverse Distance Weighting (IDW), and Revised Inverse Distance Weighting (IDW-r). The Revised Inverse Distance Weighting method, proposed in this study, showed the highest reliability in restoring the wind velocity data among the analyzed methods.
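As a rough illustration of the plain IDW baseline mentioned in the abstract above, the following sketch estimates a missing value from neighboring stations. It does not implement the revised IDW-r method, whose details the abstract does not give. The planar coordinates, the power parameter of 2, the function name `idw_estimate`, and the sample values are assumptions.

```python
import math

def idw_estimate(target, stations, power=2.0):
    """Inverse distance weighted estimate at `target`.

    target   : (x, y) location of the station with the missing value
    stations : iterable of (x, y, value) for neighboring stations
    power    : distance exponent (2 is a common default, assumed here)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # A station at the target location supplies the value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one neighboring station is required")
    return weighted_sum / weight_total

# Made-up wind speeds (m/s) at two stations equally far from the target.
neighbors = [(0.0, 0.0, 2.0), (2.0, 0.0, 4.0)]
print(idw_estimate((1.0, 0.0), neighbors))
```

Closer stations receive larger weights, so the estimate moves toward the nearest observation as the power parameter increases.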

Recovery the Missing Streamflow Data on River Basin Based on the Deep Neural Network Model

  • Le, Xuan-Hien;Lee, Giha
    • Proceedings of the Korea Water Resources Association Conference / 2019.05a / pp.156-156 / 2019
  • In this study, a gated recurrent unit (GRU) network is constructed based on a deep neural network (DNN) with the aim of restoring missing daily flow data in river basins. Lai Chau hydrological station, located upstream in the Da River basin (Vietnam), is selected as the target station. The model inputs are observed daily flow data for 24 years, from 1961 to 1984 (before the Hoa Binh dam was built), at 5 hydrological stations: 4 gauge stations downstream in the basin and the target station to be restored (Lai Chau). The available data are divided into sections for different purposes. The 23-year data set (1961-1983) was used for training and validation, split 80% for training and 20% for validation. The remaining one-year data set (1984) was used for testing, to objectively verify the performance and accuracy of the model. Although only a modest amount of input data is required and the Lai Chau hydrological station is located upstream on the Da River, the results calculated by the suggested model are in satisfactory agreement with the observed data, with a Nash-Sutcliffe efficiency (NSE) higher than 95%. The findings of this study illustrate the outstanding performance of the GRU network model in recovering the missing flow data at Lai Chau station. As a result, DNN models, as well as GRU network models, have great potential for application in hydrology and hydraulics.
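The Nash-Sutcliffe efficiency reported in the abstract above has a standard definition: one minus the ratio of the squared model errors to the squared deviations of the observations from their mean. The sketch below is a minimal illustration of that definition, not the authors' evaluation code. The function name and the sample flow values are hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    residual_ss = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    total_ss = sum((o - mean_observed) ** 2 for o in observed)
    if total_ss == 0.0:
        raise ValueError("NSE is undefined when observations are constant")
    return 1.0 - residual_ss / total_ss

# Made-up daily flows: a perfect fit, then a forecast equal to the mean.
flows = [1.0, 2.0, 3.0]
print(nash_sutcliffe(flows, flows))
print(nash_sutcliffe(flows, [2.0, 2.0, 2.0]))
```

An NSE of 1 means a perfect match and an NSE of 0 means the model is no better than predicting the observed mean, so a value above 0.95 indicates that very little of the observed variance is left unexplained.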
