• Title/Summary/Keyword: incomplete data

Pre-Adjustment of Incomplete Group Variable via K-Means Clustering

  • Hwang, S.Y.;Hahn, H.E.
    • Journal of the Korean Data and Information Science Society / v.15 no.3 / pp.555-563 / 2004
  • In classification and discrimination, we often face an incomplete group variable, arising typically from many missing values and/or cases that are not credible. This paper suggests using K-means clustering to pre-adjust for the incompleteness, after which classification based on generalized statistical distance is performed. To illustrate the proposed procedure, a simulation study compares it with CART from data mining and with traditional techniques that ignore the incompleteness of the group variable. The simulation study shows that the proposed methodology outperforms these alternatives.

Algorithms for Handling Incomplete Data in SVM and Deep Learning (SVM과 딥러닝에서 불완전한 데이터를 처리하기 위한 알고리즘)

  • Lee, Jong-Chan
    • Journal of the Korea Convergence Society / v.11 no.3 / pp.1-7 / 2020
  • This paper introduces two techniques for dealing with incomplete data, together with algorithms for learning from such data. The first method processes the incomplete data by assigning equal probability to every value the missing variable can take, and then learns from this data with an SVM. This technique ensures that the more frequently a variable is missing, the higher its entropy, so that it is not selected in the decision tree. The method is characterized by ignoring all information remaining in the missing variable and assigning a new value. The new method, by contrast, calculates the entropy probability from the remaining information, excluding the missing value, and uses it as an estimate of the missing variable. In other words, it uses the large amount of information that has not been lost from the incomplete training data to recover some of the missing information, and then learns using deep learning. The two methods are evaluated by selecting one variable at a time from the training data and iteratively comparing results as the proportion of data lost in that variable is varied.
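
The equal-probability encoding described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the helper names `encode_value` and `entropy`, the use of `None` as the missing-value marker, and the example categories are all hypothetical.

```python
import math

def encode_value(value, categories):
    """Encode a categorical value as a probability vector over `categories`.

    An observed value becomes a one-hot vector; a missing value (None)
    is spread uniformly over every category the variable can take.
    """
    if value is None:
        return [1.0 / len(categories)] * len(categories)
    return [1.0 if value == c else 0.0 for c in categories]

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

categories = ["low", "mid", "high"]          # hypothetical variable domain
observed = encode_value("mid", categories)   # one-hot vector
missing = encode_value(None, categories)     # uniform vector
```

The uniform vector has the largest entropy possible for the variable's domain, which is in line with the abstract's remark that a frequently missing variable ends up with higher entropy.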

Bayesian Prediction Analysis for the Exponential Model Under the Censored Sample with Incomplete Information

  • Kim, Yeung-Hoon;Ko, Jeong-Hwan
    • Journal of the Korean Data and Information Science Society / v.13 no.1 / pp.139-145 / 2002
  • This paper deals with the problem of obtaining the Bayesian predictive density function and prediction intervals for a future observation and for the p-th order statistic of n future observations from the exponential model under censored sampling with incomplete information.

A data extension technique to handle incomplete data (불완전한 데이터를 처리하기 위한 데이터 확장기법)

  • Lee, Jong Chan
    • Journal of the Korea Convergence Society / v.12 no.2 / pp.7-13 / 2021
  • This paper introduces an algorithm that compensates for missing values after converting incomplete training data, including its missing values, into a format that can represent probabilities. In the previous method using this data conversion, incomplete data was processed by assigning equal probability to every value a missing variable can take. This method has been applied to many problems with good results, but it loses information, because all information remaining in the missing variable is ignored and a new value is assigned. In the newly proposed method, by contrast, only complete information without missing values is input to the well-known classification algorithm C4.5, and a decision tree is constructed during learning. The probability of the missing value is then obtained from this decision tree and assigned as the estimated value of the missing variable. That is, some of the lost information is recovered using the large amount of information that has not been lost from the incomplete training data.
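
The idea of estimating a missing value's probabilities from the complete records can be sketched as follows. Note the substitution: instead of a C4.5 decision tree, this sketch uses a one-level conditional frequency table, which is a much simpler stand-in and not the paper's method. The function name `conditional_estimates`, the field names, and the toy records are hypothetical.

```python
from collections import Counter, defaultdict

def conditional_estimates(records, target, given):
    """Estimate P(target | given) using only records where both are observed."""
    counts = defaultdict(Counter)
    for rec in records:
        if rec.get(target) is not None and rec.get(given) is not None:
            counts[rec[given]][rec[target]] += 1
    # Normalize each row of counts into a probability distribution.
    return {
        g: {t: n / sum(c.values()) for t, n in c.items()}
        for g, c in counts.items()
    }

records = [                                   # hypothetical training data
    {"colour": "red", "label": "A"},
    {"colour": "red", "label": "A"},
    {"colour": "blue", "label": "A"},
    {"colour": "blue", "label": "B"},
    {"colour": None, "label": "A"},           # incomplete record
]
table = conditional_estimates(records, target="colour", given="label")
estimate = table["A"]   # probabilities assigned to the missing "colour"
```

Unlike a uniform assignment, the estimate for the incomplete record reflects what the complete records say about the same class.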

Handling Incomplete Data Problem in Collaborative Filtering System

  • Noh, Hyun-Ju;Kwak, Min-Jung;Han, In-Goo
    • Journal of Intelligence and Information Systems / v.9 no.2 / pp.51-63 / 2003
  • Collaborative filtering is one of the most widely used methodologies for recommendation systems. It is based on a data matrix of each customer's preferences for products. Such a preference matrix can contain many missing values, and this incompleteness is one of the factors that degrade the accuracy of a recommendation system. Several treatments exist for the incomplete data problem, such as case deletion and single imputation. These approaches are simple and easy to implement, but they may produce biased results. Multiple imputation imputes m values for each missing value, and overcomes the flaws of single imputation by accounting for the uncertainty of the missing values. The objective of this paper is to propose a multiple imputation-based collaborative filtering approach for recommendation systems that improves prediction accuracy. The experiments show that the proposed approach performs better than the traditional collaborative filtering approach, especially when the dataset used for recommendation contains many missing values.

A case study of competing risk analysis in the presence of missing data

  • Limei Zhou;Peter C. Austin;Husam Abdel-Qadir
    • Communications for Statistical Applications and Methods / v.30 no.1 / pp.1-19 / 2023
  • Observational data with missing or incomplete data are common in biomedical research. Multiple imputation is an effective approach to handle missing data with the ability to decrease bias while increasing statistical power and efficiency. In recent years propensity score (PS) matching has been increasingly used in observational studies to estimate treatment effect as it can reduce confounding due to measured baseline covariates. In this paper, we describe in detail approaches to competing risk analysis in the setting of incomplete observational data when using PS matching. First, we used multiple imputation to impute several missing variables simultaneously, then conducted propensity-score matching to match statin-exposed patients with those unexposed. Afterwards, we assessed the effect of statin exposure on the risk of heart failure-related hospitalizations or emergency visits by estimating both relative and absolute effects. Collectively, we provided a general methodological framework to assess treatment effect in incomplete observational data. In addition, we presented a practical approach to produce overall cumulative incidence function (CIF) based on estimates from multiple imputed and PS-matched samples.
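
Combining estimates across multiply imputed datasets is commonly done with Rubin's rules, sketched below for a single scalar estimate. The abstract does not state how the paper pools its cumulative incidence estimates, so this is an illustration of the standard rule rather than the authors' procedure. The function name `pool_rubin` and the numbers are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their variances with Rubin's rules.

    Returns the pooled point estimate and its total variance,
    T = W + (1 + 1/m) * B, where W is the mean within-imputation
    variance and B is the between-imputation sample variance.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)   # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical estimates from m = 3 imputed datasets
estimates = [0.10, 0.12, 0.14]
variances = [0.004, 0.004, 0.004]
pooled, total_var = pool_rubin(estimates, variances)
```

The between-imputation term is what carries the extra uncertainty due to the missing values, so the total variance is never smaller than the average within-imputation variance.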

Model Updating Method Based on Mode Decoupling Controller with Incomplete Modal Data (불완전 모달 정보를 이용한 모드 분리 제어기 기반의 모델 개선법)

  • Ha, Jae-Hoon;Park, Youn-Sik;Park, Young-Jin
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2005.11a / pp.963-966 / 2005
  • Model updating is the field concerned with correcting finite element models using the results of experimental modal analysis. The most common model updating methods depend on a parametric model of the structure. In this case, the number of parameters is normally smaller than that of modal data obtained from an experiment. To overcome this limitation, many researchers have tried to obtain as much modal data as possible. We call this approach the multiple modified-system generation method. It comprises the direct system modification method and the feedback controller method. Direct system modification adds a mass or stiffness to the original structure or perturbs the boundary conditions. The feedback controller method builds a closed-loop system with a sensor and an actuator in order to obtain closed-loop modal data. This paper focuses on the feedback controller method because of its simplicity. Feedback controller methods include the virtual passive controller (VPC), the sensitivity enhancement controller (SEC), and the mode decoupling controller (MDC). Among them, we apply MDC to the model updating problem. MDC has several advantages over other controllers such as VPC and SEC. First, only the target mode can be changed, without changing the modal properties of non-target modes. In addition, any mode can be fixed if the number of sensors equals the number of system modes. Finally, the control power required to achieve the desired change in the target mode is always lower than that of other methods such as VPC. However, MDC can make the closed-loop system unstable when incomplete modal data are used, so action must be taken to avoid this instability. This paper addresses a method for designing a unique and robust MDC from incomplete modal data. A simulation is included to demonstrate the usefulness of the method.

Analysis of Incomplete Field Data with Covariates (설명변수를 고려한 불완전 사용현장데이터 분석)

  • Oh, Young-Seok;Choi, In-Su;Bai, Do-Sun
    • Journal of Korean Institute of Industrial Engineers / v.25 no.4 / pp.510-516 / 1999
  • This paper proposes methods of estimating the lifetime distribution from incomplete field data under parametric regression models. Failure-record data (failure times and covariates) reported to the manufacturer can be too incomplete for satisfactory inference, since only reported failures are recorded. This paper assumes that within-warranty data are reported with probability $P_1$ ($\leq 1$) and after-warranty data are reported with probability $P_2$ ($< P_1$). Methods of obtaining pseudo maximum likelihood estimators (PMLEs) are outlined, their asymptotic properties are studied, and specific formulas for the Weibull distribution are obtained. Simulation studies are performed to investigate the effects of follow-up percentage on the PMLEs.

Limit of the Ratio of Incomplete Beta Functions

  • Hong, Yeon-Woong
    • Journal of the Korean Data and Information Science Society / v.7 no.2 / pp.289-294 / 1996
  • This paper considers the limit of the ratio of two incomplete beta functions, $I_{x}(p+s,q+r)$ to $I_{x}(p,q)$, as $p+q \rightarrow \infty$. The results show that the limits depend on $r$, $s$, $x$ and the limit of $p/(p+q)$.