• Title/Summary/Keyword: Incomplete variable

Pre-Adjustment of Incomplete Group Variable via K-Means Clustering

  • Hwang, S.Y.;Hahn, H.E.
    • Journal of the Korean Data and Information Science Society / v.15 no.3 / pp.555-563 / 2004
  • In classification and discrimination, we often face an incomplete group variable, arising typically from many missing values and/or implausible cases. This paper suggests using K-means clustering to pre-adjust the incompleteness, after which classification based on generalized statistical distance is performed. To illustrate the proposed procedure, a simulation study compares it with CART from data mining and with traditional techniques that ignore the incompleteness of the group variable. The simulation study indicates that the proposed methodology outperforms these alternatives.
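
The abstract does not spell out how the K-means pre-adjustment works, so the following is only a minimal Python sketch under stated assumptions: cases with a missing group label are clustered together with the labelled cases, and each missing label is filled with the majority known label of its cluster. The helper names (`kmeans_assign`, `fill_group_labels`), the toy data, the fixed initial centers, and the majority-vote rule are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def kmeans_assign(points, centers, iters=20):
    """Plain Lloyd iterations from the given initial centers.

    Returns the cluster index of every point.
    """
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Recompute each centroid; keep the old one if a cluster is empty.
        centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else old
            for members, old in zip(clusters, centers)
        ]
    return [nearest(p, centers) for p in points]


def fill_group_labels(points, labels, centers):
    """Fill missing labels (None) with the majority known label of the cluster.

    The majority-vote rule is an assumption made for this sketch.
    """
    assign = kmeans_assign(points, centers)
    filled = list(labels)
    for k in set(assign):
        known = [lab for lab, a in zip(labels, assign) if a == k and lab is not None]
        if not known:
            continue  # no labelled case in this cluster: leave the gaps as they are
        majority = Counter(known).most_common(1)[0][0]
        for i, a in enumerate(assign):
            if a == k and filled[i] is None:
                filled[i] = majority
    return filled


# Toy data: two well-separated groups, two cases with a missing group label.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labels = ["A", None, "A", "B", "B", None]
adjusted = fill_group_labels(points, labels, centers=[points[0], points[3]])
print(adjusted)
```

The adjusted labels could then be passed to any classifier. The paper uses one based on generalized statistical distance, which this sketch does not implement.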

Algorithms for Handling Incomplete Data in SVM and Deep Learning (SVM과 딥러닝에서 불완전한 데이터를 처리하기 위한 알고리즘)

  • Lee, Jong-Chan
    • Journal of the Korea Convergence Society / v.11 no.3 / pp.1-7 / 2020
  • This paper introduces two techniques for dealing with incomplete data, along with algorithms for learning from such data. The first method processes incomplete data by assigning equal probability to each value that the missing variable can take, and learns from this data with an SVM. This technique ensures that the more often a variable is missing, the higher its entropy, so that it is not selected in the decision tree. The method is characterized by ignoring all remaining information in the missing variable and assigning a new value. The new method, in contrast, calculates the entropy probability from the remaining information, excluding the missing value, and uses it as an estimate of the missing variable. In other words, it uses the large amount of information that is not lost from the incomplete training data to recover some of the missing information, and learns using deep learning. The performance of the two methods is measured by selecting one variable in turn from the training data and iteratively comparing the results as the proportion of data lost in that variable is varied.
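
The first method described in this abstract, equal-probability assignment, can be illustrated with a short Python sketch. It assumes a categorical variable whose known values are encoded as one-hot vectors, while a missing value receives the same probability for every category. The function names `encode` and `entropy` and the colour categories are hypothetical; the paper's actual data format is not given here.

```python
import math


def encode(value, categories):
    """Turn one categorical value into a probability vector.

    A known value becomes a one-hot vector; a missing value (None) gets
    equal probability for every category the variable can take.
    """
    if value is None:
        return [1.0 / len(categories)] * len(categories)
    return [1.0 if value == c else 0.0 for c in categories]


def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


categories = ["red", "green", "blue"]  # illustrative values only
known = encode("green", categories)
missing = encode(None, categories)
print(known, entropy(known))
print(missing, entropy(missing))
```

A uniform vector has the largest entropy any vector over these categories can have, while a one-hot vector has zero entropy. This is consistent with the abstract's remark that a variable with more missing entries ends up with higher entropy.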

Customer Classification Method for Household Appliances Industries with a Large Number of Incomplete Data (다수의 결측치가 존재하는 가전업 고객 데이터 활용을 위한 고객분류기법의 개발)

  • Chang, Young-Soon;Seo, Jong-Hyen
    • IE interfaces / v.19 no.1 / pp.86-96 / 2006
  • Some customer data in manufacturing industries are largely incomplete because customers purchase infrequently and because the customer profile data gathered from sales representatives are limited. As a result, most sophisticated data analysis methods cannot be applied directly. This paper proposes a heuristic data analysis method to classify customers in the household appliance industry. The proposed PD (percent of difference) method can be used for discriminant analysis of incomplete customer data with simple mathematical calculations. The method consists of a variable distribution estimation step, PD measure and cluster score evaluation steps, a variable impact construction step, and a segment assignment step. A real example is also presented.

A data extension technique to handle incomplete data (불완전한 데이터를 처리하기 위한 데이터 확장기법)

  • Lee, Jong Chan
    • Journal of the Korea Convergence Society / v.12 no.2 / pp.7-13 / 2021
  • This paper introduces an algorithm that compensates for missing values after converting incomplete training data, including its missing values, into a format that can represent probabilities. In the previous method using this data conversion, incomplete data was processed by assigning equal probability to each value that a missing variable can take. That method was applied to many problems with good results, but it was pointed out that information is lost because all information remaining in the missing variable is ignored and a new value is assigned. In the newly proposed method, by contrast, only complete information that contains no missing values is input into the well-known classification algorithm C4.5, and a decision tree is constructed during learning. The probability of the missing value is then obtained from this decision tree and assigned as the estimated value of the missing variable. That is, some lost information is recovered using the large amount of information that has not been lost from the incomplete training data.

Neural Network-based Decision Class Analysis with Incomplete Information

  • Kim, Jae-Kyeong;Lee, Jae-Kwang;Park, Kyung-Sam
    • Proceedings of the Korea Database Society Conference / 1999.06a / pp.281-287 / 1999
  • Decision class analysis (DCA) is viewed as a classification problem in which a set of input data (situation-specific knowledge) and output data (a topological leveled influence diagram (ID)) is given. Situation-specific knowledge is usually provided by a decision maker (DM) with the help of domain expert(s), but it is not easy for the DM to know the situation-specific knowledge of a decision problem exactly. This paper presents a methodology for sensitivity analysis of DCA under incomplete information. The purpose of sensitivity analysis in DCA is to identify the effects of incomplete situation-specific frames whose uncertainty affects the importance of each variable in the resulting model. To this end, the suggested methodology consists of two procedures: a generative procedure and an adaptive procedure. An interactive procedure based on the sensitivity analysis is also suggested to build a well-formed ID. These procedures are formally explained and illustrated with a raw material purchasing problem.

Neural Network-based Decision Class Analysis with Incomplete Information

  • 김재경;이재광;박경삼
    • Proceedings of the Korea Intelligent Information System Society Conference / 1999.03a / pp.281-287 / 1999
  • Decision class analysis (DCA) is viewed as a classification problem in which a set of input data (situation-specific knowledge) and output data (a topological leveled influence diagram (ID)) is given. Situation-specific knowledge is usually provided by a decision maker (DM) with the help of domain expert(s), but it is not easy for the DM to know the situation-specific knowledge of a decision problem exactly. This paper presents a methodology for sensitivity analysis of DCA under incomplete information. The purpose of sensitivity analysis in DCA is to identify the effects of incomplete situation-specific frames whose uncertainty affects the importance of each variable in the resulting model. To this end, the suggested methodology consists of two procedures: a generative procedure and an adaptive procedure. An interactive procedure based on the sensitivity analysis is also suggested to build a well-formed ID. These procedures are formally explained and illustrated with a raw material purchasing problem.

Analytical Approximation Algorithm for the Inverse of the Power of the Incomplete Gamma Function Based on Extreme Value Theory

  • Wu, Shanshan;Hu, Guobing;Yang, Li;Gu, Bin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.12 / pp.4567-4583 / 2021
  • This study proposes an analytical approximation algorithm based on extreme value theory (EVT) for the inverse of the power of the incomplete Gamma function. First, the Gumbel function is used to approximate the power of the incomplete Gamma function, and the corresponding inverse problem is transformed into the inversion of an exponential function. Then, using the tail equivalence theorem, the normalized coefficient of the general Weibull distribution function is employed to replace the normalized coefficient of the random variable following a Gamma distribution, and the approximate closed form solution is obtained. The effects of equation parameters on the algorithm performance are evaluated through simulation analysis under various conditions, and the performance of this algorithm is compared to those of the Newton iterative algorithm and other existing approximate analytical algorithms. The proposed algorithm exhibits good approximation performance under appropriate parameter settings. Finally, the performance of this method is evaluated by calculating the thresholds of space-time block coding and space-frequency block coding pattern recognition in multiple-input and multiple-output orthogonal frequency division multiplexing. The analytical approximation method can be applied to other related situations involving the maximum statistics of independent and identically distributed random variables following Gamma distributions.
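
The idea of replacing the power of the incomplete Gamma function by a Gumbel function and inverting the resulting exponential can be sketched in Python for the simplest special case. The sketch assumes shape parameter 1, where the regularized incomplete Gamma function reduces to 1 - exp(-x), and uses the textbook Gumbel normalizing constants for that case (scale 1, location ln n). It does not reproduce the Weibull-based coefficients proposed in the paper, and the function names are hypothetical.

```python
import math


def exact_inverse(p, n):
    """Solve (1 - exp(-x))**n = p for x.

    This is the inverse of the n-th power of the regularized incomplete
    Gamma function in the special case of shape parameter 1.
    """
    return -math.log(1.0 - p ** (1.0 / n))


def gumbel_inverse(p, n):
    """Invert the Gumbel approximation exp(-exp(-(x - b_n))) = p.

    b_n = ln n is the standard location constant for the maximum of n
    unit-rate exponential variables (assumed here, not taken from the paper).
    """
    b_n = math.log(n)
    return b_n - math.log(-math.log(p))


n, p = 1000, 0.9
x_exact = exact_inverse(p, n)
x_approx = gumbel_inverse(p, n)
print(x_exact, x_approx, abs(x_exact - x_approx))
```

For shape parameters other than 1 the exact inverse has no elementary closed form, which is where the choice of normalizing coefficients discussed in the abstract becomes important.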

A Classifier Capable of Handling Incomplete Data Set (불완전한 데이터를 처리할수 있는 분류기)

  • Lee, Jong-Chan;Lee, Won-Don
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.1 / pp.53-62 / 2010
  • This paper introduces a classification algorithm that can be applied to learning problems with incomplete data sets, that is, data with missing variable values or a missing class value. The algorithm uses a data expansion method based on weighted values and probability techniques. It operates by extending a classifier that is considered to lie in the optimal projection plane according to Fisher's formula. To do this, some equations are derived from the procedure so that they can be applied to the data expansion. To evaluate the performance of the proposed algorithm, the results of different measurements are compared iteratively by choosing one variable in the data set and then varying the ratio of missing to non-missing values in that variable. An objective evaluation is obtained by comparing the result on a data set without missing values with that of C4.5, a well-known knowledge acquisition tool in machine learning.

The Design of Variable Structure Controller for the System in Phase Canonical Form with Incomplete State Measurements (비 측정 상태변수를 갖는 위상 표준형계통에 대한 가변구조 제어기의 설계)

  • 박귀태;최중경
    • The Transactions of the Korean Institute of Electrical Engineers / v.41 no.8 / pp.902-913 / 1992
  • Several control schemes based on variable structure control (VSC) theory have been proposed for single-input systems with unmeasurable state variables. In earlier VSC designs, the system must be represented in phase canonical form and complete measurement of every state variable must be assumed. Several VSC methods were proposed to remove these restrictions; in particular, for systems in phase canonical form with unmeasurable state variables, the reduced-order switching function algorithm was proposed. However, this method has many drawbacks and cannot be used for dynamic systems in general (non-phase-canonical) form. This paper therefore proposes a new method for constructing the switching function for systems in phase canonical form, which relaxes the restrictions of the reduced-order switching function algorithm. The algorithm can be realized for any state representation and adopted in systems where not all states are available for switching function synthesis or control.

The Role of Negative Binomial Sampling In Determining the Distribution of Minimum Chi-Square

  • Hamdy H.I.;Bentil Daniel E.;Son M.S.
    • International Journal of Contents / v.3 no.1 / pp.1-8 / 2007
  • The distribution of the minimum correlated F-variable arises in many applied statistical problems, including simultaneous analysis of variance (SANOVA), equality of variance, selection and ranking of populations, and reliability analysis. In this paper, the negative binomial sampling technique is employed to derive the distributions of the minimum of chi-square variables and hence the distributions of the minimum correlated F-variables. The work is divided into two parts. The first part develops some combinatorial identities arising from negative binomial sampling. These identities are constructed and justified because they serve an important purpose when dealing with these distributions or their characteristics. Other important results, including cumulants and moments of these distributions, are also given in relatively simple forms. In the second part, the distributions of the minimum chi-square variable, and hence of the minimum correlated F-variables, are derived within the negative binomial sampling framework. Although multinomial theory applied to order statistics and standard transformation techniques can be used to derive these distributions, the negative binomial sampling approach provides more information about the relationship between the sampling vehicle and the probability distributions of these functions of chi-square variables. We also provide an algorithm to compute the percentage points of the distributions. The computation methods adopted are exact and involve no interpolation.