• Title/Abstract/Keywords: Incomplete variable

Pre-Adjustment of Incomplete Group Variable via K-Means Clustering

  • Hwang, S.Y.;Hahn, H.E.
    • Journal of the Korean Data and Information Science Society / Vol. 15, No. 3 / pp. 555-563 / 2004
  • In classification and discrimination, we often face an incomplete group variable, typically arising from many missing values and/or unreliable cases. This paper suggests using K-means clustering to pre-adjust for the incompleteness, after which classification based on a generalized statistical distance is performed. To illustrate the proposed procedure, a simulation study compares it with CART from data mining and with traditional techniques that ignore the incompleteness of the group variable. The simulation study shows that the proposed methodology outperforms these alternatives.
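
The abstract does not spell out the pre-adjustment rule, so the following is only a minimal sketch of the general idea: cluster the cases with K-means, then fill each missing group label with the majority known label of its cluster. The function names, the majority-vote rule, and the choice of initial centroids are assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter
from math import dist


def kmeans_assign(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns a cluster index for each point.

    Centroids start at the first k points (an arbitrary, illustrative choice).
    """
    centroids = [tuple(p) for p in points[:k]]
    assign = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return assign


def pre_adjust_groups(points, groups, k):
    """Fill missing group labels (None) with the majority label of the cluster.

    Hypothetical rule: known labels are kept unchanged; a cluster with no
    known labels leaves its missing labels as None.
    """
    assign = kmeans_assign(points, k)
    majority = {}
    for c in range(k):
        known = [g for g, a in zip(groups, assign) if a == c and g is not None]
        majority[c] = Counter(known).most_common(1)[0][0] if known else None
    return [g if g is not None else majority[a] for g, a in zip(groups, assign)]


# Toy data: two well-separated clusters, two cases with a missing group label.
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.3), (4.8, 5.0)]
groups = ["A", "B", None, "B", "A", None]
labels = pre_adjust_groups(points, groups, k=2)
print(labels)
```

The adjusted labels could then be passed to any distance-based classifier; that second stage is not sketched here.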

Algorithms for Handling Incomplete Data in SVM and Deep Learning

  • 이종찬
    • Journal of the Korea Convergence Society / Vol. 11, No. 3 / pp. 1-7 / 2020
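  • This paper introduces two different techniques for handling incomplete data, together with algorithms that learn from the resulting data. The first method handles incomplete data by assigning each missing value a uniform probability over the values the variable can take, and then trains an SVM on this data. With this technique, the more frequently a variable's values are missing, the higher its entropy becomes, so that the variable is not selected in a decision tree. A characteristic of this method is that it ignores all the information remaining in the variable with missing values and assigns new values. In contrast, the new method computes entropy-based probabilities from the information that remains after excluding the missing values and uses them as estimates for the variable with missing values. In other words, it recovers the lost part of the information using the large amount of information in the incomplete training data that was not lost, and then learns using deep learning. The two methods are evaluated by selecting one variable at a time from the training data, varying the proportion of missing data in that variable, and repeatedly comparing the resulting measurements.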
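
The contrast between the two assignments can be illustrated with a small sketch. It encodes a categorical variable as probability vectors: observed values become one-hot rows, while a missing value (`None`) becomes either a uniform vector or, as a simple stand-in for the second method, the relative frequencies of the observed values. The function names and the frequency-based estimate are assumptions for illustration; the abstract does not give the paper's exact formula.

```python
from collections import Counter


def one_hot(value, categories):
    """Probability vector that puts all mass on the observed category."""
    return [1.0 if value == c else 0.0 for c in categories]


def encode_uniform(values, categories):
    """Missing values get equal probability for every possible category."""
    k = len(categories)
    return [
        [1.0 / k] * k if v is None else one_hot(v, categories) for v in values
    ]


def encode_from_observed(values, categories):
    """Missing values get the relative frequencies of the observed values.

    Assumed stand-in for "probabilities computed from the remaining
    information"; falls back to uniform when nothing was observed.
    """
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    if total == 0:
        return encode_uniform(values, categories)
    freq = [counts[c] / total for c in categories]
    return [list(freq) if v is None else one_hot(v, categories) for v in values]


colors = ["red", "blue", None, "red"]
cats = ["red", "blue"]
uniform_rows = encode_uniform(colors, cats)
observed_rows = encode_from_observed(colors, cats)
print(uniform_rows[2], observed_rows[2])
```

In the uniform encoding the missing row carries no information about the variable, whereas the frequency-based row leans toward the category seen most often among the observed values.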

Customer Classification Method for Household Appliances Industries with a Large Number of Incomplete Data

  • 장영순;서종현
    • IE Interfaces / Vol. 19, No. 1 / pp. 86-96 / 2006
  • Customer data in some manufacturing industries contain a large number of incomplete records because of customers' infrequent purchasing behavior and the limited customer profile data gathered by sales representatives. As a result, most sophisticated data analysis methods cannot be applied directly. This paper proposes a heuristic data analysis method to classify customers in household appliances industries. The proposed PD (percent of difference) method can be used for the discriminant analysis of incomplete customer data with simple mathematical calculations. The method consists of a variable distribution estimation step, PD measure and cluster score evaluation steps, a variable impact construction step, and a segment assignment step. A real example is also presented.

A data extension technique to handle incomplete data

  • 이종찬
    • Journal of the Korea Convergence Society / Vol. 12, No. 2 / pp. 7-13 / 2021
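  • This paper introduces an algorithm that, for incomplete training data containing missing values, converts the data into a format that can represent probabilities and then compensates for the missing values. Previous methods using this data conversion handled incomplete data by assigning each missing value a uniform probability over the values the variable can take. This approach produced good results on many problems, but it was pointed out that information is lost because all information remaining in the variable with missing values is ignored and new values are assigned. In contrast, the newly proposed method feeds only the complete information, which contains no missing values, into a well-known classification algorithm (C4.5), and a decision tree is built during learning. Probabilities for the missing values are then obtained from this decision tree and assigned as estimates for the variable with missing values. In other words, the lost part of the information is recovered using the large amount of information in the incomplete training data that was not lost.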

Neural Network-based Decision Class Analysis with Incomplete Information

  • Kim, Jae-Kyeong;Lee, Jae-Kwang;Park, Kyung-Sam
    • Korea Database Society: Conference Proceedings / 1999 Spring Joint Conference: Knowledge Management and Knowledge Engineering / pp. 281-287 / 1999
  • Decision class analysis (DCA) is viewed as a classification problem in which a set of input data (situation-specific knowledge) and output data (a topological leveled influence diagram (ID)) are given. Situation-specific knowledge is usually provided by a decision maker (DM) with the help of domain expert(s), but it is not easy for the DM to know the situation-specific knowledge of a decision problem exactly. This paper presents a methodology for sensitivity analysis of DCA under incomplete information. The purpose of sensitivity analysis in DCA is to identify the effects of incomplete situation-specific frames whose uncertainty affects the importance of each variable in the resulting model. For this purpose, the suggested methodology consists of two procedures: a generative procedure and an adaptive procedure. An interactive procedure based on the sensitivity analysis is also suggested for building a well-formed ID. These procedures are formally explained and illustrated with a raw material purchasing problem.

Neural Network-based Decision Class Analysis with Incomplete Information

  • Kim, Jae-Kyeong;Lee, Jae-Kwang;Park, Kyung-Sam
    • Korea Intelligent Information Systems Society: Conference Proceedings / 1999 Spring Joint Conference: Knowledge Management and Knowledge Engineering / pp. 281-287 / 1999
  • Decision class analysis (DCA) is viewed as a classification problem in which a set of input data (situation-specific knowledge) and output data (a topological leveled influence diagram (ID)) are given. Situation-specific knowledge is usually provided by a decision maker (DM) with the help of domain expert(s), but it is not easy for the DM to know the situation-specific knowledge of a decision problem exactly. This paper presents a methodology for sensitivity analysis of DCA under incomplete information. The purpose of sensitivity analysis in DCA is to identify the effects of incomplete situation-specific frames whose uncertainty affects the importance of each variable in the resulting model. For this purpose, the suggested methodology consists of two procedures: a generative procedure and an adaptive procedure. An interactive procedure based on the sensitivity analysis is also suggested for building a well-formed ID. These procedures are formally explained and illustrated with a raw material purchasing problem.

Analytical Approximation Algorithm for the Inverse of the Power of the Incomplete Gamma Function Based on Extreme Value Theory

  • Wu, Shanshan;Hu, Guobing;Yang, Li;Gu, Bin
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 12 / pp. 4567-4583 / 2021
  • This study proposes an analytical approximation algorithm based on extreme value theory (EVT) for the inverse of the power of the incomplete Gamma function. First, the Gumbel function is used to approximate the power of the incomplete Gamma function, and the corresponding inverse problem is transformed into the inversion of an exponential function. Then, using the tail equivalence theorem, the normalized coefficient of the general Weibull distribution function is employed to replace the normalized coefficient of the random variable following a Gamma distribution, and the approximate closed form solution is obtained. The effects of equation parameters on the algorithm performance are evaluated through simulation analysis under various conditions, and the performance of this algorithm is compared to those of the Newton iterative algorithm and other existing approximate analytical algorithms. The proposed algorithm exhibits good approximation performance under appropriate parameter settings. Finally, the performance of this method is evaluated by calculating the thresholds of space-time block coding and space-frequency block coding pattern recognition in multiple-input and multiple-output orthogonal frequency division multiplexing. The analytical approximation method can be applied to other related situations involving the maximum statistics of independent and identically distributed random variables following Gamma distributions.
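
The inversion step described in this abstract can be sketched in general form. The notation is assumed for illustration: $F$ is the Gamma distribution function (a regularized incomplete Gamma function), $N$ is the power, $p$ is the target probability, and $a_N > 0$, $b_N$ are normalizing coefficients whose specific form in the paper is not stated in the abstract.

```latex
\[
  [F(x)]^{N} \;\approx\; \exp\!\left(-\exp\!\left(-\frac{x - b_N}{a_N}\right)\right) = p
  \quad\Longrightarrow\quad
  x \;\approx\; b_N - a_N \ln\bigl(-\ln p\bigr), \qquad 0 < p < 1 .
\]
```

Taking logarithms twice turns the inverse of the power of the incomplete Gamma function into the inverse of an exponential function, which has this closed form once $a_N$ and $b_N$ are fixed.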

A Classifier Capable of Handling Incomplete Data Set

  • 이종찬;이원돈
    • Journal of the Korea Institute of Information and Communication Engineering / Vol. 14, No. 1 / pp. 53-62 / 2010
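  • This paper introduces a classification algorithm that can be applied to learning from data sets containing incomplete data, in which variable values or class values are missing. The algorithm uses a data extension method that employs weight values and probabilistic techniques. This is done by extending a classifier designed to find the optimal projection plane based on Fisher's formula. To this end, several equations are derived from the process applied to data extension. To evaluate the performance of the proposed algorithm, one variable is selected from the data, and the ratio of missing to non-missing values in that variable is varied while the resulting measurements are compared repeatedly. In addition, for an objective evaluation on the data sets, the results are compared with those of C4.5, which is widely used as a knowledge acquisition tool in machine learning.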

The Design of Variable Structure Controller for the System in Phase Canonical Form with Incomplete State Measurements

  • 박귀태;최중경
    • The Transactions of the Korean Institute of Electrical Engineers / Vol. 41, No. 8 / pp. 902-913 / 1992
  • Several control schemes based on variable structure control (VSC) theory have been proposed for single-input systems with unmeasurable state variables. In earlier VSC designs, the system must be represented in phase canonical form and complete measurement of every state variable must be assumed. Several VSC methods were proposed to remove these restrictions. In particular, for systems in phase canonical form with unmeasurable state variables, the reduced order switching function algorithm was proposed. However, this method has many drawbacks and cannot be used for dynamic systems in general form (not phase canonical form). This paper therefore proposes a new method of constructing the switching function for systems in phase canonical form, which relaxes the restrictions of the reduced order switching function algorithm. The algorithm can be realized for any state representation and adopted in systems where not all states are available for switching function synthesis or control.

The Role of Negative Binomial Sampling In Determining the Distribution of Minimum Chi-Square

  • Hamdy H.I.;Bentil Daniel E.;Son M.S.
    • International Journal of Contents / Vol. 3, No. 1 / pp. 1-8 / 2007
  • The distributions of the minimum correlated F-variable arise in many applied statistical problems, including simultaneous analysis of variance (SANOVA), equality of variance, selection and ranking of populations, and reliability analysis. In this paper, the negative binomial sampling technique is employed to derive the distributions of the minimum of chi-square variables and hence the distributions of the minimum correlated F-variables. The work is divided into two parts. The first part develops some combinatorial identities arising from negative binomial sampling. These identities are constructed and justified because they serve an important purpose when dealing with these distributions or their characteristics. Other important results, including cumulants and moments of these distributions, are also given in fairly simple forms. In the second part, the distributions of the minimum chi-square variable, and hence of the minimum correlated F-variables, are derived within the negative binomial sampling framework. Although multinomial theory applied to order statistics and standard transformation techniques can be used to derive these distributions, the negative binomial sampling approach provides more information about the relationship between the sampling vehicle and the probability distributions of these functions of chi-square variables. We also provide an algorithm to compute the percentage points of the distributions. The computation methods we adopted are exact and involve no interpolation.