• Title/Summary/Keyword: Misclassification Error

Search results: 37

Chi-squared Tests for Homogeneity based on Complex Sample Survey Data Subject to Misclassification Error

  • Heo, Sunyeong
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 9, No. 3
    • /
    • pp.853-864
    • /
    • 2002
  • In the analysis of categorical data subject to misclassification errors, the observed cell proportions are adjusted by misclassification probabilities, and estimates of variances are adjusted accordingly. In this case, it is important to determine the extent to which the misclassification probabilities are homogeneous within a population. This paper considers methods to evaluate the power of chi-squared tests for homogeneity with complex survey data subject to misclassification errors. Two cases are considered: adjustment with homogeneous misclassification probabilities, and adjustment with heterogeneous misclassification probabilities. To estimate the misclassification probabilities, the logistic regression method is considered.
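
The adjustment described above can be sketched for a simple two-category case. The misclassification matrix `M` and the observed proportions below are hypothetical illustrations, not values from the paper:

```python
import numpy as np

# Hypothetical 2x2 misclassification matrix (not from the paper):
# M[i, j] = P(observed category i | true category j)
M = np.array([[0.9, 0.2],
              [0.1, 0.8]])

# Observed cell proportions
p_obs = np.array([0.6, 0.4])

# The observed proportions satisfy p_obs = M @ p_true, so the
# adjusted proportions are obtained by inverting M
p_true = np.linalg.solve(M, p_obs)   # equals [4/7, 3/7], summing to 1
```

Variance estimates for the adjusted proportions then inherit the same linear transformation, which is why the paper adjusts them accordingly.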

Cost-Sensitive Case Based Reasoning using Genetic Algorithm: Application to Diagnose for Diabetes

  • Park Yoon-Joo;Kim Byung-Chun
    • Korea Intelligent Information Systems Society: Conference Proceedings
    • /
    • Korea Intelligent Information Systems Society 2006 Spring Conference
    • /
    • pp.327-335
    • /
    • 2006
  • Case Based Reasoning (CBR) has come to be considered an appropriate technique for diagnosis, prognosis and prescription in medicine. However, conventional CBR has a limitation in that it cannot incorporate asymmetric misclassification costs. It assumes that the costs of type 1 and type 2 errors are the same, so it cannot be modified according to the error cost of each type. This problem is a major disincentive to applying conventional CBR to many real-world cases that have different costs associated with different types of error; medical diagnosis is an important example. In this paper we suggest a new knowledge extraction technique called Cost-Sensitive Case Based Reasoning (CSCBR) that can incorporate unequal misclassification costs. The main idea involves a dynamic adaptation of the optimal classification boundary point and the number of neighbors that minimize the total misclassification cost according to the error costs. Our technique uses a genetic algorithm (GA) to find these two feature vectors of CSCBR. We apply this new method to diabetes datasets and compare the results with those of the cost-sensitive methods C5.0 and CART. The results show that the proposed technique outperforms the other methods and overcomes the limitation of conventional CBR.
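
The effect of unequal error costs on a classification boundary can be sketched minimally (this is illustrative only; the paper's CSCBR tunes the boundary and neighborhood size with a genetic algorithm, and `cost_fn`/`cost_fp` here are hypothetical costs of false negatives and false positives):

```python
# Predict the class that minimizes expected misclassification cost.
def cost_sensitive_predict(p_positive, cost_fn, cost_fp):
    # Expected cost of predicting 0 (negative): p_positive * cost_fn
    # Expected cost of predicting 1 (positive): (1 - p_positive) * cost_fp
    return 1 if p_positive * cost_fn > (1 - p_positive) * cost_fp else 0

# With equal costs the decision boundary is 0.5; making a false
# negative five times as costly moves it down to 1/6.
assert cost_sensitive_predict(0.2, cost_fn=1, cost_fp=1) == 0
assert cost_sensitive_predict(0.2, cost_fn=5, cost_fp=1) == 1
```

This is why symmetric-cost methods under-diagnose when, as in medicine, missing a positive case is far more expensive than a false alarm.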


Evaluating Predictive Ability of Classification Models with Ordered Multiple Categories

  • Oong-Hyun Sung
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 6, No. 2
    • /
    • pp.383-395
    • /
    • 1999
  • This study is concerned with evaluating the predictive ability of classification models with ordered multiple categories. If the categories can be ordered or ranked, the spread of misclassification should be considered when evaluating the performance of the classification models, using the loss rate, since the apparent error rate cannot measure the spread of misclassification. Since the apparent loss rate is known to underestimate the true loss rate, the bootstrap method was used to estimate the true loss rate. Thus this study suggests a method to evaluate the predictive power of classification models using the loss rate and the bootstrap estimate of the true loss rate.
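
The distinction between an error rate and a loss rate for ordered categories can be sketched as follows; the absolute-difference loss `|true - pred|` is an illustrative choice, not necessarily the one used in the paper:

```python
import numpy as np

# The apparent error rate only counts whether a prediction is wrong;
# a loss rate for ordered categories also weights how far off it is.
def apparent_error_rate(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

def loss_rate(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

y_true = [1, 2, 3, 3]
y_near = [2, 2, 3, 3]   # one error, one category away
y_far  = [3, 2, 3, 3]   # one error, two categories away
# Both predictions have apparent error rate 0.25, but their loss
# rates are 0.25 and 0.5 respectively.
```

The far miss is twice as costly under the loss rate even though the apparent error rate cannot tell the two models apart.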


Estimating Prediction Errors in Binary Classification Problem: Cross-Validation versus Bootstrap

  • Kim Ji-Hyun;Cha Eun-Song
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 13, No. 1
    • /
    • pp.151-165
    • /
    • 2006
  • It is important to estimate the true misclassification rate of a given classifier when an independent set of test data is not available. Cross-validation and the bootstrap are two possible approaches in this case. In the related literature, bootstrap estimators of the true misclassification rate were asserted to perform better for small samples than cross-validation estimators. We compare the two estimators empirically when the classification rule is so adaptive to the training data that its apparent misclassification rate is close to zero. We confirm that bootstrap estimators perform better for small samples because of their small variance, and we find that their bias tends to be significant even for moderate to large samples, in which case cross-validation estimators perform better with less computation.
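
The two estimators being compared can be sketched for a 1-NN rule, which is exactly the kind of highly adaptive classifier with zero apparent error that the paper studies (a minimal sketch on simulated data, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def one_nn_error(train_X, train_y, test_X, test_y):
    # 1-NN is maximally adaptive: its apparent error on training data is 0
    d = np.abs(test_X[:, None] - train_X[None, :])
    return float(np.mean(train_y[np.argmin(d, axis=1)] != test_y))

def cv_error(X, y, k=5):
    # k-fold cross-validation estimate of the true misclassification rate
    idx = np.arange(len(X))
    errs = []
    for fold in np.array_split(idx, k):
        tr = np.setdiff1d(idx, fold)
        errs.append(one_nn_error(X[tr], y[tr], X[fold], y[fold]))
    return float(np.mean(errs))

def bootstrap_error(X, y, B=50):
    # Ordinary bootstrap: train on a resample, test on out-of-bag points
    idx = np.arange(len(X))
    errs = []
    for _ in range(B):
        bs = rng.choice(idx, size=len(idx), replace=True)
        oob = np.setdiff1d(idx, bs)
        errs.append(one_nn_error(X[bs], y[bs], X[oob], y[oob]))
    return float(np.mean(errs))

X = rng.normal(size=40)
y = (X + rng.normal(size=40) > 0).astype(int)
e_cv, e_boot = cv_error(X, y), bootstrap_error(X, y)
```

The bootstrap estimate averages over many resamples, which reduces its variance on small samples, but each resample trains on only about 63% of the distinct points, which is one source of the bias the paper reports.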

Empirical Bayesian Misclassification Analysis on Categorical Data

  • 임한승;홍종선;서문섭
    • The Korean Journal of Applied Statistics
    • /
    • Vol. 14, No. 1
    • /
    • pp.39-57
    • /
    • 2001
  • In categorical data, misclassification can occur during data collection. If misclassified data are analyzed as if they were correct, the resulting estimates are biased and the power of tests is weakened; conversely, if correctly classified data are judged to be misclassified, unnecessary cost and time must be spent correcting the supposed misclassification. Determining whether a sample is correctly classified or misclassified is therefore a very important step that must precede analysis. For categorical data given as an I×J contingency table in which misclassification can occur in only one of the two variables, this paper tests for misclassification by fixing the marginal totals of the variable not subject to misclassification and reclassifying the marginal totals of the variable possibly subject to misclassification, extending the Bound concept proposed by Sebastiani and Ramoni (1997), the Collapse concept expressed through external information, and Bayesian methods, with prior parameters set variously to reflect models appropriate to the data and prior information. To obtain information on misclassification, the double sampling scheme studied by Tenenbein (1970) is used; a new statistic for testing misclassification is proposed, and the distribution of the proposed test statistic is studied through various simulations.


Data-Adaptive ECOC for Multicategory Classification

  • Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 19, No. 1
    • /
    • pp.25-36
    • /
    • 2008
  • Error Correcting Output Codes (ECOC) can improve generalization performance when applied to multicategory classification problems. In this study we propose a new criterion for selecting the hyperparameters included in the ECOC scheme. Instead of the margins of the data, we propose to use the probability of misclassification error, since it makes the criterion simple. Using this we obtain an upper bound on the leave-one-out error of the OVA (one-vs-all) method. Our experiments on real and synthetic data indicate that the bound leads to good estimates of the parameters.
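
The OVA method can be viewed as a small ECOC scheme, which the following sketch illustrates (the classifier scores are hypothetical; the paper's contribution is the hyperparameter-selection criterion, not this decoding step):

```python
import numpy as np

# One-vs-all (OVA) as an ECOC scheme: for K classes the code matrix
# is K x K with +1 on the diagonal and -1 elsewhere. A point is
# assigned to the class whose codeword is closest, in Hamming
# distance on signs, to the vector of binary classifier outputs.
K = 3
code = 2 * np.eye(K) - 1                 # OVA code matrix

scores = np.array([-0.8, 0.6, -0.4])     # hypothetical classifier outputs
dist = np.sum(np.sign(scores)[None, :] != code, axis=1)
pred = int(np.argmin(dist))              # class 1: its codeword matches
```

Richer code matrices put more than one +1 per column, which is what gives ECOC its error-correcting margin over plain OVA.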


Bootstrap confidence intervals for classification error rate in circular models when a block of observations is missing

  • Chung, Hie-Choon;Han, Chien-Pai
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 20, No. 4
    • /
    • pp.757-764
    • /
    • 2009
  • In discriminant analysis, we consider a special pattern in which a block of observations is missing. We assume that the two populations are equally likely and that the costs of misclassification are equal. In this situation, we consider bootstrap confidence intervals for the error rate in circular models when the covariance matrices are equal and when they are not.


Bootstrap Confidence Intervals of Classification Error Rate for a Block of Missing Observations

  • Chung, Hie-Choon
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 16, No. 4
    • /
    • pp.675-686
    • /
    • 2009
  • In this paper, it is assumed that there are two distinct populations, both multivariate normal with equal covariance matrices. We also assume that the two populations are equally likely and that the costs of misclassification are equal. The classification rule depends on whether the training samples include missing values. We consider bootstrap confidence intervals for the classification error rate when a block of observations is missing.
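
The general shape of a bootstrap confidence interval for an error rate can be sketched with a percentile interval (a generic illustration on simulated 0/1 outcomes, not the paper's missing-block setting):

```python
import numpy as np

rng = np.random.default_rng(1)

# Percentile bootstrap CI for a misclassification rate: resample the
# 0/1 classification outcomes and take empirical quantiles of the
# resampled error rates.
outcomes = rng.binomial(1, 0.15, size=100)   # 1 = misclassified (simulated)

B = 2000
rates = np.array([rng.choice(outcomes, size=outcomes.size, replace=True).mean()
                  for _ in range(B)])
lo, hi = np.quantile(rates, [0.025, 0.975])  # 95% percentile interval
```

In the papers above the resampling is done over the training samples (with the missing block handled by the classification rule) rather than over fixed outcomes, but the interval construction is the same.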

Hyperparameter Selection for APC-ECOC

  • Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 19, No. 4
    • /
    • pp.1219-1231
    • /
    • 2008
  • The main objective of this paper is to develop a leave-one-out (LOO) bound for all-pairwise-comparison error correcting output codes (APC-ECOC). To avoid using classifiers whose corresponding target values are 0 in APC-ECOC, and to avoid requiring pilot estimates, we developed a bound based on the mean misclassification probability (MMP). It can be used to tune kernel hyperparameters. Our empirical experiment using the kernel mean squared estimate (KMSE) as the binary classifier indicates that the bound leads to good estimates of the kernel hyperparameters.
