• Title/Abstract/Keywords: Statistical classification

Search results: 1,428 items (processing time: 0.031 seconds)

Comparison Study of Multi-class Classification Methods

  • Bae, Wha-Soo;Jeon, Gab-Dong;Seok, Kyung-Ha
    • Communications for Statistical Applications and Methods / Vol. 14, No. 2 / pp. 377-388 / 2007
  • Among multi-class classification methods, the ECOC (Error Correcting Output Coding) method is known to have a low classification error rate. This paper aims to suggest an effective multi-class classification method (1) by comparing various encoding and decoding methods within ECOC and (2) by comparing ECOC with direct classification. Both the SVM (Support Vector Machine) and the logistic regression model were used as binary classifiers in the comparison.
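The encoding and decoding steps that ECOC relies on can be sketched as follows. This is only an illustration under stated assumptions, not the setup used in the paper: the 7-bit code matrix, the class labels `c1` to `c4`, and the function names are all hypothetical, and decoding uses plain Hamming distance. In practice each bit would come from one trained binary classifier (for example an SVM or a logistic regression model).

```python
# Hypothetical code matrix: one 7-bit codeword per class (illustrative only).
# Every pair of rows differs in exactly 4 bits, so one wrong bit can be corrected.
CODE_MATRIX = {
    "c1": (1, 1, 1, 1, 1, 1, 1),
    "c2": (0, 0, 0, 0, 1, 1, 1),
    "c3": (0, 0, 1, 1, 0, 0, 1),
    "c4": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODE_MATRIX, key=lambda label: hamming(CODE_MATRIX[label], bits))


# Outputs of the 7 binary classifiers for one sample; the last bit is wrong
# relative to the codeword of c3, yet decoding still recovers c3.
predicted_bits = (0, 0, 1, 1, 0, 0, 0)
label = decode(predicted_bits)
```

Because the minimum distance between codewords is 4, any single binary-classifier error leaves the true codeword strictly closest, which is the error-correcting property the method is named for.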

Evaluation of the classification method using ancestry SNP markers for ethnic group

  • Lee, Hyo Jung;Hong, Sun Pyo;Lee, Soong Deok;Rhee, Hwan seok;Lee, Ji Hyun;Jeong, Su Jin;Lee, Jae Won
    • Communications for Statistical Applications and Methods / Vol. 26, No. 1 / pp. 1-9 / 2019
  • Various probabilistic methods have been proposed for using interpopulation allele frequency differences to infer the ethnic group of a DNA specimen. The choice of statistical method is critical because the accuracy of the classification results varies by method. For ancestry classification, we propose a new ancestry evaluation method that estimates a combined ethnicity index, and we compare its performance with various classical classification methods using two real data sets. We selected 13 single nucleotide polymorphisms (SNPs) that are useful for inferring ethnic origin. These SNPs were analyzed by restriction fragment mass polymorphism assay, followed by classification among ethnic groups. We genotyped 400 individuals from four ethnic groups (100 African-American, 100 Caucasian, 100 Korean, and 100 Mexican-American) for the 13 SNPs, whose allele frequencies differed among the four groups. Additionally, we applied our new method to HapMap SNP genotypes for 1,011 samples from 4 populations (African, European, East Asian, and Central-South Asian). Our proposed method yielded the highest accuracy among the statistical classification methods compared. Our ethnic group classification system, based on the analysis of ancestry informative SNP markers, can provide a useful statistical tool for identifying ethnic groups.

Bivariate ROC Curve and Optimal Classification Function

  • Hong, C.S.;Jeong, J.A.
    • Communications for Statistical Applications and Methods / Vol. 19, No. 4 / pp. 629-638 / 2012
  • We propose methods for obtaining optimal thresholds and classification functions using various cutoff criteria based on the bivariate ROC curve, which represents bivariate cumulative distribution functions. The false positive rate and false negative rate are calculated with these classification functions for bivariate normal distributions.
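As a rough illustration of how error rates follow from a classification function under bivariate normality, the sketch below assumes a linear function `a[0]*x1 + a[1]*x2 > c`. That linear form, the parameter layout, and all names are assumptions made here for illustration; the abstract does not state which classification functions the paper uses. Under this assumption the score is univariate normal in each class, so both rates reduce to normal CDF evaluations.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def score_params(a, mean, sd, rho):
    """Mean and standard deviation of a[0]*X1 + a[1]*X2 for a bivariate normal."""
    m = a[0] * mean[0] + a[1] * mean[1]
    var = (a[0] * sd[0]) ** 2 + (a[1] * sd[1]) ** 2 \
        + 2.0 * rho * a[0] * a[1] * sd[0] * sd[1]
    return m, math.sqrt(var)


def error_rates(a, c, negative, positive):
    """FPR and FNR when a sample is called positive if its score exceeds c.

    `negative` and `positive` are (mean, sd, rho) triples (hypothetical layout).
    """
    m0, s0 = score_params(a, *negative)
    m1, s1 = score_params(a, *positive)
    fpr = 1.0 - normal_cdf((c - m0) / s0)  # negatives scored above the cutoff
    fnr = normal_cdf((c - m1) / s1)        # positives scored at or below it
    return fpr, fnr


# Illustrative parameters only: unit variances, no correlation.
negative = ((0.0, 0.0), (1.0, 1.0), 0.0)
positive = ((2.0, 2.0), (1.0, 1.0), 0.0)
fpr, fnr = error_rates((1.0, 1.0), 2.0, negative, positive)
```

With these symmetric parameters the cutoff sits midway between the two score means, so the two error rates coincide; moving `c` trades one rate against the other.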

Classification via principal differential analysis

  • Jang, Eunseong;Lim, Yaeji
    • Communications for Statistical Applications and Methods / Vol. 28, No. 2 / pp. 135-150 / 2021
  • We propose classification methods based on principal differential analysis (PDA). Computations of the squared multiple correlation function (RSQ) and PDA scores are reviewed; in addition, we combine PDA results with logistic regression for binary classification. In the numerical study, we compare the PDA-based classification methods with classification based on functional principal component analysis. Various scenarios are considered in a simulation study, in which the PDA-based methods classify the functional data well. Gene expression data are considered for the real data analysis, where the PDA score based method also performs well.

A Note on Fuzzy Support Vector Classification

  • Lee, Sung-Ho;Hong, Dug-Hun
    • Communications for Statistical Applications and Methods / Vol. 14, No. 1 / pp. 133-140 / 2007
  • The support vector machine has been well developed as a powerful tool for solving classification problems. In many real-world applications, each training point has a different effect on the construction of the classification rule. Lin and Wang (2002) proposed fuzzy support vector machines for this kind of classification problem, which assign fuzzy memberships to the input data and reformulate the support vector classification. In this paper, another intuitive approach is proposed that uses the fuzzy $\alpha$-cut set. It shows the trend of the classification functions as $\alpha$ changes.
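For readers unfamiliar with the term, an $\alpha$-cut keeps the elements of a fuzzy set whose membership is at least $\alpha$. The sketch below illustrates only that definition; the membership values and the function name are hypothetical, and how the paper feeds the resulting subsets into support vector classification is not described in the abstract.

```python
def alpha_cut(memberships, alpha):
    """Indices of training points whose fuzzy membership is at least alpha."""
    return [i for i, m in enumerate(memberships) if m >= alpha]


# Hypothetical fuzzy memberships of five training points.
memberships = [0.2, 0.9, 0.5, 1.0, 0.7]

# Raising alpha keeps fewer, more "certain" points.
cuts = {alpha: alpha_cut(memberships, alpha) for alpha in (0.0, 0.5, 0.8)}
```

Refitting a classifier on each subset in `cuts` would be one way to observe how the classification function changes as $\alpha$ increases.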

Binary classification on compositional data

  • Joo, Jae Yun;Lee, Seokho
    • Communications for Statistical Applications and Methods / Vol. 28, No. 1 / pp. 89-97 / 2021
  • Because of their boundedness and sum constraint, compositional data are often transformed by a logratio transformation, and the transformed data are then fed into traditional binary classification or discriminant analysis. However, directly applying traditional multivariate approaches to the transformed data may be problematic because the class distributions are not Gaussian and the Bayes decision boundary is not polynomial in the transformed space. In this study, we propose applying flexible classification approaches to the transformed data for compositional data classification. Empirical studies using synthetic and real examples demonstrate that flexible approaches outperform traditional multivariate classification or discriminant analysis.
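The abstract does not say which logratio transformation is used. As one common choice, the centered logratio (clr) transform can be sketched as below; the function name and the example composition are assumptions for illustration, and the sketch requires strictly positive parts.

```python
import math


def clr(composition):
    """Centered logratio transform of a strictly positive composition.

    Each part is replaced by its log minus the mean of all logs, which is
    the log of the part divided by the geometric mean of the parts.
    """
    if any(p <= 0 for p in composition):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical three-part composition summing to 1.
transformed = clr([0.2, 0.3, 0.5])
```

The transformed values are free of the positivity bound but still sum to zero, and rescaling the input (for example from proportions to percentages) leaves the result unchanged.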

Selection Method of Fuzzy Partitions in Fuzzy Rule-Based Classification Systems

  • 손창식;정환묵;권순학
    • Journal of Korean Institute of Intelligent Systems (한국지능시스템학회논문지) / Vol. 18, No. 3 / pp. 360-366 / 2008
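  • In fuzzy rule-based classification systems, the initial fuzzy partitions are determined by considering the domains of the attributes in the given data, and the optimal classification boundary can be found by tuning the parameters of the initially defined fuzzy partitions. In this paper, we propose a method for selecting fuzzy partitions based on statistical information, which aims to maximize pattern classification performance without using learning processes. In the proposed method, statistical information is used to extract, from the given numerical data, the "uncertainty region" of each input attribute, that is, the region in which the classification boundary is determined in a pattern classification problem. We also discuss a method for extracting candidate rules corresponding to the fuzzy partition intervals generated from the statistical information, and a method for minimizing the coupling problem among those candidate rules. In the experiments, to show the effectiveness of the proposed method, we compared its classification accuracy with that of existing pattern classification methods on the IRIS and New Thyroid Cancer data; the results confirmed that the proposed method provides better classification accuracy than the existing methods.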

Logistic Regression Classification by Principal Component Selection

  • Kim, Kiho;Lee, Seokho
    • Communications for Statistical Applications and Methods / Vol. 21, No. 1 / pp. 61-68 / 2014
  • We propose binary classification methods that modify logistic regression classification. Instead of the original variables, we use principal components chosen by variable selection procedures. We describe the resulting classifiers and discuss their properties. The performance of our proposals is illustrated numerically and compared with other existing classification methods using synthetic and real datasets.

Discriminant Analysis of Binary Data by Using the Maximum Entropy Distribution

  • Lee, Jung Jin;Hwang, Joon
    • Communications for Statistical Applications and Methods / Vol. 10, No. 3 / pp. 909-917 / 2003
  • Although many classification models have been used to classify binary data, no single model dominates across all circumstances, which vary with the number of variables and the size of the data (Asparoukhov and Krzanowski, 2001). This paper proposes a classification model that uses information on the marginal distributions of sub-variables and the corresponding maximum entropy distribution. Classification experiments based on simulation are discussed.

A Note on Linear SVM in Gaussian Classes

  • Jeon, Yongho
    • Communications for Statistical Applications and Methods / Vol. 20, No. 3 / pp. 225-233 / 2013
  • The linear support vector machine (SVM) is motivated by the maximal margin separating hyperplane and is a popular tool for binary classification tasks. Many studies exist on the consistency properties of the SVM; however, it is unknown whether the linear SVM is consistent for estimating the optimal classification boundary even in the simple case of two Gaussian classes with a common covariance, where the optimal classification boundary is linear. In this paper, we show that the linear SVM can be inconsistent in the univariate Gaussian classification problem with a common variance, even when the best tuning parameter is used.
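For reference, the optimal (Bayes) boundary in the univariate two-Gaussian case with a common variance has a closed form, sketched below. This illustrates only the target that a consistent classifier would need to recover, not the paper's SVM analysis; the function name and the parameter values are assumptions for illustration.

```python
import math


def bayes_threshold(mu0, mu1, sigma, prior0=0.5):
    """Bayes-optimal cut point for N(mu0, sigma^2) versus N(mu1, sigma^2).

    When mu1 > mu0, a point x is assigned to class 1 if x exceeds the
    returned value. It follows from equating the prior-weighted densities.
    """
    if mu0 == mu1:
        raise ValueError("the class means must differ")
    if not 0.0 < prior0 < 1.0:
        raise ValueError("prior0 must lie strictly between 0 and 1")
    prior1 = 1.0 - prior0
    midpoint = (mu0 + mu1) / 2.0
    shift = sigma ** 2 * math.log(prior0 / prior1) / (mu1 - mu0)
    return midpoint + shift


balanced = bayes_threshold(0.0, 2.0, 1.0)            # equal priors: the midpoint
unbalanced = bayes_threshold(0.0, 2.0, 1.0, prior0=0.8)
```

With equal priors the boundary is the midpoint of the two means; a larger prior for class 0 pushes the boundary toward the class 1 mean.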