• Title/Abstract/Keywords: Statistics-based Classification

Search results: 396 items (processing time: 0.036 s)

Classification via principal differential analysis

  • Jang, Eunseong;Lim, Yaeji
    • Communications for Statistical Applications and Methods / Vol. 28, No. 2 / pp. 135-150 / 2021
  • We propose classification methods based on principal differential analysis (PDA). We review the computation of the squared multiple correlation function (RSQ) and of PDA scores, and we combine the PDA results with logistic regression for binary classification. In the numerical study, we compare the PDA-based classification methods with classification based on functional principal component analysis. Various scenarios are considered in a simulation study, and the PDA-based classification methods classify the functional data well. Gene expression data are used for the real data analysis, where the PDA score based method also performs well.

Object-oriented Classification and QuickBird Multi-spectral Imagery in Forest Density Mapping

  • Jayakumar, S.;Ramachandran, A.;Lee, Jung-Bin;Heo, Joon
    • 대한원격탐사학회지 (Korean Journal of Remote Sensing) / Vol. 23, No. 3 / pp. 153-160 / 2007
  • Forest cover density studies using high-resolution satellite data and object-oriented classification are limited in India. This article focuses on the potential use of QuickBird satellite data and object-oriented classification in forest density mapping. In this study, the high-resolution satellite data were classified with NDVI/pixel-based and object-oriented classification methods, and the results were compared. The QuickBird satellite data were found to be suitable for forest density mapping. Object-oriented classification was superior to the NDVI/pixel-based classification: it classified all the density classes (dense, open, degraded and bare soil) with higher producer's and user's accuracies and a higher kappa statistic than the pixel-based method. The overall classification accuracy and kappa statistic of the object-oriented classification were 83.33% and 0.77, respectively, which were higher than those of the pixel-based classification (68% and 0.56, respectively). According to the Z statistic, the results of the two classifications were significantly different at the 95% confidence level.
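The accuracy measures named in this abstract (overall accuracy, producer's and user's accuracy, kappa statistic) can all be derived from a confusion matrix. The following is a minimal sketch, not the authors' code: the function name `accuracy_measures`, the row/column convention (rows are reference classes, columns are classified classes) and the example matrix are assumptions made for illustration, and the numbers are not the paper's data.

```python
def accuracy_measures(cm):
    """Accuracy measures from a square confusion matrix.

    Assumed convention: cm[i][j] is the number of samples whose reference
    class is i and whose classified class is j. Every row and column is
    assumed to have a non-zero total.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]                        # reference totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # classified totals

    p_o = sum(cm[i][i] for i in range(k)) / n                       # observed agreement
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)                                 # Cohen's kappa

    producer = [cm[i][i] / row_tot[i] for i in range(k)]  # per reference class
    user = [cm[i][i] / col_tot[i] for i in range(k)]      # per classified class
    return {"overall": p_o, "kappa": kappa, "producer": producer, "user": user}


# Hypothetical two-class matrix, for illustration only.
example = accuracy_measures([[20, 5], [10, 15]])
print(example)
```

Kappa discounts the agreement expected by chance, which is why two classifications can be compared on kappa as well as on overall accuracy.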

Classification of Microarray Gene Expression Data by MultiBlock Dimension Reduction

  • Oh, Mi-Ra;Kim, Seo-Young;Kim, Kyung-Sook;Baek, Jang-Sun;Son, Young-Sook
    • Communications for Statistical Applications and Methods / Vol. 13, No. 3 / pp. 567-576 / 2006
  • In this paper, we apply multiblock dimension reduction methods to the classification of tumors based on microarray gene expression data. The procedure involves clustering selected genes, multiblock dimension reduction, and classification using linear discriminant analysis and quadratic discriminant analysis.

Bivariate ROC Curve and Optimal Classification Function

  • Hong, C.S.;Jeong, J.A.
    • Communications for Statistical Applications and Methods / Vol. 19, No. 4 / pp. 629-638 / 2012
  • We propose methods to obtain optimal thresholds and classification functions by using various cutoff criteria based on the bivariate ROC curve, which represents bivariate cumulative distribution functions. The false positive rate and false negative rate are calculated with these classification functions for bivariate normal distributions.
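To make the quantities in this abstract concrete, the sketch below computes empirical false positive and false negative rates for a linear classification function on two score variables, and picks a cutoff with Youden's J. This is an illustration under stated assumptions, not the paper's method: the linear form `a*x1 + b*x2 > c`, the choice of Youden's J as the cutoff criterion, the function names and the toy data are all hypothetical, and the paper works with bivariate normal distributions rather than empirical samples.

```python
def error_rates(neg, pos, a, b, c):
    """Empirical (FPR, FNR) for the rule: classify as positive if a*x1 + b*x2 > c."""
    fp = sum(1 for x1, x2 in neg if a * x1 + b * x2 > c)
    fn = sum(1 for x1, x2 in pos if a * x1 + b * x2 <= c)
    return fp / len(neg), fn / len(pos)


def youden_cutoff(neg, pos, a, b):
    """Cutoff c maximizing Youden's J = TPR - FPR = 1 - FNR - FPR.

    Candidate cutoffs are the observed values of the linear score.
    Ties are resolved in favor of the smallest candidate.
    """
    candidates = sorted({a * x1 + b * x2 for x1, x2 in neg + pos})
    return max(candidates, key=lambda c: 1.0 - sum(error_rates(neg, pos, a, b, c)))


# Hypothetical bivariate scores, for illustration only.
negatives = [(0, 0), (1, 0), (0, 1)]
positives = [(2, 2), (3, 1), (1, 3)]
c_star = youden_cutoff(negatives, positives, a=1, b=1)
print(c_star, error_rates(negatives, positives, 1, 1, c_star))
```

Other cutoff criteria can be swapped in by changing the function passed as `key`.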

A Musical Genre Classification Method Based on the Octave-Band Order Statistics

  • 서진수
    • 한국음향학회지 (The Journal of the Acoustical Society of Korea) / Vol. 33, No. 1 / pp. 81-86 / 2014
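  • This paper studies a music classifier based on order statistics taken along the frequency and time directions within the octave bands of a music signal. Octave-band order statistics of the power spectrum are used to represent the harmonic and dynamic (loud/soft) structure of music. In performance experiments on two widely used music datasets, the octave-band order statistics improved genre classification accuracy by 2.61% and 8.9% on the two datasets, respectively, compared with the conventional MFCC and octave-band spectral contrast features. The experimental results show that octave-band order statistics are well suited to music genre classification.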

Comparison Studies of Classification Methods based on L1-Distance and L1-Data Depth

  • 백수진;황진수;김진경
    • 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 19, No. 1 / pp. 183-193 / 2006
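  • We examine the characteristics of a classification method based on $L_1$-data depth (L1DDclass) and a classification method based on the $L_1$-distance between observations (L1DISTclass), and we introduce the usefulness of a new classification method that combines the two (DnDclass: Distance and Data-depth based classification). Through a simulation study, we compare the results of the three classification methods and confirm that the proposed method can be more effective in a variety of cases.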
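The two ingredients named in this abstract can be sketched as follows. This is a hedged illustration, not the paper's L1DDclass, L1DISTclass or DnDclass rules, which the abstract does not spell out. Two assumptions are made here: the $L_1$ depth is taken in its simplified spatial form, one minus the Euclidean norm of the average unit vector pointing from the sample points to the query point, and the distance rule assigns a point to the group with the smallest mean Manhattan distance. All function names and the toy data are hypothetical.

```python
import math


def l1_depth(x, sample):
    """Simplified L1 (spatial) depth of x with respect to a sample.

    Assumption: sample points that coincide with x are skipped, so the
    point-mass correction of the full definition is ignored.
    """
    total = [0.0] * len(x)
    count = 0
    for p in sample:
        diff = [xi - pi for xi, pi in zip(x, p)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            continue
        for j, d in enumerate(diff):
            total[j] += d / norm  # accumulate unit vectors
        count += 1
    if count == 0:
        return 1.0
    return 1.0 - math.sqrt(sum((t / count) ** 2 for t in total))


def mean_l1_distance(x, sample):
    """Mean Manhattan distance from x to the points of a sample."""
    return sum(sum(abs(a - b) for a, b in zip(x, p)) for p in sample) / len(sample)


def classify_by_depth(x, groups):
    """Assign x to the group in which it is deepest."""
    return max(groups, key=lambda g: l1_depth(x, groups[g]))


def classify_by_distance(x, groups):
    """Assign x to the group with the smallest mean L1 distance."""
    return min(groups, key=lambda g: mean_l1_distance(x, groups[g]))


# Hypothetical training groups, for illustration only.
groups = {
    "A": [(0, 0), (1, 0), (0, 1), (1, 1)],
    "B": [(5, 5), (6, 5), (5, 6), (6, 6)],
}
print(classify_by_depth((0.5, 0.5), groups), classify_by_distance((5.5, 5.5), groups))
```

A depth rule looks at how central a point is within each group, while a distance rule looks at how close it is, which is one reason a combination of the two may behave differently from either alone.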

On the Performance of Cuckoo Search and Bat Algorithms Based Instance Selection Techniques for SVM Speed Optimization with Application to e-Fraud Detection

  • AKINYELU, Andronicus Ayobami;ADEWUMI, Aderemi Oluyinka
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 3 / pp. 1348-1375 / 2018
  • Support Vector Machine (SVM) is a well-known machine learning classification algorithm that has been widely applied to many data mining problems with good accuracy. However, SVM classification speed decreases as dataset size increases. Some applications, such as video surveillance and intrusion detection, require a classifier to be trained very quickly and on large datasets. Hence, this paper introduces two filter-based instance selection techniques for optimizing SVM training speed. Fast classification is often achieved at the expense of classification accuracy, and some applications, such as phishing and spam email classifiers, are very sensitive to a slight drop in classification accuracy. Hence, this paper also introduces two wrapper-based instance selection techniques for improving SVM predictive accuracy and training speed. The wrapper-based and filter-based techniques are inspired by the Cuckoo Search Algorithm and the Bat Algorithm. The proposed techniques are validated on three popular e-fraud types: credit card fraud, spam email and phishing email. In addition, the proposed techniques are validated on 20 other datasets provided by the UCI data repository. Moreover, statistical analysis is performed, and the experimental results reveal that the filter-based and wrapper-based techniques significantly improved SVM classification speed. The results also reveal that the wrapper-based techniques improved SVM predictive accuracy in most cases.

Prediction of extreme PM2.5 concentrations via extreme quantile regression

  • Lee, SangHyuk;Park, Seoncheol;Lim, Yaeji
    • Communications for Statistical Applications and Methods / Vol. 29, No. 3 / pp. 319-331 / 2022
  • In this paper, we develop a new statistical model to forecast the PM2.5 level in Seoul, South Korea. The proposed model is based on the extreme quantile regression model with lasso penalty. Various meteorological variables and air pollution variables are considered as predictors in the regression model, and the lasso quantile regression performs variable selection and solves the multicollinearity problem. The final prediction model is obtained by combining various extreme lasso quantile regression estimators and we construct a binary classifier based on the model. Prediction performance is evaluated through the statistical measures of the performance of a binary classification test. We observe that the proposed method works better compared to the other classification methods, and predicts 'very bad' cases of the PM2.5 level well.
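A lasso-penalized quantile regression, as mentioned in this abstract, minimizes the check (pinball) loss of the residuals plus an L1 penalty on the coefficients. The sketch below only evaluates that objective for given coefficients; it is not the authors' estimator, it does not fit anything, and it does not include their step of combining several extreme quantile estimators. The function names, the unpenalized intercept and the toy data are assumptions made for illustration.

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def lasso_quantile_objective(beta0, beta, X, y, tau, lam):
    """Sum of check losses plus lam * sum(|beta_j|).

    Assumption: the intercept beta0 is left unpenalized, as is common.
    """
    loss = 0.0
    for xi, yi in zip(X, y):
        pred = beta0 + sum(b * v for b, v in zip(beta, xi))
        loss += check_loss(yi - pred, tau)
    return loss + lam * sum(abs(b) for b in beta)


# Hypothetical data: one predictor, two observations.
X = [[1.0], [2.0]]
y = [1.0, 3.0]
print(lasso_quantile_objective(0.0, [1.0], X, y, tau=0.9, lam=0.5))
```

With a high quantile level such as `tau=0.9`, under-predictions cost nine times as much as over-predictions of the same size, which pushes the fitted curve toward the upper tail. The L1 penalty shrinks some coefficients to exactly zero, which is how the lasso performs variable selection.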

On EM Algorithm For Discrete Classification With Bahadur Model: Unknown Prior Case

  • Kim, Hea-Jung;Jung, Hun-Jo
    • Journal of the Korean Statistical Society / Vol. 23, No. 1 / pp. 63-78 / 1994
  • For discrimination with binary variables, reformulated full and first-order Bahadur models with incomplete observations are presented. This allows the prior probabilities associated with multiple populations to be estimated for the sample-based classification rule. The EM algorithm is adopted to provide the maximum likelihood estimates of the parameters of interest. Some experiences with the models are evaluated and discussed.

Pre-Adjustment of Incomplete Group Variable via K-Means Clustering

  • Hwang, S.Y.;Hahn, H.E.
    • Journal of the Korean Data and Information Science Society / Vol. 15, No. 3 / pp. 555-563 / 2004
  • In classification and discrimination, we often face an incomplete group variable, typically arising from many missing values and/or unreliable cases. This paper suggests using K-means clustering to pre-adjust for the incompleteness, after which classification based on generalized statistical distance is performed. To illustrate the proposed procedure, a simulation study compares it with CART from data mining and with traditional techniques that ignore the incompleteness of the group variable. The simulation study shows that our methodology outperforms these alternatives.
