• Title/Summary/Keyword: Binary Classification

A Note on Linear SVM in Gaussian Classes

  • Jeon, Yongho
    • Communications for Statistical Applications and Methods / v.20 no.3 / pp.225-233 / 2013
  • The linear support vector machine (SVM) is motivated by the maximal margin separating hyperplane and is a popular tool for binary classification tasks. Many studies exist on the consistency properties of SVM; however, it has been unknown whether the linear SVM is consistent for estimating the optimal classification boundary even in the simple case of two Gaussian classes with a common covariance, where the optimal classification boundary is linear. In this paper we show that the linear SVM can be inconsistent in the univariate Gaussian classification problem with a common variance, even when the best tuning parameter is used.
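A minimal sketch of the setting described above, assuming two univariate Gaussian classes N(mu_pos, sigma^2) and N(mu_neg, sigma^2) with prior pi_pos on the positive class. The Bayes-optimal rule cuts at (mu_pos + mu_neg)/2 + sigma^2 ln(pi_neg/pi_pos)/(mu_pos - mu_neg), and the linear SVM is fit here by plain full-batch subgradient descent on the L2-regularized hinge loss. The solver and the values of `lam`, `lr` and `epochs` are illustrative assumptions, not the paper's setup, and one simulated run says nothing about (in)consistency on its own; it only shows how a fitted SVM cut point can be compared with the Bayes cut point.

```python
import math
import random

def bayes_cut(mu_pos, mu_neg, sigma, pi_pos=0.5):
    """Bayes-optimal cut point for N(mu_pos, sigma^2) vs N(mu_neg, sigma^2)
    with prior pi_pos on the positive class (common variance)."""
    pi_neg = 1.0 - pi_pos
    return (mu_pos + mu_neg) / 2 + sigma ** 2 * math.log(pi_neg / pi_pos) / (mu_pos - mu_neg)

def linear_svm_1d(xs, ys, lam=0.01, epochs=300, lr=0.1):
    """Minimise lam/2 * w^2 + mean(max(0, 1 - y*(w*x + b))) by full-batch
    subgradient descent and return (w, b). The bias b is not penalised.
    Solver and hyperparameters are illustrative assumptions."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw, gb = lam * w, 0.0
        for x, y in zip(xs, ys):
            if y * (w * x + b) < 1:  # point inside the margin: hinge term active
                gw -= y * x / n
                gb -= y / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Assumed simulation: class +1 ~ N(1, 1), class -1 ~ N(-1, 1), equal sizes.
rng = random.Random(0)
xs = [rng.gauss(1.0, 1.0) for _ in range(200)] + [rng.gauss(-1.0, 1.0) for _ in range(200)]
ys = [1] * 200 + [-1] * 200
w, b = linear_svm_1d(xs, ys)
svm_cut = -b / w  # with w > 0 the fitted rule predicts +1 when x > svm_cut
print(f"SVM cut {svm_cut:.3f}, Bayes cut {bayes_cut(1.0, -1.0, 1.0):.3f}")
```

Repeating the comparison over several sample sizes, values of `lam`, and unequal priors (`pi_pos` other than 0.5) is one way to probe the gap between the two cut points empirically.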

OptiNeural System for Optical Pattern Classification

  • Kim, Myung-Soo
    • Journal of Electrical Engineering and information Science / v.3 no.3 / pp.342-347 / 1998
  • An OptiNeural system is developed for optical pattern classification. It is a novel hybrid system consisting of an optical processor and a multilayer neural network, and it takes advantage of the two-dimensional processing capability of the optical processor and the nonlinear mapping capability of the neural network. The optical processor, equipped with a binary phase-only filter, is used as a preprocessor for feature extraction, and the neural network is used as a decision system through mapping. The OptiNeural system is trained for optical pattern classification using a simulated annealing algorithm. Its classification performance on grey-tone texture patterns is excellent, whereas a conventional optical system performs poorly.

A design of binary decision tree using genetic algorithms and its application to the alphabetic character (유전 알고리즘을 이용한 이진 결정 트리의 설계와 영문자 인식에의 응용)

  • 정순원;김경민;박귀태
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1995.10b / pp.218-223 / 1995
  • A new design scheme for binary decision trees is proposed. In this scheme, a binary decision tree is constructed using a genetic algorithm and the FCM (fuzzy c-means) algorithm. At each node, an optimal or near-optimal feature or feature subset is selected from all available features based on a genetic-algorithm fitness function that is inversely proportional to the classification error, the balance between clusters, and the number of features used. The proposed scheme is applied to handwritten alphabetic characters, and experimental results show its usefulness.

A Data Mining Procedure for Unbalanced Binary Classification (불균형 이분 데이터 분류분석을 위한 데이터마이닝 절차)

  • Jung, Han-Na;Lee, Jeong-Hwa;Jun, Chi-Hyuck
    • Journal of Korean Institute of Industrial Engineers / v.36 no.1 / pp.13-21 / 2010
  • Predicting customer contract cancellation is essential for insurance companies, but it is a difficult problem because the customer database is large and the target (cancelled) customers make up only a small proportion of it. This paper proposes a new data mining approach to binary classification that handles large-scale unbalanced data. The approach incorporates over-sampling, clustering, regularized logistic regression, and boosting. It was applied to a real insurance data set, and the results were compared with those of several other classification techniques.
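Over-sampling is one of the components listed above. A minimal sketch of plain random over-sampling, which duplicates randomly drawn minority-class rows until every class is as large as the majority class; the paper may use a different over-sampling variant, and the clustering, regularized logistic regression and boosting steps are not shown. The toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Append randomly chosen copies of rows from each smaller class until
    all classes have as many rows as the largest class. Original rows are
    kept first, in their original order."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out

# Hypothetical toy data: 1 = cancelled (rare), 0 = retained.
X = [[float(i)] for i in range(10)]
y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Over-sampling should be applied only to the training split, so that duplicated minority rows do not leak into the evaluation data.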

Prediction of extreme PM2.5 concentrations via extreme quantile regression

  • Lee, SangHyuk;Park, Seoncheol;Lim, Yaeji
    • Communications for Statistical Applications and Methods / v.29 no.3 / pp.319-331 / 2022
  • In this paper, we develop a new statistical model to forecast the PM2.5 level in Seoul, South Korea. The proposed model is based on the extreme quantile regression model with a lasso penalty. Various meteorological and air pollution variables are considered as predictors, and the lasso quantile regression performs variable selection and handles the multicollinearity problem. The final prediction model is obtained by combining several extreme lasso quantile regression estimators, and we construct a binary classifier based on this model. Prediction performance is evaluated using standard performance measures for binary classification tests. We observe that the proposed method outperforms the other classification methods considered and predicts 'very bad' PM2.5 cases well.
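The abstract turns quantile-regression forecasts into a binary 'very bad' alarm and scores it with binary-classification measures. A minimal sketch of those two ingredients: the check (pinball) loss that quantile regression at level tau minimises, and the sensitivity, specificity and accuracy of an alarm raised when a prediction exceeds a cut-off. The cut-off of 75 and all numbers are hypothetical, not values from the paper, and the lasso fitting and estimator-combination steps are omitted.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean check (pinball) loss; tau-th quantile regression minimises this."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        r = y - q
        total += max(tau * r, (tau - 1.0) * r)
    return total / len(y_true)

def alarm_metrics(y_true, y_pred, cutoff):
    """Score the binary alarm 'prediction > cutoff' against 'observed > cutoff'."""
    tp = fp = tn = fn = 0
    for y, q in zip(y_true, y_pred):
        actual, alarm = y > cutoff, q > cutoff
        if actual and alarm:
            tp += 1
        elif alarm:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(y_true),
    }

# Hypothetical toy numbers; cutoff=75.0 is an assumed 'very bad' threshold.
observed = [20.0, 40.0, 80.0, 90.0, 30.0]
predicted_q90 = [25.0, 35.0, 78.0, 70.0, 60.0]
print(pinball_loss(observed, predicted_q90, tau=0.9))
print(alarm_metrics(observed, predicted_q90, cutoff=75.0))
```

A high quantile level such as tau = 0.9 makes under-prediction costlier than over-prediction, which is why upper quantiles are a natural basis for an extreme-level alarm.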

MARGIN-BASED GENERALIZATION FOR CLASSIFICATIONS WITH INPUT NOISE

  • Choe, Hi Jun;Koh, Hayeong;Lee, Jimin
    • Journal of the Korean Mathematical Society / v.59 no.2 / pp.217-233 / 2022
  • Although machine learning shows state-of-the-art performance in a variety of fields, a theoretical understanding of how it works is still lacking. Theoretical approaches have recently been studied actively, and one line of results concerns the margin and its distribution. In this paper, we focus in particular on the role of the margin under perturbations of inputs and parameters. We show generalization bounds for two cases, a linear model for binary classification and neural networks for multi-class classification, when the inputs contain normally distributed random noise. For binary classification, the additional generalization term caused by the random noise is related to the margin and is exponentially inversely proportional to the noise level. For neural networks, the additional generalization term depends on (input dimension) × (norms of input and weights). These results are obtained within the PAC-Bayesian framework. This paper considers random noise and margin together, which should help toward a better understanding of model sensitivity and the construction of robust generalization.
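An elementary calculation, not the paper's PAC-Bayesian bound, showing why the margin matters under Gaussian input noise. For a fixed linear classifier f(x) = w·x + b and noise eps ~ N(0, sigma^2 I), the noisy score w·(x + eps) + b is Gaussian with mean w·x + b and standard deviation sigma·||w||, so a point with geometric margin m = y(w·x + b)/||w|| is misclassified with probability Phi(-m/sigma), which is at most (1/2)·exp(-m^2/(2 sigma^2)). The Monte Carlo function is only a sanity check, and the example weights and point are arbitrary.

```python
import math
import random

def geometric_margin(w, b, x, y):
    """Signed distance y*(w.x + b)/||w|| of x (label y in {-1, +1}) from the hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return y * score / math.sqrt(sum(wi * wi for wi in w))

def flip_probability(w, b, x, y, sigma):
    """Exact P[y*(w.(x + eps) + b) < 0] for eps ~ N(0, sigma^2 I): Phi(-margin/sigma)."""
    m = geometric_margin(w, b, x, y)
    return 0.5 * math.erfc(m / (sigma * math.sqrt(2.0)))

def flip_probability_mc(w, b, x, y, sigma, trials=20000, seed=0):
    """Monte Carlo estimate of the same probability, as a check."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        if y * (sum(wi * xi for wi, xi in zip(w, noisy)) + b) < 0:
            flips += 1
    return flips / trials

# Arbitrary example: margin 7/5 = 1.4, unit noise level.
w, b, x, y = [3.0, 4.0], 0.0, [1.0, 1.0], 1
print(geometric_margin(w, b, x, y), flip_probability(w, b, x, y, sigma=1.0))
```

Doubling `sigma` for the same point has the same effect as halving its margin, which is the margin/noise trade-off that the abstract's bound makes quantitative.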

Multiclass-based AdaBoost Algorithm (다중 클래스 아다부스트 알고리즘)

  • Kim, Tae-Hyun;Park, Dong-Chul
    • Journal of the Institute of Electronics Engineers of Korea CI / v.48 no.1 / pp.44-50 / 2011
  • In this paper, we propose a multi-class AdaBoost algorithm for efficient classification of multi-class data. The traditional AdaBoost algorithm is basically a binary classifier, and it has limitations when applied to multi-class problems even though multi-class versions are available. To overcome these problems, we devise an AdaBoost architecture with a training algorithm that uses multi-class classifiers as its weak classifiers instead of a series of binary classifiers. Experiments are performed on an image classification problem using images collected from the Caltech Image Database. The results show that the proposed AdaBoost architecture reduces training time while keeping its classification accuracy competitive with AdaBoost.M2.
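The key idea above is to boost multi-class weak classifiers directly instead of a series of binary ones. The authors' exact update rule is not given in the abstract, so this sketch swaps in SAMME, a published multi-class AdaBoost variant: a weak learner with weighted error err receives weight alpha = ln((1 - err)/err) + ln(K - 1), and misclassified samples are reweighted by e^alpha. The weighted multi-class decision stump and the toy data are illustrative assumptions.

```python
import math
from collections import defaultdict

def _stump_predict(stump, row):
    # stump = (score_or_alpha, feature, threshold, left_class, right_class)
    _, j, t, left_cls, right_cls = stump
    return left_cls if row[j] <= t else right_cls

def fit_stump(X, y, w):
    """Weighted multi-class decision stump: threshold one feature and predict the
    class with the largest total weight on each side.
    Returns (weighted_error, feature, threshold, left_class, right_class)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left, right = defaultdict(float), defaultdict(float)
            for row, label, wi in zip(X, y, w):
                (left if row[j] <= t else right)[label] += wi
            left_cls = max(left, key=left.get)  # left is never empty: t is a data value
            right_cls = max(right, key=right.get) if right else left_cls
            err = sum(wi for row, label, wi in zip(X, y, w)
                      if (left_cls if row[j] <= t else right_cls) != label)
            if best is None or err < best[0]:
                best = (err, j, t, left_cls, right_cls)
    return best

def samme(X, y, n_rounds=10):
    """SAMME boosting with multi-class stumps as the weak classifiers."""
    K = len(set(y))
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = stump[0] / sum(w)
        if err >= 1.0 - 1.0 / K:  # no better than random guessing among K classes
            break
        alpha = math.log((1.0 - err) / max(err, 1e-10)) + math.log(K - 1)
        ensemble.append((alpha,) + stump[1:])
        if err == 0.0:  # perfect weak learner: nothing left to reweight
            break
        w = [wi * math.exp(alpha) if _stump_predict(stump, row) != label else wi
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Alpha-weighted vote over the weak classifiers."""
    votes = defaultdict(float)
    for stump in ensemble:
        votes[_stump_predict(stump, row)] += stump[0]
    return max(votes, key=votes.get)

# Hypothetical toy data: three classes on one feature; no single stump separates them.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 1, 1, 2, 2]
model = samme(X, y, n_rounds=3)
print([predict(model, row) for row in X])
```

With K = 2 the extra ln(K - 1) term vanishes and the update reduces to binary AdaBoost; the term keeps alpha positive whenever the weak learner beats chance among K classes.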

CNN-based Android Malware Detection Using Reduced Feature Set

  • Kim, Dong-Min;Lee, Soo-jin
    • Journal of the Korea Society of Computer and Information / v.26 no.10 / pp.19-26 / 2021
  • The performance of deep learning-based malware detection and classification models depends largely on how the feature set used for training is constructed. In this paper, we propose an approach to selecting the optimal feature set that maximizes detection performance for CNN-based Android malware detection. Features were selected with the chi-square test, which is widely used for feature selection in machine learning and deep learning. To validate the approach, a CNN model was trained on the CICANDMAL2017 dataset using the 36 selected features, and its malware detection performance was measured. The model achieved 99.99% accuracy in binary classification and 98.55% in multiclass classification.
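A minimal sketch of chi-square feature ranking, assuming binary (present/absent) features and a binary malware/benign label; the paper's actual feature encoding is not described in the abstract. Each feature is scored with Pearson's chi-square statistic for its 2×2 contingency table against the label, chi2 = N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)), and the `k` highest-scoring features are kept (the paper kept 36). The toy matrix is hypothetical.

```python
def chi2_binary(feature, labels):
    """Pearson chi-square statistic of the 2x2 table (feature present/absent
    vs. label 1/0). Returns 0.0 when a row or column of the table is empty."""
    a = sum(1 for f, y in zip(feature, labels) if f and y)
    b = sum(1 for f, y in zip(feature, labels) if f and not y)
    c = sum(1 for f, y in zip(feature, labels) if not f and y)
    d = sum(1 for f, y in zip(feature, labels) if not f and not y)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_k_best(X, labels, k):
    """Return the indices of the k columns with the largest scores, plus all scores."""
    scores = [chi2_binary([row[j] for row in X], labels) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores

# Hypothetical toy data: column 0 tracks the label, column 1 is weakly related,
# column 2 is present in every sample and therefore uninformative.
X = [[1, 1, 1], [1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]
top, scores = select_k_best(X, y, k=2)
print(top, scores)
```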

Classification of e-mail Using Dynamic Category Hierarchy and Automatic category generation (자동 카테고리 생성과 동적 분류 체계를 사용한 이메일 분류)

  • Ahn Chan Min;Park Sang Ho;Lee Ju-Hong;Choi Bum-Ghi;Park Sun
    • Journal of Intelligence and Information Systems / v.10 no.2 / pp.79-89 / 2004
  • As the volume of e-mail messages has increased, new techniques for efficient e-mail classification are needed. E-mail classification methods fall into two groups: binary classification and multi-classification. Current binary classification methods are mostly spam mail filters based on rules, Bayesian methods, SVMs, and the like. Current multi-classification methods are based on clustering, which groups e-mails by similarity. In this paper, we propose a novel e-mail classification method that combines automatic category generation based on the vector model with dynamic category hierarchy construction. The method can multi-classify e-mail automatically and manage a large amount of e-mail efficiently. In addition, it increases search accuracy by dynamically reclassifying e-mails.
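A hypothetical sketch of vector-model category generation, not the authors' method: each e-mail becomes a term-frequency vector, it joins the category whose accumulated term vector is most cosine-similar, and a new category is created when no existing category reaches an assumed `threshold`. The dynamic hierarchy construction and reclassification steps are not modelled, and the example messages are made up.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term frequencies (simple whitespace tokenisation)."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_categories(emails, threshold=0.3):
    """Assign each e-mail to the most similar existing category, or open a new
    category when the best similarity is below threshold (an assumed value)."""
    categories = []  # list of (accumulated term vector, member indices)
    labels = []
    for i, text in enumerate(emails):
        vec = tf_vector(text)
        scores = [cosine(vec, centroid) for centroid, _ in categories]
        if scores and max(scores) >= threshold:
            k = scores.index(max(scores))
            categories[k][0].update(vec)  # add this e-mail's term counts
            categories[k][1].append(i)
        else:
            k = len(categories)
            categories.append((Counter(vec), [i]))
        labels.append(k)
    return labels

emails = [
    "meeting agenda for project review",
    "project review meeting moved",
    "cheap watches discount offer",
    "discount offer on cheap flights",
]
print(assign_categories(emails))
```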

Emotion Recognition Method Using FLD and Staged Classification Based on Profile Data (프로파일기반의 FLD와 단계적 분류를 이용한 감성 인식 기법)

  • Kim, Jae-Hyup;Oh, Na-Rae;Jun, Gab-Song;Moon, Young-Shik
    • Journal of the Institute of Electronics Engineers of Korea CI / v.48 no.6 / pp.35-46 / 2011
  • In this paper, we propose an emotion recognition method that uses a staged classification model and Fisher's linear discriminant (FLD). By organizing a staged classification model, the proposed method improves the classification rate in the Fisher feature space, which has high complexity. The staged classification model is built by successively combining binary classification models that have a simple structure and high performance. At each stage, a Fisher linear discriminant is formed for two groups that each contain emotion classes, and a binary classification model is generated in the Fisher space using AdaBoost. The whole learning process is repeated until all emotion classes have been separated. In experiments, the proposed method achieves a classification rate of about 72% on 8 emotion classes and about 93% on a specific set of 3 emotion classes.
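Each stage above starts from a two-group Fisher linear discriminant. A minimal sketch of that step for one binary stage, assuming 2-D features for brevity: the direction is w = S_W^{-1}(m_a - m_b), where S_W is the pooled within-class scatter. The paper then trains AdaBoost in the Fisher space; this sketch substitutes a simple midpoint threshold between the projected class means, and the points are hypothetical.

```python
def mean2(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter2(points, m):
    """Within-class scatter matrix sum((x - m)(x - m)^T) for 2-D points."""
    sxx = sxy = syy = 0.0
    for x, y in points:
        dx, dy = x - m[0], y - m[1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    return ((sxx, sxy), (sxy, syy))

def fisher_direction(class_a, class_b):
    """w = S_W^{-1} (m_a - m_b), the two-class Fisher discriminant direction."""
    ma, mb = mean2(class_a), mean2(class_b)
    sa, sb = scatter2(class_a, ma), scatter2(class_b, mb)
    s11, s12, s22 = sa[0][0] + sb[0][0], sa[0][1] + sb[0][1], sa[1][1] + sb[1][1]
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    d1, d2 = ma[0] - mb[0], ma[1] - mb[1]
    # Inverse of [[s11, s12], [s12, s22]] is [[s22, -s12], [-s12, s11]] / det.
    return ((s22 * d1 - s12 * d2) / det, (-s12 * d1 + s11 * d2) / det)

def project(p, w):
    return p[0] * w[0] + p[1] * w[1]

def classify(p, w, cut):
    """w points from class b toward class a, so larger projections mean 'a'."""
    return "a" if project(p, w) > cut else "b"

# Hypothetical two-group data for a single stage.
class_a = [(2.0, 2.0), (3.0, 3.0), (3.0, 2.0), (2.0, 3.0)]
class_b = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
w = fisher_direction(class_a, class_b)
cut = (project(mean2(class_a), w) + project(mean2(class_b), w)) / 2  # midpoint rule (assumed)
print(w, cut, [classify(p, w, cut) for p in class_a + class_b])
```

In a staged model, the same construction would be repeated on each further split of the remaining emotion classes into two groups.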