• Title/Summary/Keyword: one class classification

Search Result 346, Processing Time 0.028 seconds

An Efficient One Class Classifier Using Gaussian-based Hyper-Rectangle Generation (가우시안 기반 Hyper-Rectangle 생성을 이용한 효율적 단일 분류기)

  • Kim, Do Gyun;Choi, Jin Young;Ko, Jeonghan
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.41 no.2
    • /
    • pp.56-64
    • /
    • 2018
  • In recent years, imbalanced data is one of the most important and frequent issue for quality control in industrial field. As an example, defect rate has been drastically reduced thanks to highly developed technology and quality management, so that only few defective data can be obtained from production process. Therefore, quality classification should be performed under the condition that one class (defective dataset) is even smaller than the other class (good dataset). However, traditional multi-class classification methods are not appropriate to deal with such an imbalanced dataset, since they classify data from the difference between one class and the others that can hardly be found in imbalanced datasets. Thus, one-class classification that thoroughly learns patterns of target class is more suitable for imbalanced dataset since it only focuses on data in a target class. So far, several one-class classification methods such as one-class support vector machine, neural network and decision tree there have been suggested. One-class support vector machine and neural network can guarantee good classification rate, and decision tree can provide a set of rules that can be clearly interpreted. However, the classifiers obtained from the former two methods consist of complex mathematical functions and cannot be easily understood by users. In case of decision tree, the criterion for rule generation is ambiguous. Therefore, as an alternative, a new one-class classifier using hyper-rectangles was proposed, which performs precise classification compared to other methods and generates rules clearly understood by users as well. In this paper, we suggest an approach for improving the limitations of those previous one-class classification algorithms. Specifically, the suggested approach produces more improved one-class classifier using hyper-rectangles generated by using Gaussian function. The performance of the suggested algorithm is verified by a numerical experiment, which uses several datasets in UCI machine learning repository.

Design of One-Class Classifier Using Hyper-Rectangles (Hyper-Rectangles를 이용한 단일 분류기 설계)

  • Jeong, In Kyo;Choi, Jin Young
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.41 no.5
    • /
    • pp.439-446
    • /
    • 2015
  • Recently, the importance of one-class classification problem is more increasing. However, most of existing algorithms have the limitation on providing the information that effects on the prediction of the target value. Motivated by this remark, in this paper, we suggest an efficient one-class classifier using hyper-rectangles (H-RTGLs) that can be produced from intervals including observations. Specifically, we generate intervals for each feature and integrate them. For generating intervals, we consider two approaches : (i) interval merging and (ii) clustering. We evaluate the performance of the suggested methods by computing classification accuracy using area under the roc curve and compare them with other one-class classification algorithms using four datasets from UCI repository. Since H-RTGLs constructed for a given data set enable classification factors to be visible, we can discern which features effect on the classification result and extract patterns that a data set originally has.

One-Class Document Classification using Pseudo Negative Examples (One-class 문서 분류를 위한 가상 부정 예제의 사용)

  • Song Ho-Jin;Kang In-Su;Na Seung-Hoon;Lee Jong-Hyeok
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2005.07b
    • /
    • pp.469-471
    • /
    • 2005
  • 문서 분류에서의 one class classification 문제는 오직 하나의 범주를 생성하고 새로운 문서가 주어졌을 때 미리 만들어진 하나의 범주에 속하는가를 판별하는 문제이다. 기존의 여러 범주로 이루어진 분류 문제를 해결할 때와는 달리 one class classification에서는 학습 시에 이미 정해진 하나의 범주와 관련이 있는 문서들만을 사용하여 학습을 수행하기 때문에 범주의 경계를 정하는 것이 매우 어려운 작업이며 또한 분류기의 성능에 있어서도 매우 중요한 요소로 작용하게 된다. 본 논문에서는 기존의 연구에서 one class classification 문제를 해결할 때 관심의 대상이 되는 예제의 일부를 부정 예제로 간주하여 one class문제를 two class문제로 변경시켜 학습을 수행했던 것에서 더 나아가 추가적으로 새로운 가상 부정 예제를 설정하여 학습을 수행하고, SVM을 통하여 범주화 성능을 확인해 보기로 한다.

  • PDF

One-class Classification based Fault Classification for Semiconductor Process Cyclic Signal (단일 클래스 분류기법을 이용한 반도체 공정 주기 신호의 이상분류)

  • Cho, Min-Young;Baek, Jun-Geol
    • IE interfaces
    • /
    • v.25 no.2
    • /
    • pp.170-177
    • /
    • 2012
  • Process control is essential to operate the semiconductor process efficiently. This paper consider fault classification of semiconductor based cyclic signal for process control. In general, process signal usually take the different pattern depending on some different cause of fault. If faults can be classified by cause of faults, it could improve the process control through a definite and rapid diagnosis. One of the most important thing is a finding definite diagnosis in fault classification, even-though it is classified several times. This paper proposes the method that one-class classifier classify fault causes as each classes. Hotelling T2 chart, kNNDD(k-Nearest Neighbor Data Description), Distance based Novelty Detection are used to perform the one-class classifier. PCA(Principal Component Analysis) is also used to reduce the data dimension because the length of process signal is too long generally. In experiment, it generates the data based real signal patterns from semiconductor process. The objective of this experiment is to compare between the proposed method and SVM(Support Vector Machine). Most of the experiments' results show that proposed method using Distance based Novelty Detection has a good performance in classification and diagnosis problems.

Fuzzy SVM for Multi-Class Classification

  • Na, Eun-Young;Hong, Dug-Hun;Hwang, Chang-Ha
    • 한국데이터정보과학회:학술대회논문집
    • /
    • 2003.10a
    • /
    • pp.123-123
    • /
    • 2003
  • More elaborated methods allowing the usage of binary classifiers for the resolution of multi-class classification problems are briefly presented. This way of using FSVC to learn a K-class classification problem consists in choosing the maximum applied to the outputs of K FSVC solving a one-per-class decomposition of the general problem.

  • PDF

Design of Distributed Processing Framework Based on H-RTGL One-class Classifier for Big Data (빅데이터를 위한 H-RTGL 기반 단일 분류기 분산 처리 프레임워크 설계)

  • Kim, Do Gyun;Choi, Jin Young
    • Journal of Korean Society for Quality Management
    • /
    • v.48 no.4
    • /
    • pp.553-566
    • /
    • 2020
  • Purpose: The purpose of this study was to design a framework for generating one-class classification algorithm based on Hyper-Rectangle(H-RTGL) in a distributed environment connected by network. Methods: At first, we devised one-class classifier based on H-RTGL which can be performed by distributed computing nodes considering model and data parallelism. Then, we also designed facilitating components for execution of distributed processing. In the end, we validate both effectiveness and efficiency of the classifier obtained from the proposed framework by a numerical experiment using data set obtained from UCI machine learning repository. Results: We designed distributed processing framework capable of one-class classification based on H-RTGL in distributed environment consisting of physically separated computing nodes. It includes components for implementation of model and data parallelism, which enables distributed generation of classifier. From a numerical experiment, we could observe that there was no significant change of classification performance assessed by statistical test and elapsed time was reduced due to application of distributed processing in dataset with considerable size. Conclusion: Based on such result, we can conclude that application of distributed processing for generating classifier can preserve classification performance and it can improve the efficiency of classification algorithms. In addition, we suggested an idea for future research directions of this paper as well as limitation of our work.

Comparison Study of Multi-class Classification Methods

  • Bae, Wha-Soo;Jeon, Gab-Dong;Seok, Kyung-Ha
    • Communications for Statistical Applications and Methods
    • /
    • v.14 no.2
    • /
    • pp.377-388
    • /
    • 2007
  • As one of multi-class classification methods, ECOC (Error Correcting Output Coding) method is known to have low classification error rate. This paper aims at suggesting effective multi-class classification method (1) by comparing various encoding methods and decoding methods in ECOC method and (2) by comparing ECOC method and direct classification method. Both SVM (Support Vector Machine) and logistic regression model were used as binary classifiers in comparison.

Performance Evaluation of One Class Classification to detect anomalies of NIDS (NIDS의 비정상 행위 탐지를 위한 단일 클래스 분류성능 평가)

  • Seo, Jae-Hyun
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.11
    • /
    • pp.15-21
    • /
    • 2018
  • In this study, we try to detect anomalies on the network intrusion detection system by learning only one class. We use KDD CUP 1999 dataset, an intrusion detection dataset, which is used to evaluate classification performance. One class classification is one of unsupervised learning methods that classifies attack class by learning only normal class. When using unsupervised learning, it difficult to achieve relatively high classification efficiency because it does not use negative instances for learning. However, unsupervised learning has the advantage for classifying unlabeled data. In this study, we use one class classifiers based on support vector machines and density estimation to detect new unknown attacks. The test using the classifier based on density estimation has shown relatively better performance and has a detection rate of about 96% while maintaining a low FPR for the new attacks.

Multi-target Classification Method Based on Adaboost and Radial Basis Function (아이다부스트(Adaboost)와 원형기반함수를 이용한 다중표적 분류 기법)

  • Kim, Jae-Hyup;Jang, Kyung-Hyun;Lee, Jun-Haeng;Moon, Young-Shik
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.47 no.3
    • /
    • pp.22-28
    • /
    • 2010
  • Adaboost is well known for a representative learner as one of the kernel methods. Adaboost which is based on the statistical learning theory shows good generalization performance and has been applied to various pattern recognition problems. However, Adaboost is basically to deal with a two-class classification problem, so we cannot solve directly a multi-class problem with Adaboost. One-Vs-All and Pair-Wise have been applied to solve the multi-class classification problem, which is one of the multi-class problems. The two methods above are ones of the output coding methods, a general approach for solving multi-class problem with multiple binary classifiers, which decomposes a complex multi-class problem into a set of binary problems and then reconstructs the outputs of binary classifiers for each binary problem. However, two methods cannot show good performance. In this paper, we propose the method to solve a multi-target classification problem by using radial basis function of Adaboost weak classifier.

One-Class Classification Model Based on Lexical Information and Syntactic Patterns (어휘 정보와 구문 패턴에 기반한 단일 클래스 분류 모델)

  • Lee, Hyeon-gu;Choi, Maengsik;Kim, Harksoo
    • Journal of KIISE
    • /
    • v.42 no.6
    • /
    • pp.817-822
    • /
    • 2015
  • Relation extraction is an important information extraction technique that can be widely used in areas such as question-answering and knowledge population. Previous studies on relation extraction have been based on supervised machine learning models that need a large amount of training data manually annotated with relation categories. Recently, to reduce the manual annotation efforts for constructing training data, distant supervision methods have been proposed. However, these methods suffer from a drawback: it is difficult to use these methods for collecting negative training data that are necessary for resolving classification problems. To overcome this drawback, we propose a one-class classification model that can be trained without using negative data. The proposed model determines whether an input data item is included in an inner category by using a similarity measure based on lexical information and syntactic patterns in a vector space. In the experiments conducted in this study, the proposed model showed higher performance (an F1-score of 0.6509 and an accuracy of 0.6833) than a representative one-class classification model, one-class SVM(Support Vector Machine).