• Title/Summary/Keyword: one class classifiers

A Genetic Algorithm-based Classifier Ensemble Optimization for Activity Recognition in Smart Homes

  • Fatima, Iram;Fahim, Muhammad;Lee, Young-Koo;Lee, Sungyoung
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.11 / pp.2853-2873 / 2013
  • In recent years, one of the most common purposes of smart homes has been to provide human-centric services in the domain of u-healthcare by analyzing inhabitants' daily living. Currently, a major challenge in activity recognition is the reliability of each classifier's predictions, which differs according to the characteristics of the smart home. Smart homes vary in terms of performed activities, deployed sensors, environment settings, and inhabitants' characteristics. No single classifier can always perform better than all the others in every possible situation. This observation has motivated combining multiple classifiers to take advantage of their complementary performance for high accuracy. Therefore, in this paper, a method for activity recognition is proposed that optimizes the output of multiple classifiers with a Genetic Algorithm (GA). Our proposed method combines the measurement-level output of different classifiers for each activity class to make up the ensemble. To evaluate the proposed method, experiments are performed on three real datasets from the CASAS smart home. The results show that our method systematically outperforms single classifiers and traditional multiclass models. The F-measure of recognized activities improves significantly, from 0.82 to 0.90, compared to existing methods.

Prediction Model for Gastric Cancer via Class Balancing Techniques

  • Danish, Jamil;Sellappan, Palaniappan;Sanjoy Kumar, Debnath;Muhammad, Naseem;Susama, Bagchi;Asiah, Lokman
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.53-63 / 2023
  • Many researchers are trying hard to minimize the incidence of cancers, mainly Gastric Cancer (GC). For GC, the five-year survival rate is generally 5-25%, but for Early Gastric Cancer (EGC), it is almost 90%. Predicting the onset of stomach cancer based on risk factors will allow for an early diagnosis and more effective treatment. Although there are several models for predicting stomach cancer, most of these models are based on unbalanced datasets, which favours the majority class. However, it is imperative to correctly identify cancer patients who are in the minority class. This research aims to apply three class-balancing approaches to the NHS dataset before developing supervised learning strategies: Oversampling (Synthetic Minority Oversampling Technique or SMOTE), Undersampling (SpreadSubsample), and Hybrid System (SMOTE + SpreadSubsample). This study uses Naive Bayes, Bayesian Network, Random Forest, and Decision Tree (C4.5) methods. We measured these classifiers' efficacy using their Receiver Operating Characteristics (ROC) curves, sensitivity, and specificity. The validation data was used to test several ways of balancing the classifiers. The final prediction model was built on the one that did the best overall.
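
The oversampling idea named in this abstract can be illustrated with a minimal SMOTE-style sketch. It assumes numeric feature tuples and Euclidean distance; the function name `smote`, the toy data, and the parameter defaults are illustrative and are not taken from the paper or from any specific toolkit.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating minority samples with neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (the base itself is excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical minority-class samples (two features each)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=4)
```

Each synthetic point lies on a line segment between two real minority samples, so the minority class grows without exact duplication of records.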

A Comparative Study of Word Embedding Models for Arabic Text Processing

  • Assiri, Fatmah;Alghamdi, Nuha
    • International Journal of Computer Science & Network Security / v.22 no.8 / pp.399-403 / 2022
  • Natural texts are analyzed to obtain their intended meaning so that they can be classified according to the problem under study. One way to represent words is to generate vectors of real values that encode their meaning; this is called word embedding. Similarities between word representations are measured to identify the text class. Word embeddings can be created using the word2vec technique. More recently, however, fastText was introduced to provide better results when used with classifiers. In this paper, we study the performance of well-known classifiers when using both word embedding techniques with an Arabic dataset. We applied them to real data collected from Wikipedia, and we found that word2vec and fastText had similar accuracy with all the classifiers used.

An Efficient One Class Classifier Using Gaussian-based Hyper-Rectangle Generation (가우시안 기반 Hyper-Rectangle 생성을 이용한 효율적 단일 분류기)

  • Kim, Do Gyun;Choi, Jin Young;Ko, Jeonghan
    • Journal of Korean Society of Industrial and Systems Engineering / v.41 no.2 / pp.56-64 / 2018
  • In recent years, imbalanced data has become one of the most important and frequent issues for quality control in the industrial field. For example, defect rates have been drastically reduced thanks to highly developed technology and quality management, so that only a few defective samples can be obtained from the production process. Therefore, quality classification must be performed under the condition that one class (the defective dataset) is much smaller than the other class (the good dataset). However, traditional multi-class classification methods are not appropriate for such an imbalanced dataset, since they classify data based on the difference between one class and the others, which can hardly be found in imbalanced datasets. Thus, one-class classification, which thoroughly learns the patterns of the target class, is more suitable for imbalanced datasets since it focuses only on data in the target class. So far, several one-class classification methods, such as the one-class support vector machine, neural network, and decision tree, have been suggested. The one-class support vector machine and neural network can guarantee a good classification rate, and the decision tree can provide a set of rules that can be clearly interpreted. However, the classifiers obtained from the former two methods consist of complex mathematical functions and cannot be easily understood by users. In the case of the decision tree, the criterion for rule generation is ambiguous. Therefore, as an alternative, a new one-class classifier using hyper-rectangles was proposed, which classifies precisely compared to other methods and also generates rules that users can clearly understand. In this paper, we suggest an approach for overcoming the limitations of those previous one-class classification algorithms. Specifically, the suggested approach produces an improved one-class classifier using hyper-rectangles generated with a Gaussian function. The performance of the suggested algorithm is verified by a numerical experiment that uses several datasets from the UCI machine learning repository.
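
As a rough illustration of a hyper-rectangle rule learned from the target class only, the sketch below bounds each feature at mean ± z standard deviations. This is an assumed simplification for illustration, not the algorithm from the paper; the names `fit_hyper_rectangle` and `is_target`, the value `z=2.0`, and the sample data are hypothetical.

```python
from statistics import mean, stdev

def fit_hyper_rectangle(target_samples, z=2.0):
    """Return per-feature (low, high) bounds at mean +/- z * std of the target class."""
    bounds = []
    for feature in zip(*target_samples):
        mu, sigma = mean(feature), stdev(feature)
        bounds.append((mu - z * sigma, mu + z * sigma))
    return bounds

def is_target(x, bounds):
    """A sample is accepted only if every feature lies inside its interval."""
    return all(low <= value <= high for value, (low, high) in zip(x, bounds))

# Hypothetical measurements of good (target-class) parts
good_parts = [(10.0, 5.0), (10.2, 5.1), (9.8, 4.9), (10.1, 5.0), (9.9, 5.0)]
bounds = fit_hyper_rectangle(good_parts)
```

The resulting bounds read directly as an interpretable rule: a part is accepted when every feature falls inside its interval.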

User Authentication Based on Keystroke Dynamics of Free Text and One-Class Classifiers (자유로운 문자열의 키스트로크 다이나믹스와 일범주 분류기를 활용한 사용자 인증)

  • Seo, Dongmin;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers / v.42 no.4 / pp.280-289 / 2016
  • User authentication is an important issue in computer network systems. Most current computer network systems use ID-password string matching as the primary user authentication method. However, in password-based authentication, whoever acquires the password of a valid user can access the system without any restrictions. In this paper, we present keystroke dynamics-based user authentication to resolve the limitations of password-based authentication. Since most previous studies employed fixed-length text as input data, we aim to enhance authentication performance by combining four different variable creation methods applied to variable-length free text as input data. As authentication algorithms, four one-class classifiers are employed. We verify the proposed approach through an experiment based on actual keystroke data collected from 100 participants, who provided more than 17,000 keystrokes in both Korean and English. The experimental results show that our proposed method significantly improves authentication performance compared to existing approaches.

Multiple Classifier Fusion Method based on k-Nearest Templates (k-최근접 템플릿기반 다중 분류기 결합방법)

  • Min, Jun-Ki;Cho, Sung-Bae
    • Journal of KIISE:Computing Practices and Letters / v.14 no.4 / pp.451-455 / 2008
  • In this paper, the k-nearest templates method is proposed to combine multiple classifiers effectively. First, the method decomposes the training samples of each class into several subclasses based on the outputs of the classifiers, so that a class is represented by multiple models, and estimates a localized template by averaging the outputs for each subclass. The distances between a test sample and the templates are then calculated. Lastly, the test sample is assigned to the class that is most frequently represented among the k most similar templates. The C-means clustering algorithm is used as the decomposition method, and k is chosen automatically according to the intra-class compactness and inter-class separation of a given data set. Since the proposed method uses multiple models per class and refers to k models rather than matching only the most similar one, it can obtain stable and high accuracy. Experiments on the UCI and ELENA databases showed that the proposed method performed better than conventional fusion methods.
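
The template-averaging and voting steps described in this abstract can be sketched as follows. The sketch assumes the subclass decomposition has already been done (the clustering step and the automatic choice of k are omitted) and uses Euclidean distance; the function names and the toy classifier outputs are hypothetical.

```python
import math
from collections import Counter

def build_templates(outputs_by_subclass):
    """Average the classifier-output vectors of each (class, subclass) group."""
    templates = []
    for (label, _subclass), vectors in outputs_by_subclass.items():
        centroid = tuple(sum(col) / len(col) for col in zip(*vectors))
        templates.append((label, centroid))
    return templates

def classify(sample_output, templates, k=3):
    """Vote among the k templates closest to the sample's classifier outputs."""
    nearest = sorted(templates, key=lambda t: math.dist(sample_output, t[1]))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical concatenated outputs of two classifiers, already split into subclasses
outputs_by_subclass = {
    ("A", 0): [(0.9, 0.1, 0.8, 0.2), (0.8, 0.2, 0.9, 0.1)],
    ("A", 1): [(0.6, 0.4, 0.7, 0.3), (0.7, 0.3, 0.6, 0.4)],
    ("B", 0): [(0.1, 0.9, 0.2, 0.8), (0.2, 0.8, 0.1, 0.9)],
    ("B", 1): [(0.4, 0.6, 0.3, 0.7), (0.3, 0.7, 0.4, 0.6)],
}
templates = build_templates(outputs_by_subclass)
predicted = classify((0.85, 0.15, 0.75, 0.25), templates, k=3)
```

With these toy values, two of the three nearest templates belong to class A, so the sample is assigned to A.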

Night-time Vehicle Detection Based On Multi-class SVM (다중-클래스 SVM 기반 야간 차량 검출)

  • Lim, Hyojin;Lee, Heeyong;Park, Ju H.;Jung, Ho-Youl
    • IEMEK Journal of Embedded Systems and Applications / v.10 no.5 / pp.325-333 / 2015
  • Vision-based night-time vehicle detection is an emerging research field for various advanced driver assistance systems (ADAS) and automotive vehicles, as well as for automatic head-lamp control. In this paper, we propose a night-time vehicle detection method based on a multi-class support vector machine (SVM) that consists of thresholding, labeling, feature extraction, and multi-class SVM classification. Vehicle light candidate blobs are extracted by local mean-based thresholding followed by a labeling process. Seven geometric and stochastic features are extracted from each candidate in the feature extraction step. Each candidate blob is then classified as a vehicle light or not by the multi-class SVM. Four different multi-class SVMs, namely one-against-all (OAA), one-against-one (OAO), top-down tree-structured, and bottom-up tree-structured SVM classifiers, are implemented and evaluated in terms of vehicle detection performance. Through simulations on road video sequences, we show that the top-down and bottom-up tree-structured SVMs perform relatively better than the others.

Automatic Categorization of Real World FAQs Using Hierarchical Document Clustering (계층적 문서 클러스터링을 이용한 실세계 질의 메일의 자동 분류)

  • 류중원;조성배
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.05a / pp.187-190 / 2001
  • Due to the recent proliferation of the internet, it is widely accepted that the need for automatic document categorization has been rising. Since processing and classifying documents manually is time-consuming and takes too much manpower, we need a system that categorizes them automatically according to their contents. In this paper, we propose an automatic e-mail response system based on two hierarchical document clustering methods. The first clusters all documents into 3 groups, so that a first classifier assigns each input document to the corresponding group, and then obtains the final result from a classifier trained separately within each group. The second classifies the most distinct classes first, according to their similarity, and then proceeds successively. Neural networks have been adopted as classifiers, and we have used dendrograms to show the hierarchical structure of similarities between classes. The comparison between the performance of hierarchical and non-hierarchical classifiers indicates that the clustering methods have provided classification efficiency.

Performance Evaluation of One Class Classification to detect anomalies of NIDS (NIDS의 비정상 행위 탐지를 위한 단일 클래스 분류성능 평가)

  • Seo, Jae-Hyun
    • Journal of the Korea Convergence Society / v.9 no.11 / pp.15-21 / 2018
  • In this study, we try to detect anomalies in a network intrusion detection system by learning only one class. We use the KDD CUP 1999 dataset, an intrusion detection dataset, to evaluate classification performance. One-class classification is an unsupervised learning method that identifies the attack class by learning only the normal class. With unsupervised learning, it is difficult to achieve relatively high classification efficiency because negative instances are not used for learning. However, unsupervised learning has the advantage of being able to classify unlabeled data. In this study, we use one-class classifiers based on support vector machines and density estimation to detect new, unknown attacks. In the test, the classifier based on density estimation showed relatively better performance, with a detection rate of about 96% while maintaining a low FPR for the new attacks.
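
A density-estimation one-class detector of the kind mentioned in this abstract can be sketched with a Gaussian kernel density estimate fitted on normal samples only. The kernel, the bandwidth, the quantile-based threshold, and the toy feature vectors are assumptions made for illustration; the abstract does not specify the estimator used.

```python
import math

def kde_score(x, normal_data, bandwidth=0.5):
    """Unnormalised Gaussian-kernel density of x under the normal-class samples."""
    return sum(
        math.exp(-math.dist(x, p) ** 2 / (2 * bandwidth ** 2)) for p in normal_data
    ) / len(normal_data)

def fit_threshold(normal_data, bandwidth=0.5, quantile=0.05):
    """Use the training score at the given low quantile as the decision threshold."""
    scores = sorted(kde_score(p, normal_data, bandwidth) for p in normal_data)
    return scores[int(quantile * len(scores))]

def is_anomaly(x, normal_data, threshold, bandwidth=0.5):
    """Flag x when its density under the normal class falls below the threshold."""
    return kde_score(x, normal_data, bandwidth) < threshold

# Hypothetical two-feature records of normal traffic (no attack samples are used)
normal_traffic = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (-0.1, 0.0), (0.0, -0.2), (0.3, 0.1)]
threshold = fit_threshold(normal_traffic)
```

Raising `quantile` trades a higher false positive rate on normal traffic for a higher chance of flagging unseen attacks.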

A Novel Feature Selection Method in the Categorization of Imbalanced Textual Data

  • Pouramini, Jafar;Minaei-Bidgoli, Behrouze;Esmaeili, Mahdi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.8 / pp.3725-3748 / 2018
  • Text data distribution is often imbalanced. Imbalanced data is one of the challenges in text classification, as it degrades the performance of classifiers. Many studies have been conducted on this problem. The proposed solutions fall into several general categories, including sampling-based and algorithm-based methods. In recent studies, feature selection has also been considered as a solution to the imbalance problem. In this paper, a novel one-sided feature selection method known as probabilistic feature selection (PFS) is presented for imbalanced text classification. The PFS is a probabilistic method that is calculated using feature distribution. Compared to similar methods, the PFS has more parameters. To evaluate the performance of the proposed method, the feature selection methods Gini, MI, FAST, and DFS were implemented for comparison. To assess the proposed method, classifiers such as the C4.5 decision tree and Naive Bayes were used. The F-measure results of tests on Reuters-21578 and WebKB suggested that the proposed feature selection significantly improved the performance of the classifiers.