• Title/Summary/Keyword: classifier ensemble


Relevancy contemplation in medical data analytics and ranking of feature selection algorithms

  • P. Antony Seba;J. V. Bibal Benifa
    • ETRI Journal
    • /
    • v.45 no.3
    • /
    • pp.448-461
    • /
    • 2023
  • This article performs detailed data scrutiny of a chronic kidney disease (CKD) dataset to select efficient instances and relevant features. Data relevancy is investigated using feature extraction, hybrid outlier detection, and handling of missing values. Data instances that do not influence the target are removed using data envelopment analysis to enable reduction of rows. Column reduction is achieved by ranking the attributes through feature selection methodologies, namely, the extra-trees classifier, recursive feature elimination, the chi-squared test, analysis of variance, and mutual information. These methodologies are ranked via the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) using weight optimization to identify the optimal features for model building from the CKD dataset and facilitate better prediction while diagnosing the severity of the disease. An efficient hybrid ensemble and novel similarity-based classifiers are built using the pruned dataset, and the results are thereafter compared with random forest, AdaBoost, naive Bayes, k-nearest neighbors, and support vector machines. The hybrid ensemble classifier yields the best prediction accuracy of 98.31% for the features selected by the extra-trees classifier (ETC), which is ranked best by TOPSIS.
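
As a rough illustration of the TOPSIS step described above, the sketch below ranks feature-selection methods from a small decision matrix. The scores, criteria, and weights are invented for illustration only and are not the values reported in the paper.

```python
# Minimal TOPSIS sketch with an illustrative decision matrix (rows = feature-selection
# methods, columns = evaluation criteria); scores and weights are hypothetical.
import numpy as np

def topsis(matrix, weights, benefit):
    """Return the closeness coefficient of each alternative (higher = better)."""
    norm = matrix / np.linalg.norm(matrix, axis=0)            # vector-normalise each criterion
    v = norm * weights                                         # weighted normalised matrix
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))    # positive-ideal solution
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))     # negative-ideal solution
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti, axis=1)
    return d_neg / (d_pos + d_neg)

# rows: ETC, RFE, chi-squared, ANOVA, mutual information
# columns (hypothetical): accuracy, F1-score, runtime in seconds (lower is better)
scores = np.array([[0.98, 0.97, 12.0],
                   [0.96, 0.95, 30.0],
                   [0.94, 0.93,  2.0],
                   [0.95, 0.94,  2.5],
                   [0.95, 0.95,  8.0]])
closeness = topsis(scores, np.array([0.5, 0.3, 0.2]), np.array([True, True, False]))
print(np.argsort(closeness)[::-1])   # method indices, ranked best first
```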

Study of Personal Credit Risk Assessment Based on SVM

  • LI, Xin;XIA, Han
    • The Journal of Industrial Distribution & Business
    • /
    • v.13 no.10
    • /
    • pp.1-8
    • /
    • 2022
  • Purpose: Support vector machine (SVM) ensembles have recently been proposed to improve the classification performance of credit risk models. However, currently used fusion strategies do not evaluate the importance of each component SVM's output when combining the component predictions into the final decision. To deal with this problem, this paper designs an SVM ensemble method based on the fuzzy integral, which aggregates the outputs of the component SVMs weighted by the importance of each component. Research design, data, and methodology: This paper designs a personal credit risk evaluation index system including 16 indicators and discusses an SVM ensemble method based on the fuzzy integral for designing a credit risk assessment system that discriminates good creditors from bad ones. The paper randomly selects 1,500 sample records of personal loan customers of a commercial bank in China from 2015 to 2020 for simulation experiments. Results: Comparing the SVM ensemble with a single SVM and a neural network ensemble shows that the proposed method outperforms both in terms of classification accuracy. Conclusions: The results show that the method proposed in this paper has higher classification accuracy than the other classification methods, which confirms its feasibility and effectiveness.
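
To make the fuzzy-integral fusion idea concrete, here is a minimal sketch that aggregates per-class scores from component SVMs with a Sugeno integral over a lambda-fuzzy measure. The fuzzy densities (importance of each component SVM) and the scores are illustrative assumptions; the paper's actual system may use a different integral or measure.

```python
# Sugeno fuzzy integral over a lambda-fuzzy measure (sketch; densities are assumed
# importances in (0, 1), not values from the paper).
import numpy as np
from scipy.optimize import brentq

def solve_lambda(densities):
    """Solve prod(1 + lam * g_i) = 1 + lam for the lambda-fuzzy measure parameter."""
    f = lambda lam: np.prod(1.0 + lam * densities) - (1.0 + lam)
    s = densities.sum()
    if np.isclose(s, 1.0):
        return 0.0
    # lambda lies in (-1, 0) when sum(g) > 1 and in (0, inf) when sum(g) < 1
    return brentq(f, -1 + 1e-9, -1e-9) if s > 1 else brentq(f, 1e-9, 1e6)

def sugeno_integral(scores, densities):
    """Aggregate one class's scores from the component SVMs into a single confidence."""
    lam = solve_lambda(densities)
    order = np.argsort(scores)[::-1]               # sort component scores descending
    h, g = scores[order], densities[order]
    g_cum = np.empty_like(g)
    g_cum[0] = g[0]
    for i in range(1, len(g)):                      # measure of the top-i components
        g_cum[i] = g[i] + g_cum[i - 1] + lam * g[i] * g_cum[i - 1]
    return float(np.max(np.minimum(h, g_cum)))

# toy example: three component SVMs scoring the "bad creditor" class,
# with illustrative importance densities
print(sugeno_integral(np.array([0.81, 0.62, 0.70]), np.array([0.40, 0.25, 0.30])))
```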

A Study on Improving the Performance of Document Classification Using the Context of Terms (용어의 문맥활용을 통한 문헌 자동 분류의 성능 향상에 관한 연구)

  • Song, Sung-Jeon;Chung, Young-Mee
    • Journal of the Korean Society for information Management
    • /
    • v.29 no.2
    • /
    • pp.205-224
    • /
    • 2012
  • One of the limitations of the BOW method is that each term is recognized only by its form, failing to represent the term's meaning or thematic background. To overcome this limitation, different profiles were defined for each term by thematic category, depending on its contextual characteristics. In this study, a specific term was used as a classification feature according to its meaning or thematic background, determined by comparing the contexts stored in those profiles with the term's occurrences in an actual document. The experiment was conducted in three phases: term weighting, ensemble classifier implementation, and feature selection. The classification performance was enhanced in all phases, with the ensemble classifier showing the highest performance score. The outcome also showed that the proposed method was effective in reducing the performance bias caused by the total number of learning documents.
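
The core idea, comparing a term's in-document context with its thematic profiles, could look roughly like the sketch below, where a term's frequency is re-weighted by the cosine similarity between its context and a profile. The tiny corpus, window size, and vocabulary are hypothetical; this is only an interpretation of the approach, not the authors' implementation.

```python
# Sketch: re-weight a term's TF by how well its in-document context matches a thematic
# profile built from labelled training text (all data here is toy/illustrative).
import numpy as np
from collections import Counter

def context_vector(tokens, term, window, vocab):
    """Bag-of-words vector of words co-occurring with `term` within +/- window tokens."""
    vec = np.zeros(len(vocab))
    for i, tok in enumerate(tokens):
        if tok == term:
            for ctx in tokens[max(0, i - window):i + window + 1]:
                if ctx != term and ctx in vocab:
                    vec[vocab[ctx]] += 1
    return vec

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

vocab = {w: i for i, w in enumerate(["membrane", "battery", "protein", "phone"])}
# thematic profile of "cell" learned from a biology-labelled snippet
profile = context_vector("the cell membrane encloses protein structures".split(), "cell", 3, vocab)
# context of "cell" in a new document to be classified
doc = "each cell contains protein and a membrane".split()
doc_context = context_vector(doc, "cell", 3, vocab)
weighted_tf = Counter(doc)["cell"] * cosine(doc_context, profile)   # boost when contexts agree
print(weighted_tf)
```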

Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

  • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.4
    • /
    • pp.123-132
    • /
    • 2013
  • As smartphones are equipped with various sensors such as the accelerometer, GPS, gravity sensor, gyroscope, ambient light sensor, proximity sensor, and so on, there have been many research efforts to make use of these sensors to create valuable applications. Human activity recognition is one such application, motivated by various welfare applications such as support for the elderly, measurement of calorie consumption, analysis of lifestyles, analysis of exercise patterns, and so on. One of the challenges faced when using smartphone sensors for activity recognition is that the number of sensors used should be minimized to save battery power. When the number of sensors used is restricted, it is difficult to realize a highly accurate activity recognizer or classifier because it is hard to distinguish between subtly different activities relying on only limited information. The difficulty becomes especially severe when the number of different activity classes to be distinguished is very large. In this paper, we show that a fairly accurate classifier can be built that distinguishes ten different activities using only a single sensor, i.e., the smartphone accelerometer. The approach we take to this ten-class problem is the ensemble of nested dichotomies (END) method, which transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers in a nested fashion using a binary tree. At the root of the binary tree, the set of all classes is split into two subsets of classes by a binary classifier. At a child node of the tree, a subset of classes is again split into two smaller subsets by another binary classifier. Continuing in this way, we obtain a binary tree in which each leaf node contains a single class. This binary tree can be viewed as a nested dichotomy that can make multi-class predictions. Depending on how the set of classes is split into two subsets at each node, the resulting tree can differ. Since some classes may be correlated, a particular tree may perform better than the others; however, we can hardly identify the best tree without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning and then combining the predictions made by each tree during classification. The END method is generally known to perform well even when the base learner is unable to model complex decision boundaries. As the base classifier at each node of a dichotomy, we have used another ensemble classifier, the random forest. A random forest is built by repeatedly generating decision trees, each time with a different random subset of features, using a bootstrap sample. By combining bagging with random feature-subset selection, a random forest enjoys the advantage of having more diverse ensemble members than simple bagging. As an overall result, our ensemble of nested dichotomies can be seen as a committee of committees of decision trees that can deal with a multi-class problem with high accuracy. The ten activity classes distinguished in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'.
The features used for classifying these activities include not only the magnitude of the acceleration vector at each time point but also the maximum, minimum, and standard deviation of the vector magnitude within a time window covering the last 2 seconds, among others. For experiments comparing the performance of END with that of other methods, accelerometer data were collected every 0.1 seconds for 2 minutes per activity from 5 volunteers. Among the 5,900 (= 5 × (60 × 2 - 2) / 0.1) data points collected for each activity (the data for the first 2 seconds are discarded because they lack time-window data), 4,700 were used for training and the rest for testing. Although 'Walking Uphill' is often confused with other similar activities, END was found to classify all ten activities with a fairly high accuracy of 98.4%. In comparison, the accuracies achieved by a decision tree, a k-nearest-neighbor classifier, and a one-versus-rest support vector machine were 97.6%, 96.5%, and 97.6%, respectively.
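
A rough sketch of one randomly drawn nested dichotomy with random-forest base classifiers is shown below; an END model would average the class probabilities of several such trees. This is only an illustration with synthetic data standing in for the accelerometer window features, not the authors' implementation, and the class-splitting scheme and hyperparameters are assumptions.

```python
# One nested dichotomy with random-forest node classifiers (sketch); an END averages
# the predictions of several randomly built dichotomies.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

class NestedDichotomy:
    def __init__(self, classes, rng):
        self.classes = list(classes)
        if len(self.classes) > 1:
            order = rng.permutation(len(self.classes))        # random split of the class set
            half = len(self.classes) // 2
            self.left_set = {self.classes[i] for i in order[:half]}
            self.right_set = set(self.classes) - self.left_set
            self.clf = RandomForestClassifier(n_estimators=50, random_state=0)
            self.left = NestedDichotomy(sorted(self.left_set), rng)
            self.right = NestedDichotomy(sorted(self.right_set), rng)

    def fit(self, X, y):
        if len(self.classes) == 1:
            return self
        side = np.isin(y, list(self.left_set)).astype(int)    # 1 = left subset, 0 = right
        self.clf.fit(X, side)
        self.left.fit(X[side == 1], y[side == 1])
        self.right.fit(X[side == 0], y[side == 0])
        return self

    def predict_proba_one(self, x):
        """Class-probability dict for a single sample, multiplying edge probabilities."""
        if len(self.classes) == 1:
            return {self.classes[0]: 1.0}
        p_left = self.clf.predict_proba(x.reshape(1, -1))[0][list(self.clf.classes_).index(1)]
        out = {c: p_left * p for c, p in self.left.predict_proba_one(x).items()}
        out.update({c: (1 - p_left) * p for c, p in self.right.predict_proba_one(x).items()})
        return out

# toy demonstration: synthetic 10-class data in place of the window features
X, y = make_classification(n_samples=500, n_features=8, n_informative=6,
                           n_classes=10, random_state=0)
rng = np.random.default_rng(0)
ensemble = [NestedDichotomy(sorted(set(y)), rng).fit(X, y) for _ in range(5)]
all_probs = [nd.predict_proba_one(X[0]) for nd in ensemble]
probs = {c: np.mean([p[c] for p in all_probs]) for c in sorted(set(y))}
print(max(probs, key=probs.get), y[0])   # predicted vs. true class of the first sample
```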

Design of A Personalized Classifier using Soft Computing Techniques and Its Application to Facial Expression Recognition

  • Kim, Dae-Jin;Zeungnam Bien
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2003.09a
    • /
    • pp.521-524
    • /
    • 2003
  • In this paper, we propose a design process for 'personalized' classification using soft computing techniques. Based on the way humans think, a construction methodology for a personalized classifier is described. Two fuzzy similarity measures and an ensemble of classifiers are used effectively. As one possible application, the facial expression recognition problem is discussed. The numerical results show that the proposed method is very useful for online learning, reuse of previous knowledge, and so on.


A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles

  • Tama, Bayu Adhi;Rhee, Kyung-Hyune
    • Journal of Multimedia Information System
    • /
    • v.5 no.2
    • /
    • pp.99-104
    • /
    • 2018
  • Phishing websites have become a crucial concern in cyber security applications. Phishing is performed by fraudulently deceiving users with the aim of obtaining their sensitive information such as bank account details, credit card numbers, usernames, and passwords. The threat has led to huge losses for online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare different classifier ensemble approaches, i.e., random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, i.e., decision tree, classification and regression tree, and credal decision tree, in the case of website phishing. The area under the ROC curve (AUC) is employed as the performance metric, whilst statistical tests are used as a baseline indicator of significance among classifiers. The paper contributes to the existing literature by providing a benchmark of classifier ensembles for web phishing detection.
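
A minimal sketch of this kind of benchmark is shown below, using 10-fold cross-validated AUC on synthetic data. Rotation forest, XGBoost, and the credal decision tree are not part of scikit-learn, so random forest, gradient boosting, and a plain decision tree stand in here; the dataset and settings are assumptions, not the paper's.

```python
# Compare ensemble vs. single classifiers by cross-validated AUC (sketch with
# synthetic data standing in for the phishing-website dataset).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
models = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "decision tree": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=10, scoring="roc_auc")
    print(f"{name}: AUC = {auc.mean():.3f} +/- {auc.std():.3f}")
```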

A Detailed Analysis of Classifier Ensembles for Intrusion Detection in Wireless Network

  • Tama, Bayu Adhi;Rhee, Kyung-Hyune
    • Journal of Information Processing Systems
    • /
    • v.13 no.5
    • /
    • pp.1203-1212
    • /
    • 2017
  • Intrusion detection systems (IDSs) are crucial given the overwhelming increase in attacks on computing infrastructure. An IDS intelligently detects malicious activity and predicts future attack patterns through classification analysis using machine learning and data mining techniques. This paper is devoted to thoroughly evaluating classifier ensembles for IDSs in IEEE 802.11 wireless networks. Two ensemble techniques, i.e., voting and stacking, are employed to combine three base classifiers, i.e., decision tree (DT), random forest (RF), and support vector machine (SVM). We use the area under the ROC curve (AUC) as the performance metric. Finally, we conduct two statistical significance tests to evaluate the performance differences among the classifiers.
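
The two fusion schemes evaluated here, voting and stacking over DT, RF, and SVM base classifiers, can be sketched with scikit-learn as below. The synthetic data, meta-learner, and hyperparameters are assumptions for illustration, not the study's wireless-intrusion setup.

```python
# Soft voting and stacking over DT, RF, and SVM base classifiers, scored by AUC (sketch).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier, StackingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1500, n_features=20, random_state=0)
base = [("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(probability=True, random_state=0))]
ensembles = {
    "voting": VotingClassifier(estimators=base, voting="soft"),
    "stacking": StackingClassifier(estimators=base, final_estimator=LogisticRegression()),
}
for name, clf in ensembles.items():
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {auc.mean():.3f}")
```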

Contactless User Identification System using Multi-channel Palm Images Facilitated by Triple Attention U-Net and CNN Classifier Ensemble Models

  • Kim, Inki;Kim, Beomjun;Woo, Sunghee;Gwak, Jeonghwan
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.3
    • /
    • pp.33-43
    • /
    • 2022
  • In this paper, we propose an ensemble model facilitated by multi-channel palm images, attention U-Net models, and pretrained convolutional neural networks (CNNs) for establishing a contactless palm-based user identification system using conventional, inexpensive camera sensors. Attention U-Net models are used to extract the areas of interest, including hands (i.e., with fingers), palms (i.e., without fingers), and palm lines, which are combined to generate the three channels fed into the ensemble classifier. The proposed palm-information-based user identification system then predicts the class using a classifier ensemble of three outperforming pretrained CNN models. The proposed model achieves a classification accuracy, precision, recall, and F1-score of 98.60%, 98.61%, 98.61%, and 98.61%, respectively, which indicates that it is effective even with very cheap image sensors. We believe that under the circumstances of the COVID-19 pandemic, the proposed palm-based contactless user identification system can be a safe and reliable alternative to the currently prevalent contact-based systems.
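
The classification stage, averaging the softmax outputs of three CNN backbones over the 3-channel palm image, might look like the PyTorch sketch below. The specific backbones, the number of enrolled users, and the use of randomly initialized weights are placeholders; the paper uses its own choice of pretrained models and segmentation-derived channels.

```python
# Soft-voting ensemble of three CNN backbones over a 3-channel palm image (sketch;
# backbones and class count are illustrative, weights left random to stay runnable offline).
import torch
import torchvision.models as models

n_classes = 100  # hypothetical number of enrolled users
backbones = [models.resnet18(weights=None),
             models.densenet121(weights=None),
             models.efficientnet_b0(weights=None)]
# resize each classification head for the identification task (pretrained weights,
# e.g. weights="IMAGENET1K_V1", would be loaded in practice)
backbones[0].fc = torch.nn.Linear(backbones[0].fc.in_features, n_classes)
backbones[1].classifier = torch.nn.Linear(backbones[1].classifier.in_features, n_classes)
backbones[2].classifier[1] = torch.nn.Linear(backbones[2].classifier[1].in_features, n_classes)
for m in backbones:
    m.eval()

def ensemble_predict(x):
    """x: (batch, 3, H, W) tensor whose channels hold the hand, palm, and palm-line masks."""
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(x), dim=1) for m in backbones]).mean(dim=0)
    return probs.argmax(dim=1)

print(ensemble_predict(torch.randn(2, 3, 224, 224)))
```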

Face Recognition based on Hybrid Classifiers with Virtual Samples (가상 데이터와 융합 분류기에 기반한 얼굴인식)

  • 류연식;오세영
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.40 no.1
    • /
    • pp.19-29
    • /
    • 2003
  • This paper presents a novel hybrid classifier for face recognition with artificially generated virtual training samples. We utilize both a nearest neighbor approach in feature angle space and a connectionist model to obtain a synergy effect by combining the results of two heterogeneous classifiers. First, a classifier called the nearest feature angle (NFA), based on angular information, finds the feature most similar to the query in a given training set. Second, a classifier has been developed based on the recall of the stored frontal projection of the query feature. It uses a frontal recall network (FRN) that finds the most similar frontal feature among the stored frontal feature set. For the FRN, we used an ensemble neural network consisting of multiple multilayer perceptrons (MLPs), each trained independently to enhance generalization capability. Further, both classifiers use a virtual training set generated adaptively according to the spatial distribution of each person's training samples. Finally, the results of the two classifiers are combined to determine the best matching class, and the corresponding similarity measure is used to make the final decision. The proposed classifier achieved an average classification rate of 96.33% against a large group of different test image sets; its average error rate is 61.5% of that of the nearest feature line (NFL) method, and it achieves more robust classification performance.
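
The nearest-feature-angle step can be sketched very compactly: classify a query by the training feature whose angle to it is smallest, i.e., whose cosine similarity is largest. The random features below merely stand in for the face features (e.g., projections of face images); the FRN and virtual-sample parts of the hybrid are not shown.

```python
# Nearest feature angle (NFA) sketch: smallest angle = largest cosine similarity.
import numpy as np

def nfa_predict(query, train_feats, train_labels):
    q = query / np.linalg.norm(query)
    T = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    return train_labels[int(np.argmax(T @ q))]   # label of the minimum-angle feature

rng = np.random.default_rng(0)
feats = rng.normal(size=(10, 64))                # placeholder face features
labels = np.array([i // 2 for i in range(10)])   # two samples per person
print(nfa_predict(rng.normal(size=64), feats, labels))
```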

Developing an Ensemble Classifier for Bankruptcy Prediction (부도 예측을 위한 앙상블 분류기 개발)

  • Min, Sung-Hwan
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.17 no.7
    • /
    • pp.139-148
    • /
    • 2012
  • An ensemble of classifiers employs a set of individually trained classifiers and combines their predictions. It has been found that in most cases ensembles produce more accurate predictions than the base classifiers. Combining the outputs of multiple classifiers, known as ensemble learning, is one of the standard and most important techniques for improving classification accuracy in machine learning. An ensemble of classifiers is effective only if the individual classifiers make decisions that are as diverse as possible. Bagging is the most popular ensemble learning method for generating a diverse set of classifiers. Diversity in bagging is obtained by using different training sets: the training data subsets are randomly drawn with replacement from the entire training dataset. The random subspace method is an ensemble construction technique using different attribute subsets. In the random subspace method, the training dataset is also modified as in bagging, but the modification is performed in the feature space. Bagging and random subspace are well-known and popular ensemble algorithms. However, few studies have dealt with the integration of bagging and the random subspace method using SVM classifiers, though there is great potential for useful applications in this area. The focus of this paper is to propose methods for improving SVM performance using a hybrid ensemble strategy for bankruptcy prediction. The paper applies the proposed ensemble model to the bankruptcy prediction problem using a real dataset from Korean companies.
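
A minimal sketch of the hybrid idea, bagging rows and random-subspace feature sampling with SVM base classifiers, is given below using scikit-learn's BaggingClassifier (version 1.2 or later for the estimator keyword). The synthetic, imbalanced data and the sampling fractions are assumptions, not the Korean bankruptcy dataset or the paper's exact configuration.

```python
# Bagging (bootstrap rows) combined with the random subspace method (random feature
# subsets), with SVMs as base classifiers (sketch on synthetic, imbalanced data).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=30, weights=[0.9, 0.1], random_state=0)
hybrid = BaggingClassifier(
    estimator=SVC(kernel="rbf"),   # base learner
    n_estimators=30,
    max_samples=0.8,               # bagging: sample rows with replacement
    max_features=0.5,              # random subspace: each SVM sees half of the features
    bootstrap=True,
    bootstrap_features=False,
    random_state=0,
)
print(cross_val_score(hybrid, X, y, cv=5, scoring="roc_auc").mean())
```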