• Title/Summary/Keyword: Decision Tree Classifiers

MEC; A new decision tree generator based on multi-base entropy (다중 엔트로피를 기반으로 하는 새로운 결정 트리 생성기 MEC)

  • 전병환;김재희
    • The Journal of Korean Institute of Communications and Information Sciences / v.22 no.3 / pp.423-431 / 1997
  • A new decision tree generator MEC is proposed in this paper, which uses the difference of multi-base entropy as a consistent criterion for discretization and selection of attributes. To evaluate the performance of the proposed generator, it is compared to other generators which use criteria based on entropy and adopt different discretization styles. As an experimental result, it is shown that the proposed generator produces the most efficient classifiers, which have the least number of leaves at the same error rate, regardless of whether attribute values constituting the training set are discrete or continuous.
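
The abstract above does not define multi-base entropy, so the following is only a minimal sketch of the general idea of using one entropy-based criterion for both discretizing a continuous attribute and scoring it. It uses ordinary Shannon entropy with the logarithm base exposed as a parameter; the function names `entropy`, `entropy_gain`, and `best_threshold` are illustrative assumptions and are not taken from MEC.

```python
import math
from collections import Counter


def entropy(labels, base=2.0):
    """Shannon entropy of a label sequence, in units set by `base`."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


def entropy_gain(values, labels, threshold, base=2.0):
    """Entropy reduction from splitting a continuous attribute at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left, base) \
        + (len(right) / n) * entropy(right, base)
    return entropy(labels, base) - remainder


def best_threshold(values, labels, base=2.0):
    """Midpoint between adjacent distinct values that gives the largest gain.

    Assumes the attribute has at least two distinct values.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: entropy_gain(values, labels, t, base))
```

The same `entropy_gain` score could then be compared across attributes to choose the one to split on, which is the sense in which a single criterion serves both discretization and selection here.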

Research on improving correctness of cardiac disorder data classifier by applying Best-First decision tree method (Best-First decision tree 기법을 적용한 심전도 데이터 분류기의 정확도 향상에 관한 연구)

  • Lee, Hyun-Ju;Shin, Dong-Kyoo;Park, Hee-Won;Kim, Soo-Han;Shin, Dong-Il
    • Journal of Internet Computing and Services / v.12 no.6 / pp.63-71 / 2011
  • Cardiac disorder data are generally analyzed with classifiers, and the QRS complex and R-R interval used in this experiment are commonly extracted from ECG (electrocardiogram) signals. Experiments on ECG data are generally performed with SVM (Support Vector Machine) and MLP (Multilayer Perceptron) classifiers, but this study used the Best-First Decision Tree (B-F Tree), derived from the Decision Tree among Random Forest classifier algorithms, to improve accuracy. To compare and analyze accuracy, experiments with SVM, MLP, RBF (Radial Basis Function) Network, and Decision Tree classifiers were performed, and the results were also compared with those of published papers that used the same interval and data. Compared with the above four classifiers, the Random Forest classifier achieved the best accuracy. Although the R-R interval was extracted using a band-pass filter in the pre-processing stage of this experiment, further study of filters is needed to extract the interval more accurately.

Prediction of Academic Performance of College Students with Bipolar Disorder using different Deep learning and Machine learning algorithms

  • Peerbasha, S.;Surputheen, M. Mohamed
    • International Journal of Computer Science & Network Security / v.21 no.7 / pp.350-358 / 2021
  • In recent years, analyzing student performance has proven difficult, and it remains an important problem for all academic institutions. The main idea of this paper is to analyze and evaluate the academic performance of college students with bipolar disorder by applying data mining classification algorithms using the Jupyter Notebook Python tool. This tool has been generally used as a decision-making tool in terms of the academic performance of students. The classifiers used are logistic regression, random forest classifier (Gini), random forest classifier (entropy), decision tree classifier, K-Neighbours classifier, AdaBoost classifier, Extra Tree Classifier, GaussianNB, and BernoulliNB. The resulting classification models are evaluated with 13 measures such as Accuracy, Precision, Recall, F1 Measure, Sensitivity, Specificity, R Squared, Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, TPR, TNR, FPR, and FNR. The paper concludes that the Decision Tree Classifier performs better than the other algorithms.

A Comparative Study of Image Recognition by Neural Network Classifier and Linear Tree Classifier (신경망 분류기와 선형트리 분류기에 의한 영상인식의 비교연구)

  • Young Tae Park
    • Journal of the Korean Institute of Telematics and Electronics B / v.31B no.5 / pp.141-148 / 1994
  • Both the neural network classifier utilizing a multi-layer perceptron and the linear tree classifier composed of hierarchically structured linear discriminant functions can form arbitrarily complex decision boundaries in the feature space and have very similar decision-making processes. In this paper, a new method for automatically choosing the number of neurons in the hidden layers and for initializing the connection weights between the layers is presented, together with its supporting theory, by mapping the sequential structure of the linear tree classifier to the parallel structure of neural networks having one or two hidden layers. Experimental results on real data obtained from military ship images show that this method is effective and that there exists no significant difference in the classification accuracy of the two classifiers.

A New Incremental Learning Algorithm with Probabilistic Weights Using Extended Data Expression

  • Yang, Kwangmo;Kolesnikova, Anastasiya;Lee, Won Don
    • Journal of information and communication convergence engineering / v.11 no.4 / pp.258-267 / 2013
  • A new incremental learning algorithm using extended data expression, based on probabilistic compounding, is presented in this paper. The incremental learning algorithm generates an ensemble of weak classifiers and compounds them into a strong classifier using weighted majority voting to improve classification performance. We introduce a new probabilistic weighted majority voting scheme founded on extended data expression. In this case, the class distribution of the output is used to compound classifiers. UChoo, a decision tree classifier for extended data expression, is used as the base classifier, as it yields an extended output expression that defines the class distribution of the output. Extended data expression and the UChoo classifier are powerful techniques for classification and rule refinement problems. In this paper, extended data expression is applied to obtain probabilistic results with probabilistic majority voting. To show its performance advantages, the new algorithm is compared with Learn++, an incremental ensemble-based algorithm.
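
To illustrate what compounding classifiers through their output class distributions can look like, here is a minimal sketch of a weighted vote over probability distributions rather than hard labels. It is a generic soft-voting scheme written under stated assumptions, not the paper's algorithm or UChoo itself; the names `weighted_distribution_vote` and `predict` and the dictionary representation of a distribution are hypothetical.

```python
def weighted_distribution_vote(distributions, weights):
    """Combine per-classifier class distributions into one distribution.

    distributions: list of dicts mapping class label -> probability
    weights: one non-negative voting weight per classifier
    """
    if len(distributions) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = {}
    for dist, w in zip(distributions, weights):
        for label, p in dist.items():
            combined[label] = combined.get(label, 0.0) + w * p
    # Normalize by the total weight so the result is again a distribution.
    return {label: score / total for label, score in combined.items()}


def predict(distributions, weights):
    """Return the label with the highest combined probability."""
    combined = weighted_distribution_vote(distributions, weights)
    return max(combined, key=combined.get)
```

With three equally weighted classifiers reporting `{"a": 0.9, "b": 0.1}`, `{"a": 0.4, "b": 0.6}`, and `{"a": 0.3, "b": 0.7}`, a hard majority vote would pick `b` (two of three), whereas this distribution-based vote picks `a` because the first classifier is far more confident.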

Data mining approach to predicting user's past location

  • Lee, Eun Min;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information / v.22 no.11 / pp.97-104 / 2017
  • Location prediction has been successfully utilized to provide high-quality location-based services to customers in many applications. In its usual form, location prediction means predicting future locations based on a user's past movement history. However, as location prediction needs expand into much more complicated cases, it frequently becomes necessary to infer the locations that a target user visited in the past. Typical cases include identifying locations that infectious disease carriers may have visited, or that crime suspects may have stopped by on a certain day in a specific time band. Therefore, the primary goal of this study is to predict locations that users visited in the past. The information used for this purpose includes users' demographic information and movement histories. Data mining classifiers such as Bayesian network, neural network, support vector machine, and decision tree were adopted to analyze 6868 contextual dataset and compare the classifiers' performance. Results show that the general Bayesian network is the most robust classifier.

Performance Analysis of Opinion Mining using Word2vec (Word2vec을 이용한 오피니언 마이닝 성과분석 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Proceedings of the Korea Contents Association Conference / 2018.05a / pp.7-8 / 2018
  • This study analyzes Word2vec-based machine learning classifiers for opinion mining tasks. BOW (Bag-of-Words) was adopted as the benchmark method. Using Word2vec and BOW as feature extraction methods, we applied the LR, DT, SVM, and RF classifiers to the Laptop and Restaurant datasets. The results showed that Word2vec feature extraction yields better performance.

An enhanced feature selection filter for classification of microarray cancer data

  • Mazumder, Dilwar Hussain;Veilumuthu, Ramachandran
    • ETRI Journal / v.41 no.3 / pp.358-370 / 2019
  • The main aim of this study is to select the optimal set of genes from microarray cancer datasets that contribute to the prediction of specific cancer types. This study proposes the enhancement of the feature selection filter algorithm based on Joe's normalized mutual information and its use for gene selection. The proposed algorithm is implemented and evaluated on seven benchmark microarray cancer datasets, namely, central nervous system, leukemia (binary), leukemia (3 class), leukemia (4 class), lymphoma, mixed lineage leukemia, and small round blue cell tumor, using five well-known classifiers, including the naive Bayes, radial basis function network, instance-based classifier, decision-based table, and decision tree. An average increase in the prediction accuracy of 5.1% is observed on all seven datasets averaged over all five classifiers. The average reduction in training time is 2.86 seconds. The performance of the proposed method is also compared with those of three other popular mutual information-based feature selection filters, namely, information gain, gain ratio, and symmetric uncertainty. The results are impressive when all five classifiers are used on all the datasets.

Optimal Selection of Classifier Ensemble Using Genetic Algorithms (유전자 알고리즘을 이용한 분류자 앙상블의 최적 선택)

  • Kim, Myung-Jong
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.99-112 / 2010
  • Ensemble learning is a method for improving the performance of classification and prediction algorithms. It finds a highly accurate classifier on the training set by constructing and combining an ensemble of weak classifiers, each of which needs only to be moderately accurate on the training set. Ensemble learning has received considerable attention in the machine learning and artificial intelligence fields because of its remarkable performance improvement and flexible integration with traditional learning algorithms such as decision trees (DT), neural networks (NN), and SVM. In this line of research, DT ensemble studies have consistently demonstrated impressive improvements in the generalization behavior of DT, while NN and SVM ensemble studies have not shown performance gains as remarkable as those of DT ensembles. Recently, several works have reported that the performance of an ensemble can be degraded when its classifiers are highly correlated with one another, resulting in a multicollinearity problem. They have also proposed differentiated learning strategies to cope with this performance degradation. Hansen and Salamon (1990) argued that it is necessary and sufficient for the performance enhancement of an ensemble that the ensemble contain diverse classifiers. Breiman (1996) showed that ensemble learning can increase the performance of unstable learning algorithms but does not yield remarkable improvement for stable learning algorithms. Unstable learning algorithms such as decision tree learners are sensitive to changes in the training data, and thus small changes in the training data can yield large changes in the generated classifiers. Therefore, an ensemble of unstable learning algorithms can guarantee some diversity among the classifiers.
In contrast, stable learning algorithms such as NN and SVM generate similar classifiers despite small changes in the training data, and thus the correlation among the resulting classifiers is very high. This high correlation results in a multicollinearity problem, which leads to performance degradation of the ensemble. Kim's work (2009) compared performance in bankruptcy prediction for Korean firms using traditional prediction algorithms such as NN, DT, and SVM. It reports that stable learning algorithms such as NN and SVM have higher predictability than the unstable DT. Meanwhile, with respect to ensemble learning, the DT ensemble shows greater performance improvement than the NN and SVM ensembles. Further analysis with the variance inflation factor (VIF) empirically shows that the performance degradation of the ensemble is due to the multicollinearity problem. It also proposes that optimization of the ensemble is needed to cope with such a problem. This paper proposes a hybrid system for coverage optimization of NN ensembles (CO-NN) in order to improve the performance of NN ensembles. Coverage optimization is a technique for choosing a sub-ensemble from an original ensemble so as to guarantee the diversity of classifiers. CO-NN uses a GA, which has been widely used for various optimization problems, to deal with the coverage optimization problem. The GA chromosomes for the coverage optimization are encoded as binary strings, each bit of which indicates an individual classifier. The fitness function is defined as the maximization of error reduction, and a constraint on the variance inflation factor (VIF), one of the commonly used measures of multicollinearity, is added to ensure the diversity of classifiers by removing high correlation among them. We use Microsoft Excel and the GA software package Evolver.
Experiments on company failure prediction have shown that CO-NN stably enhances the performance of NN ensembles by choosing classifiers with their correlations taken into account. Classifiers with a potential multicollinearity problem are removed by the coverage optimization process of CO-NN, and CO-NN thereby shows higher performance than a single NN classifier and the NN ensemble at the 1% significance level, and than the DT ensemble at the 5% significance level. However, further research issues remain. First, a decision optimization process to find the optimal combination function should be considered in further research. Second, various learning strategies to deal with data noise should be introduced in future research.
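
The binary chromosome encoding and constrained fitness described in this abstract can be sketched as follows. This is a simplified illustration, not the CO-NN implementation: it assumes binary (0/1) predictions, measures error reduction against the full ensemble's majority vote, and replaces the VIF constraint with a much cruder pairwise agreement cap. The function names and the `max_agreement` parameter are hypothetical.

```python
from itertools import combinations


def selected(chromosome):
    """Indices of classifiers whose bit is '1' in the binary string."""
    return [i for i, bit in enumerate(chromosome) if bit == "1"]


def majority_vote_error(predictions, targets, members):
    """Error rate of a majority vote over `members` (ties resolve to class 0)."""
    errors = 0
    for j, target in enumerate(targets):
        votes = sum(predictions[i][j] for i in members)
        vote = 1 if 2 * votes > len(members) else 0
        errors += vote != target
    return errors / len(targets)


def agreement(a, b):
    """Fraction of samples on which two classifiers predict the same class."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fitness(chromosome, predictions, targets, max_agreement=0.9):
    """Error reduction versus the full ensemble, subject to a diversity cap.

    Assumption: pairwise agreement stands in for the VIF constraint.
    Infeasible chromosomes score negative infinity.
    """
    members = selected(chromosome)
    if not members:
        return float("-inf")
    for i, k in combinations(members, 2):
        if agreement(predictions[i], predictions[k]) > max_agreement:
            return float("-inf")
    everyone = list(range(len(predictions)))
    return (majority_vote_error(predictions, targets, everyone)
            - majority_vote_error(predictions, targets, members))
```

A GA would evolve a population of such bit strings by selection, crossover, and mutation, using `fitness` to rank candidates; that search loop is omitted here.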

Shock Graph for Representation and Modeling of Posture

  • Tahir, Nooritawati Md.;Hussain, Aini;Abdul Samad, Salina;Husain, Hafizah
    • ETRI Journal / v.29 no.4 / pp.507-515 / 2007
  • The skeleton transform, of which the medial axis transform is the most popular form, has been proposed as a useful shape abstraction tool for the representation and modeling of human posture. This paper explains this proposition with a description of the areas in which skeletons could serve to enable the representation of shapes. We present algorithms for two-dimensional posture modeling using the developed simplified shock graph (SSG). The efficacy of SSG-extracted feature vectors as shape descriptors is also evaluated using three different classifiers, namely, decision tree, multilayer perceptron, and support vector machine. The paper concludes with a discussion of the issues involved in using shock graphs to model and classify human postures.
