• Title/Summary/Keyword: Classification algorithms

A Decision Tree Algorithm using Genetic Programming

  • Park, Chongsun;Ko, Young Kyong
    • Communications for Statistical Applications and Methods / v.10 no.3 / pp.845-857 / 2003
  • We explore the use of genetic programming to evolve decision trees directly for classification problems with both discrete and continuous predictors. We demonstrate that the hypotheses derived by standard algorithms can deviate substantially from the optimum. This deviation is partly due to their top-down procedures. The performance of the system is measured on a set of real and simulated data sets and compared with that of well-known algorithms such as CHAID, CART, C5.0, and QUEST. The proposed algorithm appears to be effective in handling the problems caused by the top-down procedures of existing algorithms.

The empirical comparison of efficiency in classification algorithms

  • 전홍석;이주영
    • Journal of the Korea Safety Management & Science / v.2 no.3 / pp.171-184 / 2000
  • We may be given a set of observations labeled with classes or clusters. The aim of this article is to provide an up-to-date review of different approaches to classification and to compare their performance on a wide range of challenging datasets. In this paper, machine learning classifiers based on CART, C4.5, CAL5, FACT, and QUEST, together with statistical discriminant analysis, are compared on various datasets by classification error rate and by algorithm.

Design and Implementation of Intelligent Agent System for Pattern Classification

  • Kim, Dae-su;Park, Ji-hoon;Chang, Jae-khun;Na, Guen-sik
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.7 / pp.598-602 / 2001
  • Recently, due to the widespread use of personal computers and the Internet, many computer users have requested intelligent systems that can cope with various types of requirements and provide user-friendly interfaces. Against this background, research on intelligent agents is now active in various fields. In this paper, we modeled, designed, and implemented an intelligent agent system for pattern classification by adopting intelligent agent concepts. We also investigated pattern classification by applying several pattern classification algorithms to common data. As a result, we confirmed that when 300 three-dimensional data points were applied to three pattern classification algorithms, the system returned correct results. Our system provides a distinctly user-friendly interface by adopting various agents, including a graphic agent.

Design of One-Class Classifier Using Hyper-Rectangles

  • Jeong, In Kyo;Choi, Jin Young
    • Journal of Korean Institute of Industrial Engineers / v.41 no.5 / pp.439-446 / 2015
  • Recently, the importance of the one-class classification problem has been increasing. However, most existing algorithms are limited in the information they provide about what affects the prediction of the target value. Motivated by this, we propose an efficient one-class classifier using hyper-rectangles (H-RTGLs) that are produced from intervals containing observations. Specifically, we generate intervals for each feature and integrate them. To generate intervals, we consider two approaches: (i) interval merging and (ii) clustering. We evaluate the performance of the proposed methods by computing classification accuracy as the area under the ROC curve, and compare them with other one-class classification algorithms on four datasets from the UCI repository. Since the H-RTGLs constructed for a given dataset make the classification factors visible, we can discern which features affect the classification result and extract patterns inherent in the dataset.
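
The interval-then-integrate idea in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the gap threshold `max_gap`, the helper names, and the rule "a point is accepted when every feature falls inside one of that feature's intervals" are all hypothetical choices made for the example.

```python
import numpy as np

def merge_intervals(values, max_gap):
    """Group sorted 1-D values into intervals, splitting where the gap exceeds max_gap.

    max_gap is a hypothetical tuning parameter, not taken from the paper.
    """
    v = np.sort(np.asarray(values, dtype=float))
    intervals = []
    start = prev = v[0]
    for x in v[1:]:
        if x - prev > max_gap:
            intervals.append((float(start), float(prev)))
            start = x
        prev = x
    intervals.append((float(start), float(prev)))
    return intervals

def fit_intervals(X, max_gap):
    """Build one list of intervals per feature from target-class observations."""
    X = np.asarray(X, dtype=float)
    return [merge_intervals(X[:, j], max_gap) for j in range(X.shape[1])]

def is_target(x, feature_intervals):
    """Accept x only if every feature value lies inside some interval of that feature."""
    return all(
        any(lo <= xj <= hi for lo, hi in intervals)
        for xj, intervals in zip(x, feature_intervals)
    )

# Toy target-class data with two features (made up for illustration).
X_train = [[1.0, 10.0], [1.2, 10.5], [5.0, 11.0], [5.1, 30.0]]
model = fit_intervals(X_train, max_gap=1.0)
inside = is_target([1.1, 10.7], model)
outside = is_target([3.0, 10.7], model)
```

Because the model is just a list of intervals per feature, the feature responsible for a rejection can be read off directly, which is the interpretability benefit the abstract describes.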

Improved Decision Tree Classification (IDT) Algorithm For Social Media Data

  • Anu Sharma;M.K Sharma;R.K Dwivedi
    • International Journal of Computer Science & Network Security / v.24 no.6 / pp.83-88 / 2024
  • In this paper, we apply classification algorithms to social networking data. We propose a new classification algorithm called the Improved Decision Tree (IDT). Our model provides better classification accuracy than existing systems for classifying social network data. We compare the accuracy of several familiar classification algorithms with that of our proposed algorithm. We used Support Vector Machines, Naïve Bayes, k-Nearest Neighbors, and decision trees in our research and performed analyses on a social media dataset. MATLAB was used to perform the experiments. The results show that the proposed algorithm achieves the best results, with an accuracy of 84.66%.

Analyzing performance of time series classification using STFT and time series imaging algorithms

  • Sung-Kyu Hong;Sang-Chul Kim
    • Journal of the Korea Society of Computer and Information / v.28 no.4 / pp.1-11 / 2023
  • In this paper, instead of using a recurrent neural network, we compare the classification performance of time series imaging algorithms using a convolutional neural network. The TSC (Time Series Classification) community has traditional algorithms that convert time series data into images, e.g. GAF (Gramian Angular Field), MTF (Markov Transition Field), and RP (Recurrence Plot). We additionally compare the STFT (Short-Time Fourier Transform) algorithm, which produces a spectrogram that visualizes features of voice data. We evaluate the CNN's performance while adjusting the hyperparameters of the imaging algorithms. When evaluated on the GunPoint dataset in the UCR archive, STFT has higher accuracy than the other algorithms. GAF also reaches 98~99% accuracy, but has the disadvantage that the image size is massive.
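
As a rough illustration of time series imaging, the sketch below computes the summation variant of the Gramian Angular Field with NumPy. The abstract does not say which GAF variant or rescaling the authors used, so the min-max rescaling to [-1, 1] and the function name are assumptions made for this example.

```python
import numpy as np

def gramian_angular_summation_field(series):
    """Encode a 1-D series as an n x n image with entries cos(phi_i + phi_j).

    Assumes the series is not constant, so min-max rescaling is well defined.
    """
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    scaled = 2.0 * (x - lo) / (hi - lo) - 1.0   # rescale to [-1, 1]
    scaled = np.clip(scaled, -1.0, 1.0)         # guard against rounding error
    phi = np.arccos(scaled)                     # polar-angle encoding
    return np.cos(phi[:, None] + phi[None, :])  # pairwise angular sums

image = gramian_angular_summation_field([0.0, 1.0, 2.0, 3.0])
```

A series of length n yields an n x n image, so the image grows quadratically with the series length, which is consistent with the size drawback mentioned in the abstract.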

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms

  • Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.23-45 / 2020
  • Big data is being created in a wide variety of fields such as medical care, manufacturing, logistics, sales sites, and SNS, and dataset characteristics are correspondingly diverse. To secure competitiveness, companies need to improve their decision-making capacity using classification algorithms. However, most of them do not have sufficient knowledge of which classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm is appropriate for the characteristics of a dataset has been a task that requires expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features that reflect the characteristics of multi-class data. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among those, we included the Herfindahl-Hirschman Index (HHI), originally a market concentration index, to replace the IR (Imbalanced Ratio). We also developed a new index called the Reverse ReLU Silhouette Score and added it to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality (red), Contraceptive Method Choice) were selected. The classes of each dataset were predicted using the classification algorithms selected for the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied 10-fold cross-validation. Oversampling of 10% to 100% was applied to each fold, and the meta-features of the dataset were measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The HHI meta-feature proposed in this study was significant for classification performance. (2) The number of variables has a significant effect on classification performance and, unlike the number of classes, the effect is positive. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at a significance level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. In addition, the results of the analysis by classification algorithm were consistent. In the regression analysis by classification algorithm, the number of variables does not have a significant effect for the Naïve Bayes algorithm, unlike the other classification algorithms. This study makes two theoretical contributions: (1) two new meta-features (HHI and Reverse ReLU Silhouette Score) were shown to be significant; (2) the effects of data characteristics on classification performance were investigated using meta-features. The practical contributions are as follows: (1) the results can be used to develop a classification algorithm recommendation system based on dataset characteristics; (2) because data characteristics differ, many data scientists repeatedly test and adjust algorithm parameters to find the optimal algorithm for a situation, a process that wastes hardware, cost, time, and manpower. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. The study consists of an introduction, related research, research model, experiment, and conclusion and discussion.
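
To make the HHI meta-feature concrete, the sketch below applies the standard market-concentration formula (the sum of squared shares) to class proportions. The abstract does not state how the authors scale or normalize the index, so treating shares as fractions in [0, 1] is an assumption, and the function name is illustrative.

```python
from collections import Counter

def herfindahl_hirschman_index(labels):
    """Sum of squared class proportions: 1/k for k balanced classes, 1.0 for a single class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())

# Made-up label lists for illustration.
balanced = ["a", "b", "c", "d"] * 5    # four equally frequent classes
skewed = ["a"] * 9 + ["b"]             # one dominant class

hhi_balanced = herfindahl_hirschman_index(balanced)
hhi_skewed = herfindahl_hirschman_index(skewed)
```

Unlike a ratio between only the largest and smallest class, this index uses every class proportion, which is one plausible reason it suits multi-class datasets.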

Wearable Sensor-Based Biometric Gait Classification Algorithm Using WEKA

  • Youn, Ik-Hyun;Won, Kwanghee;Youn, Jong-Hoon;Scheffler, Jeremy
    • Journal of information and communication convergence engineering / v.14 no.1 / pp.45-50 / 2016
  • Gait-based classification has gained much interest as a possible authentication method because it incorporates an intrinsic personal signature that is difficult to mimic. The study investigates machine learning techniques to mitigate the natural variations in gait among different subjects. We incorporated several machine learning algorithms into this study using the data mining package called Waikato Environment for Knowledge Analysis (WEKA). WEKA's convenient interface enabled us to apply various sets of machine learning algorithms to understand whether each algorithm can capture certain distinctive gait features. First, we defined 24 gait features by analyzing three-axis acceleration data, and then selectively used them for distinguishing subjects 10 years of age or younger from those aged 20 to 40. We also applied a machine learning voting scheme to improve the accuracy of the classification. The classification accuracy of the proposed system was about 81% on average.
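
The abstract does not specify which voting rule was used, so the following is only a minimal sketch of one common choice, plain majority voting over the labels predicted by several classifiers. It is written in Python rather than against the WEKA API, and the labels "child" and "adult" are hypothetical stand-ins for the two age groups.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties are resolved in favor of the label that appears first in the list.
    """
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical predictions from three classifiers for one subject.
votes = ["child", "adult", "child"]
winner = majority_vote(votes)
```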

A Comparative Study of Medical Data Classification Methods Based on Decision Tree and System Reconstruction Analysis

  • Tang, Tzung-I;Zheng, Gang;Huang, Yalou;Shu, Guangfu;Wang, Pengtao
    • Industrial Engineering and Management Systems / v.4 no.1 / pp.102-108 / 2005
  • This paper studies medical data classification methods, comparing decision trees and system reconstruction analysis as applied to heart disease data mining. The data were collected from patients with coronary heart disease and comprise 1,723 records of 71 attributes each. We use the system reconstruction method to weight the data. We apply decision tree algorithms such as induction of decision trees (ID3), C4.5, classification and regression tree (CART), Chi-square automatic interaction detector (CHAID), and exhaustive CHAID. We use the results to compare the correct classification rate, number of leaves, and tree depth of the different decision tree algorithms. The experiments show that weighting the data can improve the correct classification rate on the coronary heart disease data but has little effect on tree depth and number of leaves.

User Interface Application for Cancer Classification using Histopathology Images

  • Naeem, Tayyaba;Qamar, Shamweel;Park, Peom
    • Journal of the Korean Society of Systems Engineering / v.17 no.2 / pp.91-97 / 2021
  • A user interface for a cancer classification system is a software application with clinician-friendly tools and functions for diagnosing cancer from pathology images. Pathology has evolved from manual diagnosis to computer-aided diagnosis with the help of artificial intelligence tools and algorithms. In this paper, we explain each block of the project life cycle for implementing automated breast cancer classification software that uses AI and machine learning algorithms to classify normal and invasive breast histology images. The system was designed to help pathologists diagnose breast cancer automatically and efficiently. To design the classification model, Hematoxylin and Eosin (H&E) stained breast histology images were obtained from the ICIAR Breast Cancer challenge. These images were stain normalized to minimize the error that can occur during model training due to pathological stains. The normalized dataset was fed into ResNet-34 to classify normal and invasive breast cancer images. ResNet-34 achieved 94% accuracy, a 93% F-score, 95% recall, and 91% precision.
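
The reported precision, recall, and F-score can be checked against one another with a short calculation. This assumes the "F score" in the abstract is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly.

```python
def f1_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
f1 = f1_score(0.91, 0.95)
f1_rounded = round(f1, 2)
```

Under that assumption the computed value is about 0.93, which agrees with the F-score reported alongside the 91% precision and 95% recall.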