• Title/Summary/Keyword: Decision tree classifier

Search Result 107

A Question Type Classifier Using a Decision Tree and Lexico-syntactic Patterns (Lexico-syntactic 패턴과 결정트리를 이용한 질의 유형 분류기)

  • Kim, Hark-Soo;An, Young-Hun;Seo, Jung-Yun
    • Annual Conference on Human and Language Technology / 2002.10e / pp.189-196 / 2002
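  • For a question answering system to present correct answers, it is very important to identify the user's intent accurately and robustly. To meet this requirement, this paper proposes a question type classifier for practical question answering systems. The proposed classifier uses a hybrid method that combines rule-based and statistics-based approaches. The proposed method reduced the time needed to write rules by hand, improved precision, and ensured stability. In experiments, the proposed method achieved 86% precision in classifying question types.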
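The hybrid idea (hand-written lexico-syntactic patterns tried first, with a statistical model as a fallback) can be sketched as follows. This is a minimal illustration under stated assumptions: the regular-expression patterns, the type labels `PERSON`, `DATE`, and `LOCATION`, and the word-vote fallback are all hypothetical stand-ins. The paper uses a decision tree for its statistical component, which is not reproduced here.

```python
import re
from collections import Counter, defaultdict

# Hypothetical lexico-syntactic patterns mapping question surface forms to types.
PATTERNS = [
    (re.compile(r"^\s*who\b", re.IGNORECASE), "PERSON"),
    (re.compile(r"^\s*when\b|\bwhat year\b", re.IGNORECASE), "DATE"),
    (re.compile(r"^\s*where\b", re.IGNORECASE), "LOCATION"),
]


class HybridQuestionClassifier:
    """Rules first; fall back to a simple word-vote model (an assumed stand-in)."""

    def __init__(self, patterns):
        self.patterns = patterns
        self.word_type_counts = defaultdict(Counter)  # word -> Counter of question types

    def fit(self, questions, types):
        # Count how often each word co-occurs with each labeled question type.
        for q, t in zip(questions, types):
            for w in re.findall(r"\w+", q.lower()):
                self.word_type_counts[w][t] += 1
        return self

    def classify(self, question):
        # Stage 1: the first matching hand-written pattern decides the type.
        for pattern, qtype in self.patterns:
            if pattern.search(question):
                return qtype
        # Stage 2: statistical fallback that sums per-word type counts.
        votes = Counter()
        for w in re.findall(r"\w+", question.lower()):
            votes.update(self.word_type_counts.get(w, Counter()))
        return votes.most_common(1)[0][0] if votes else "UNKNOWN"


clf = HybridQuestionClassifier(PATTERNS).fit(
    ["Name the author of Hamlet", "Name the capital city of France"],
    ["PERSON", "LOCATION"],
)
print(clf.classify("Who wrote Hamlet?"))          # PERSON (matched a pattern)
print(clf.classify("Name the capital of Japan"))  # LOCATION (statistical fallback)
```

Putting rules first keeps precise, hand-checked patterns in control, while the learned fallback covers questions the rules do not anticipate, which is the trade-off the abstract describes.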

An enhanced feature selection filter for classification of microarray cancer data

  • Mazumder, Dilwar Hussain;Veilumuthu, Ramachandran
    • ETRI Journal / v.41 no.3 / pp.358-370 / 2019
  • The main aim of this study is to select the optimal set of genes from microarray cancer datasets that contribute to the prediction of specific cancer types. This study proposes an enhancement of the feature selection filter algorithm based on Joe's normalized mutual information and applies it to gene selection. The proposed algorithm is implemented and evaluated on seven benchmark microarray cancer datasets (central nervous system, leukemia (binary), leukemia (3 class), leukemia (4 class), lymphoma, mixed lineage leukemia, and small round blue cell tumor) using five well-known classifiers: naive Bayes, radial basis function network, instance-based classifier, decision table, and decision tree. Averaged over all five classifiers, prediction accuracy increases by 5.1% on all seven datasets, and training time decreases by 2.86 seconds on average. The proposed method is also compared with three other popular mutual information-based feature selection filters, namely information gain, gain ratio, and symmetric uncertainty, and it performs favorably across all five classifiers and all datasets.
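A basic mutual-information filter of the kind being enhanced can be sketched as follows: score each (already discretized) gene against the class label, normalize the score, and keep the top k. We assume "Joe's normalized mutual information" refers to the normalization sqrt(1 - exp(-2I)), which maps I in [0, ∞) to [0, 1); the paper's actual enhancement is not reproduced, and the gene names and values below are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information I(X;Y) in nats between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def joe_nmi(x, y):
    """Assumed Joe-style normalization: sqrt(1 - exp(-2 I)), in [0, 1)."""
    i = max(0.0, mutual_information(x, y))  # clamp tiny negative rounding error
    return math.sqrt(1.0 - math.exp(-2.0 * i))


def rank_genes(features, labels, k):
    """Names of the k features with the highest normalized MI to the class label."""
    scores = {name: joe_nmi(values, labels) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical discretized expression levels for two genes and a binary class.
features = {"gene_a": [0, 0, 1, 1], "gene_b": [0, 1, 0, 1]}
labels = [0, 0, 1, 1]
print(rank_genes(features, labels, k=1))  # ['gene_a']
```

Because the normalization is monotone in I, it does not change the ranking on its own; its value lies in putting scores on a fixed [0, 1) scale, which makes thresholds comparable across datasets.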

Refining Rules of Decision Tree Using Extended Data Expression (확장형 데이터 표현을 이용하는 이진트리의 룰 개선)

  • Jeon, Hae Sook;Lee, Won Don
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.6 / pp.1283-1293 / 2014
  • In ubiquitous environments, data change rapidly and new data keep arriving over time. Moreover, all past data may be lost if memory is insufficient. Therefore, rules need to be extracted and combined with new data, both to avoid losing past data and to handle large volumes of data. When decision trees are built and rules are extracted, the weight of each rule is generally determined by the number of class instances at its leaf. The computational problem of finding a minimum finite state acceptor compatible with given data is NP-hard. We therefore assume that the extracted rules are not fully correct and may have lost some information. Under this assumption, this paper presents a new approach to refining rules that controls the weights of rules derived from previous knowledge or data. To refine the rules, we generate a variety of rules using pruning methods with majority and minority properties, adjust the weight of each rule, and observe the resulting changes in performance. The study uses a decision tree classifier with an extended data expression that assigns static weights. Experiments show that performance can improve under the new rule-refinement policy.
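The idea that a rule's weight comes from the class counts at its leaf, and that rules kept from past data can be re-weighted when combined with rules from new data, can be sketched as a weighted vote. This is a hypothetical illustration, not the authors' refinement policy or their extended data expression: the `Rule` structure, the equality-only conditions, and the `old_weight` discount are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: dict       # feature -> required value (equality tests only, for brevity)
    class_counts: Counter  # training instances of each class that reached this leaf

    def matches(self, x):
        return all(x.get(f) == v for f, v in self.conditions.items())


def classify(x, old_rules, new_rules, old_weight=0.5):
    """Weighted vote: each matching rule contributes its leaf class counts.

    old_weight is a hypothetical discount applied to rules kept from past data.
    """
    votes = Counter()
    for rules, w in ((old_rules, old_weight), (new_rules, 1.0)):
        for rule in rules:
            if rule.matches(x):
                for cls, n in rule.class_counts.items():
                    votes[cls] += w * n
    return votes.most_common(1)[0][0] if votes else None


old = [Rule({"outlook": "sunny"}, Counter(yes=10))]
new = [Rule({"outlook": "sunny"}, Counter(no=8))]
print(classify({"outlook": "sunny"}, old, new))                  # no  (5.0 vs 8)
print(classify({"outlook": "sunny"}, old, new, old_weight=1.0))  # yes (10 vs 8)
```

Changing a single weight flips the decision here, which shows why controlling rule weights, rather than only the rules themselves, can change classifier performance.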

A Machine Learning Approach to Web Image Classification (기계학습 기반의 웹 이미지 분류)

  • Cho, Soo-Sun;Lee, Dong-Woo;Han, Dong-Won;Hwang, Chi-Jung
    • The KIPS Transactions:PartB / v.9B no.6 / pp.759-764 / 2002
  • Although images make up an important part of Web documents, there has been little research on analyzing and understanding them. Many Web images carry important information, while others do not. In this paper, we use machine learning methods to classify images from currently served Web sites into erasable and non-erasable classes. For this purpose, we identified 16 distinctive and informative features of Web images and experimented with Bayesian and decision tree methods. The two methods achieved F-measures of 87.09% and 82.72%, respectively. In particular, experiments comparing the effects of feature groups showed that the features added in this study are very useful for Web image classification.
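For reference, the F-measure reported above is the harmonic mean of precision and recall (the F1 score when beta = 1). The sketch below computes it from confusion counts. Treating one class (for example, erasable) as the positive class is our assumption, and the counts in the example are hypothetical, not results from the paper.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """F_beta score; beta = 1 gives the harmonic mean 2PR / (P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for the positive class.
p, r = precision_recall(tp=8, fp=2, fn=2)
print(round(f_measure(p, r), 4))  # 0.8
```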

Analysis of the Factors and Patterns Associated with Death in Aircraft Accidents and Incidents Using Data Mining Techniques (데이터 마이닝 기법을 활용한 항공기 사고 및 준사고로 인한 사망 발생 요인 및 패턴 분석)

  • Kim, Jeong-Hun;Kim, Tae-Un;Yoo, Dong-Hee
    • Journal of Digital Convergence / v.17 no.9 / pp.79-88 / 2019
  • This study uses data mining techniques to analyze the factors and patterns associated with death in aircraft accidents and incidents. To this end, we used two datasets on aircraft accidents and incidents, one from the National Transportation Safety Board (NTSB) and the other from the Federal Aviation Administration (FAA). We built decision tree classifiers to predict death in aircraft accidents and incidents, and from these prediction models we derived the main causal factors and patterns associated with death. In the NTSB data, deaths occurred frequently when the aircraft was destroyed or when people were performing dangerous missions or maneuvers. In the FAA data, deaths were mainly associated with less skilled or less qualified pilots whose aircraft were partially destroyed. Several death-related patterns were also found for parachute jumping and for the ascent and descent phases of flight. Based on the derived patterns, we propose strategies to prevent deaths in aircraft accidents and incidents.
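Deriving "patterns" from a trained decision tree usually means reading each root-to-leaf path as an IF-THEN rule. The sketch below does this for a small hand-written tree. The tree, its feature names, and its values are hypothetical illustrations loosely echoing the abstract, not the models or findings reported in the paper.

```python
# Hypothetical tree: internal nodes test one feature, leaves carry a predicted label.
TREE = {
    "feature": "aircraft_damage",
    "branches": {
        "destroyed": {"leaf": "fatal"},
        "partial": {
            "feature": "pilot_qualified",
            "branches": {"no": {"leaf": "fatal"}, "yes": {"leaf": "non-fatal"}},
        },
        "none": {"leaf": "non-fatal"},
    },
}


def extract_rules(node, path=()):
    """Yield (conditions, label) for every root-to-leaf path."""
    if "leaf" in node:
        yield path, node["leaf"]
        return
    for value, child in node["branches"].items():
        yield from extract_rules(child, path + ((node["feature"], value),))


def patterns_for(tree, label):
    """Readable IF-THEN patterns for the leaves that predict `label`."""
    return [
        " AND ".join(f"{f} = {v}" for f, v in conds) + f" -> {label}"
        for conds, leaf in extract_rules(tree)
        if leaf == label
    ]


for pattern in patterns_for(TREE, "fatal"):
    print(pattern)
```

Each printed line is one pattern: a conjunction of the tests along a path, ending at a leaf that predicts death.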

Ensemble Based Optimal Feature Selection Algorithm for Efficient Intrusion Detection in Wireless Sensor Network

  • Shyam Sundar S;R.S. Bhuvaneswaran;SaiRamesh L
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.8 / pp.2214-2229 / 2024
  • A wireless sensor network (WSN) consists of a large number of sensor nodes deployed across geographical locations to collect sensed information, process data, and communicate it to a control station for further processing. Because sensors are deployed in hostile environments, malicious nodes may perform malicious activities in the network. These security threats affect the performance and lifetime of sensor networks, and WSN security is addressed through several approaches, namely cryptography, trust management, intrusion detection systems (IDS), and intrusion prevention systems (IPS). An IDS detects malicious activities and raises an alarm. These malicious activities exploit vulnerabilities in the network layer and affect all layers of the network. Among existing feature selection methods, filter-based methods do not consider redundancy among the selected features, and wrapper methods carry a high risk of overfitting in intrusion classification. Overfitting in turn degrades the classifier's ability to detect intrusions. The main objective of this paper is to provide an efficient feature selection algorithm, suitable for any type of classification algorithm, that detects intrusions effectively. To secure the network, this paper proposes the Feature Selection Algorithm using Chi Squared with Ensemble Method (FSChE). The proposed scheme combines a decision tree with the random forest classification algorithm to form an ensemble classifier. Experiments on the NSL-KDD dataset demonstrate the feasibility of the proposed scheme in terms of attack detection, packet delivery ratio, and time analysis. The results show that the proposed ensemble method improves overall performance by 10% to 25% on these measures.
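The two ingredients named in FSChE, chi-squared feature scoring and an ensemble of a decision tree and a random forest, can be sketched in plain Python. Assumptions: features are already discretized, the feature names are hypothetical, and the ensemble combines the two models by soft voting (averaging class probabilities). The paper's actual combination rule may differ, and the probability dictionaries stand in for the outputs of trained models.

```python
from collections import Counter


def chi_squared(feature, labels):
    """Pearson chi-squared statistic for a discrete feature against the class label."""
    n = len(labels)
    fx, fy, fxy = Counter(feature), Counter(labels), Counter(zip(feature, labels))
    stat = 0.0
    for a in fx:
        for b in fy:
            expected = fx[a] * fy[b] / n  # count expected if feature and class were independent
            stat += (fxy[(a, b)] - expected) ** 2 / expected
    return stat


def select_top_k(features, labels, k):
    """Keep the k feature names with the largest chi-squared scores."""
    scores = {name: chi_squared(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def soft_vote(prob_dicts):
    """Average class probabilities from several classifiers and return the top class."""
    totals = Counter()
    for probs in prob_dicts:
        for cls, p in probs.items():
            totals[cls] += p / len(prob_dicts)
    return max(totals, key=totals.get)


# Hypothetical discretized traffic features and labels.
features = {"feature_a": [0, 0, 1, 1], "feature_b": [0, 1, 0, 1]}
labels = ["normal", "normal", "attack", "attack"]
print(select_top_k(features, labels, k=1))  # ['feature_a']

# Stand-ins for decision-tree and random-forest probability outputs on one sample.
tree_probs = {"attack": 0.6, "normal": 0.4}
forest_probs = {"attack": 0.3, "normal": 0.7}
print(soft_vote([tree_probs, forest_probs]))  # normal
```

Note that chi-squared scoring alone, like other filters, ignores redundancy between the selected features.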

Application of Hyperspectral Imagery to Decision Tree Classifier for Assessment of Spring Potato (Solanum tuberosum) Damage by Salinity and Drought (초분광 영상을 이용한 의사결정 트리 기반 봄감자(Solanum tuberosum)의 염해 판별)

  • Kang, Kyeong-Suk;Ryu, Chan-Seok;Jang, Si-Hyeong;Kang, Ye-Seong;Jun, Sae-Rom;Park, Jun-Woo;Song, Hye-Young;Lee, Su Hwan
    • Korean Journal of Agricultural and Forest Meteorology / v.21 no.4 / pp.317-326 / 2019
  • Salinity, which is often found on reclaimed land, is a major factor detrimental to crop growth. It would be advantageous to develop a non-destructive approach for assessing salinity and drought damage over large reclaimed areas. The objective of this study was to examine the applicability of a decision tree classifier using hyperspectral imagery to classify spring potatoes (Solanum tuberosum) damaged by salinity or drought at vegetative growth stages. We focused on comparing overall accuracy (OA) and the kappa coefficient (KC) between classifiers built on simple reflectance and on band ratios, which minimize the effect of uneven illumination. Spectral merging to commercial bandwidths with a full width at half maximum (FWHM) of 10 nm, 25 nm, and 50 nm was also considered, with a view to designing a multispectral image sensor. For classification based on the original simple reflectance with 5 nm FWHM, 3 to 13 bands were selected, with accuracies below 66.7% OA and 40.8% KC across all FWHMs. The maximum OA and KC were 78.7% and 57.7%, respectively, with 10 nm FWHM for classifying salinity and drought damage in spring potato. When the classifier was built on band ratios, both OA and KC exceeded 95% regardless of growth stage and FWHM. A multispectral image sensor with six bands (three band ratios) at 10 nm FWHM could classify spring potatoes damaged by salinity or drought with 91.3% OA and 85.0% KC.
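The accuracy measures and the band-ratio idea reduce to short formulas. Overall accuracy is the share of correctly classified samples. The kappa coefficient corrects that agreement for chance as κ = (p_o − p_e) / (1 − p_e), where p_e comes from the row and column totals of the confusion matrix. A band ratio cancels any illumination factor common to both bands. The confusion matrix and reflectance values below are hypothetical, and the paper's band selection and tree induction are not reproduced.

```python
def band_ratio(r_a, r_b):
    """Ratio of two reflectance bands; a shared illumination factor cancels out."""
    return r_a / r_b


def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def kappa(cm):
    """Cohen's kappa: observed agreement beyond what the marginals predict by chance."""
    total = sum(sum(row) for row in cm)
    po = overall_accuracy(cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(col) for col in zip(*cm)]
    pe = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (po - pe) / (1 - pe)


# Hypothetical 2-class confusion matrix (rows: reference, columns: predicted).
cm = [[45, 5], [5, 45]]
print(overall_accuracy(cm), round(kappa(cm), 3))  # 0.9 0.8

# Doubling the illumination on both bands leaves the ratio unchanged.
print(band_ratio(0.4, 0.2), band_ratio(0.8, 0.4))  # 2.0 2.0
```

Kappa is always lower than OA when classes are balanced, which is consistent with the KC values in the abstract being lower than the corresponding OA values.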

Design and Implementation of an Intelligent Medical Expert System for TMA(Tissue Mineral Analysis) (TMA 분석을 위한 지능적 의학 전문가 시스템의 설계 및 구현)

  • 조영임;한근식
    • Journal of KIISE:Software and Applications / v.31 no.2 / pp.137-152 / 2004
  • Assessing 30 nutritional minerals and 8 toxic elements in hair is very important, not only for determining adequacy, deficiency, and imbalance, but also for assessing their relative relationships in the body. A test known as tissue mineral analysis (TMA) has been developed that serves this purpose exceedingly well. TMA is a very popular method of hair mineral analysis among health care professionals at medical centers in over 46 countries. However, it has several problems. First, there is no database suited to analyzing Korean subjects. Second, because TMA results from TEI-USA consist of English documents and graphic files that cannot be opened, their usability is very low. Third, some centers have only rudimentary TMA-related databases, so hair samples are sent to TEI-USA for analysis and medical services, which causes a severe outflow of dollars. Finally, TMA results are based on a database of American health and mineral standards, so they may be misleading with respect to Korean mineral standards. The purpose of this research is to develop Korea's first Intelligent Medical Expert System (IMES) for TMA, which addresses the problems above. IMES analyzes tissue mineral data with a multi-stage decision tree classifier. It also uses multiple fuzzy rule bases to analyze complex data from a Korean database by fuzzy inference. A pilot test of the system reported gains in business efficiency and business satisfaction of 86% and 92%, respectively.
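The fuzzy rule-base component can be illustrated with triangular membership functions, using the minimum as the fuzzy AND that determines a rule's firing strength. Everything specific below is hypothetical: the mineral names, the set boundaries (arbitrary units, not clinical reference ranges), and the single rule. The paper's multi-stage decision tree is not shown.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a, rising to 1 at b, back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets for a mineral level in arbitrary units (not clinical ranges).
SETS = {"LOW": (0, 10, 20), "NORMAL": (10, 20, 30), "HIGH": (20, 30, 40)}


def fuzzify(x):
    """Degree of membership of level x in each fuzzy set."""
    return {name: triangular(x, *abc) for name, abc in SETS.items()}


def rule_strength(calcium, magnesium):
    """Firing strength of a hypothetical rule:
    IF calcium is HIGH AND magnesium is LOW THEN 'imbalance' (AND taken as min)."""
    return min(fuzzify(calcium)["HIGH"], fuzzify(magnesium)["LOW"])


print(fuzzify(15))            # {'LOW': 0.5, 'NORMAL': 0.5, 'HIGH': 0.0}
print(rule_strength(25, 12))  # 0.5
```

Unlike a crisp threshold, a level of 15 here belongs partly to LOW and partly to NORMAL, which is what lets a fuzzy rule base grade borderline measurements instead of forcing them into one category.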

Personalized insurance product based on similarity (유사도를 활용한 맞춤형 보험 추천 시스템)

  • Kim, Joon-Sung;Cho, A-Ra;Oh, Hayong
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.11 / pp.1599-1607 / 2022
  • The models mainly use personal information and insurance product information. Using these data, we propose three types of models: a content-based filtering model, a collaborative filtering model, and a classification-based model. The content-based filtering model computes the cosine of the angle between user and item vectors and recommends items by cosine similarity; before computing the similarity, however, users are divided into several groups according to their features. This segmentation is performed with the K-means clustering algorithm and with a manually defined algorithm. The collaborative filtering model uses the interactions users have with items. The classification-based model uses decision tree and random forest classifiers to recommend items. In our experiments, the content-based filtering model produced the best results. Because this model recommends items based on demographic and user features, the result indicates that these features are key to offering more appropriate items.
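The content-based step, cosine similarity between a user vector and item vectors within the user's segment, can be sketched as follows. Assumptions: users and items are encoded in the same numeric feature space, segment ids have already been assigned (for example, by K-means), and restricting candidates to items tagged with the user's segment is a hypothetical design choice. The item names and vectors are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(user_vec, user_segment, items, k=1):
    """Rank items in the user's segment by cosine similarity to the user vector.

    items: list of (item_name, segment_id, feature_vector); all hypothetical.
    """
    candidates = [(name, vec) for name, seg, vec in items if seg == user_segment]
    ranked = sorted(candidates, key=lambda it: cosine(user_vec, it[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


items = [
    ("plan_a", 0, [1, 0, 1]),
    ("plan_b", 0, [0, 1, 0]),
    ("plan_c", 1, [1, 0, 1]),  # same profile as plan_a but in another segment
]
print(recommend([1, 0, 1], user_segment=0, items=items))  # ['plan_a']
```

Segmenting first shrinks the candidate set, so similarity is only compared among items relevant to users with a comparable profile.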

Development and Validation of MRI-Based Radiomics Models for Diagnosing Juvenile Myoclonic Epilepsy

  • Kyung Min Kim;Heewon Hwang;Beomseok Sohn;Kisung Park;Kyunghwa Han;Sung Soo Ahn;Wonwoo Lee;Min Kyung Chu;Kyoung Heo;Seung-Koo Lee
    • Korean Journal of Radiology / v.23 no.12 / pp.1281-1289 / 2022
  • Objective: Radiomic modeling that uses multiple regions of interest on brain MRI to diagnose juvenile myoclonic epilepsy (JME) has not yet been investigated. This study aimed to develop and validate radiomics prediction models that distinguish patients with JME from healthy controls (HCs), and to evaluate the feasibility of an MRI-based radiomics approach for diagnosing JME. Materials and Methods: A total of 97 JME patients (25.6 ± 8.5 years; female, 45.5%) and 32 HCs (28.9 ± 11.4 years; female, 50.0%) were randomly split in a 7:3 ratio into a training set (n = 90) and a test set (n = 39). Radiomic features were extracted from T1-weighted MRI in 22 brain regions of interest chosen on the basis of clinical evidence. Predictive models were trained on the radiomic features of the training set using seven modeling methods: light gradient boosting machine, support vector classifier, random forest, logistic regression, extreme gradient boosting, gradient boosting machine, and decision tree. The models' performance was validated and compared on the test set. The model with the highest area under the receiver operating characteristic curve (AUROC) was chosen, and its important features were identified. Results: The seven radiomics models (light gradient boosting machine, support vector classifier, random forest, logistic regression, extreme gradient boosting, gradient boosting machine, and decision tree) achieved AUROC values of 0.817, 0.807, 0.783, 0.779, 0.767, 0.762, and 0.672, respectively. The light gradient boosting machine had the highest AUROC, although pairwise comparisons showed no statistically significant differences from the other models; its accuracy, precision, recall, and F1 score were 0.795, 0.818, 0.931, and 0.871, respectively. Radiomic features from regions including the putamen and ventral diencephalon were ranked as the most important for suggesting JME. Conclusion: MRI-based radiomic models were able to differentiate patients with JME from HCs.
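AUROC, the metric used to choose among the seven models, equals the probability that a randomly chosen positive case (here, a JME patient) receives a higher score than a randomly chosen negative case (a healthy control), with ties counting as half. The sketch below computes it directly from that pairwise definition. It runs in O(P·N) time and is meant for illustration; the labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: 1 for positive, 0 for negative; scores: higher means more likely positive.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical model scores: one positive is ranked below one negative.
print(auroc([1, 1, 0, 0], [0.9, 0.2, 0.5, 0.1]))  # 0.75
```

An AUROC of 0.5 corresponds to random ordering, so the decision tree's 0.672 in the results above sits noticeably closer to chance than the boosting models.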