• Title/Summary/Keyword: CART Analysis

The empirical comparison of efficiency in classification algorithms (분류 알고리즘의 효율성에 대한 경험적 비교연구)

  • 전홍석;이주영
    • Journal of the Korea Safety Management & Science / v.2 no.3 / pp.171-184 / 2000
  • We may be given a set of observations together with their classes or clusters. This article provides an up-to-date review of different approaches to classification and compares their performance on a wide range of challenging datasets. Classifiers based on the machine learning algorithms CART, C4.5, CAL5, FACT, and QUEST, as well as statistical discriminant analysis, are compared on various datasets in terms of classification error rate and the algorithms themselves.

Selecting variables for evidence-diagnosis of paralysis disease using CHAID algorithm

  • Shin, Yan-Kyu
    • Proceedings of the Korean Data and Information Science Society Conference (한국데이터정보과학회:학술대회논문집) / 2001.10a / pp.76-78 / 2001
  • Variable selection in oriental medical research is considered. Decision tree algorithms such as CHAID, CART, C4.5, and QUEST have been successfully applied to medical research. Paralysis is a highly dangerous, potentially fatal disease that is accompanied by severe physical handicap. In this paper, we explore the use of the CHAID algorithm for selecting variables for the evidence-diagnosis of paralysis disease. Empirical results comparing our proposed method with the method using Wilks' $\lambda$ are given.

Spatial Analysis of Oak Wilt Disease in Bukhansan Mountain Park Using Spatial Data of Damaged Trees (피해목 위치자료를 이용한 북한산 국립공원 참나무시들음병 공간분석)

  • Zhu, Yongyan;Piao, Dongfan;Lee, Woo-kyun;Jeon, Seong-Woo
    • Korean Journal of Remote Sensing / v.33 no.5_3 / pp.879-888 / 2017
  • This preliminary study was conducted in Bukhansan Mountain National Park to support the development of a management system for predicting and controlling oak wilt disease, by identifying the spatial factors that affect the diffusion of the disease. Analysis of the altitude factor showed that damaged trees are mainly distributed at altitudes of 200-500 m and that their number decreases drastically above 500 m. The results showed that 92% of all damaged trees are on slopes of 20-40 degrees and that the number decreases drastically on slopes steeper than 40 degrees. The damaged area is mainly distributed on southern aspects. CART analysis estimated that the slope factor affected the diffusion of the disease the most, whereas the aspect factor did not; surface temperature and altitude showed similar effects. A simulation of a possible diffusion scenario suggests that the disease could spread to DO-BONG Mt., northeast of Bukhansan Mountain.

Artificial Intelligence Techniques for Predicting Online Peer-to-Peer(P2P) Loan Default (인공지능기법을 이용한 온라인 P2P 대출거래의 채무불이행 예측에 관한 실증연구)

  • Bae, Jae Kwon;Lee, Seung Yeon;Seo, Hee Jin
    • The Journal of Society for e-Business Studies / v.23 no.3 / pp.207-224 / 2018
  • In this article, an empirical study was conducted using a public dataset from Lending Club Corporation, the largest online peer-to-peer (P2P) lending platform in the world. We explore significant predictor variables related to P2P lending default and find that housing situation, length of employment, average current balance, debt-to-income ratio, loan amount, loan purpose, interest rate, public records, number of finance trades, total credit/credit limit, number of delinquent accounts, number of mortgage accounts, and number of bank card accounts are significant factors for loans being funded successfully on the Lending Club platform. We developed online P2P lending default prediction models using discriminant analysis, logistic regression, neural networks, and decision trees (i.e., CART and C5.0). To verify the feasibility and effectiveness of these prediction models, borrower loan data and credit data were used. The empirical results indicated that neural networks consistently outperformed the other classifiers (discriminant analysis, logistic regression, CART, and C5.0) in P2P loan default prediction.

Evaluation on Performance of Accuracy for Analysis and Classification of Data Related to Industrial Accidents (산업재해 데이터의 분석 및 분류를 위한 정확도 성능 평가)

  • Leem Young-Moon;Ryu Chang-Hyun
    • Proceedings of the Safety Management and Science Conference / 2006.04a / pp.51-56 / 2006
  • Recently, data mining techniques have been used for the analysis and classification of data related to industrial accidents. The main objective of this study is to compare the performance of algorithms for analyzing industrial accident data. This paper provides a comparative analysis of five algorithms, namely CHAID, CART, C4.5, LR (Logistic Regression), and NN (Neural Network), using the ROC chart, the lift chart, and the response threshold. In this study, data on 67,278 accidents were analyzed to create risk groups for a number of complications, including the risk of disease and accident. The sample for this work was chosen from data related to manufacturing industries over three years (2002-2004) in Korea. According to the analysis of the results, NN shows excellent performance for the analysis and classification of industrial accident data.
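The lift chart mentioned in this abstract can be illustrated with a minimal sketch. This is a generic calculation, not the authors' procedure: the function name `cumulative_lift`, the scores, and the labels are all hypothetical and made up for illustration. Lift at a given fraction is the positive rate among the top-scored cases divided by the overall positive rate.

```python
from typing import Sequence


def cumulative_lift(scores: Sequence[float], labels: Sequence[int], fraction: float) -> float:
    """Lift of the top `fraction` of cases (ranked by score) over the base rate.

    Assumes binary labels (1 = positive) and at least one positive case.
    """
    # Rank the labels by descending model score.
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    n_top = max(1, round(fraction * len(ranked)))
    top_rate = sum(ranked[:n_top]) / n_top      # positive rate in the top group
    base_rate = sum(labels) / len(labels)       # positive rate overall
    return top_rate / base_rate


# Hypothetical scores from some classifier and the true outcomes.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
example_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

lift_top_20 = cumulative_lift(example_scores, example_labels, 0.2)
lift_top_50 = cumulative_lift(example_scores, example_labels, 0.5)
lift_all = cumulative_lift(example_scores, example_labels, 1.0)
```

Plotting the lift against the fraction for each classifier gives one curve per algorithm. A classifier whose curve stays higher concentrates more of the positive cases among its top-ranked predictions.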

Comparison for Risk Estimate of Aspiration between the Revised Dysphagia Assessment Tool and Videofluoroscopy in Post-Stroke Patients (수정된 연하곤란사정도구와 비디오 연하영상 조영술의 흡인 위험 예측비교)

  • Moon, Kyung-Hee;Sohn, Hyun-Sook;Lee, Eun-Seok;Paek, Eun-Kyung;Kang, Eun-Ju;Lee, Seung-Hee;Han, Na-Ri;Lee, Meen-Hye;Kim, Deog-Young;Park, Chang-Gi;Yoo, Ji-Soo
    • Journal of Korean Academy of Nursing / v.40 no.3 / pp.359-366 / 2010
  • Purpose: The purpose of this study was to determine the significant factors for estimating the risk of aspiration and to evaluate the efficiency of the dysphagia assessment tool. Methods: A consecutive series of 210 stroke patients with aspiration symptoms such as cough and dysphagia, who were on a soft or regular diet without tube feeding, were examined. The dysphagia assessment tool for aspiration was compared with videofluoroscopy using Classification and Regression Tree (CART) analysis. Results: In the CART analysis, of 34 factors, the significant factors for estimating the risk of aspiration were cough during swallowing, oral stasis, facial symmetry, salivary drooling, and cough after swallowing. The risk estimate error of the revised dysphagia assessment tool was 25.2%, equal to that of videofluoroscopy. Conclusion: The results indicate that the dysphagia assessment tool developed and examined in this study is potentially useful in the clinical field and that the primary risk-estimating factor is cough during swallowing. Oral stasis, facial symmetry, salivary drooling, and cough after swallowing were the other significant factors, and based on these results the dysphagia assessment tool for aspiration was revised and complemented.

Exploring the Management Component of Rural Small Business in the 6th Industry at Each Stage of Growth (6차산업 경영체 성장단계별 핵심경영요소 탐색)

  • Kim, Jung-Tae
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.12 no.6 / pp.123-138 / 2017
  • This study aims to identify the characteristic variables of businesses that affect their choice of business type in the 6th industry and to analyze how these variables work. To this end, the study analyzed data on 752 businesses certified as belonging to the 6th industry in 2015, using the classification and regression tree (CART) algorithm for decision tree analysis. The results showed that the type of 6th-industry business was shaped by the type of agricultural product processing at the early stage of growth; by the type of agricultural product processing, the type of service, region, and sales volume at the growth stage; and by service strategy and the type of agricultural product processing at the maturity stage. These findings empirically identify key business factors that could support businesses in the 6th industry at each stage of growth and suggest a direction for future support of the 6th industry.

Selecting the optimal threshold based on impurity index in imbalanced classification (불균형 자료에서 불순도 지수를 활용한 분류 임계값 선택)

  • Jang, Shuin;Yeo, In-Kwon
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.711-721 / 2021
  • In this paper, we propose a method of adjusting thresholds using impurity indices in classification analysis of imbalanced data. Suppose that, for imbalanced binary data, the minority category is Positive and the majority category is Negative. When categories are assigned using the commonly used threshold of 0.5, specificity tends to be high in imbalanced data while sensitivity is relatively low. Increasing sensitivity matters when correct classification of objects in the minority category is relatively important. We explore how to increase sensitivity by adjusting the threshold. Existing studies have adjusted thresholds based on measures such as the G-Mean and the F1-score; in this paper, we propose a method for selecting optimal thresholds using the chi-square statistic of CHAID, the Gini index of CART, and the entropy of C4.5. We also describe how to obtain a unique value when multiple optimal thresholds are found. Empirical analysis shows, through classification performance metrics, what improvements are obtained compared with the results based on 0.5.
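The idea of choosing a classification threshold by an impurity index can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: it uses the Gini index alone, the function names (`gini`, `weighted_gini`, `select_threshold`) and the toy data are hypothetical, and the tie-break rule (take the smallest tied threshold) is an assumption, since the abstract does not say how a unique value is obtained.

```python
from typing import Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a group of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def weighted_gini(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Size-weighted Gini impurity of the two groups induced by `threshold`."""
    predicted_pos = [y for s, y in zip(scores, labels) if s >= threshold]
    predicted_neg = [y for s, y in zip(scores, labels) if s < threshold]
    n = len(labels)
    return (len(predicted_pos) * gini(predicted_pos)
            + len(predicted_neg) * gini(predicted_neg)) / n


def select_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the candidate threshold with the lowest weighted Gini impurity."""
    candidates = sorted(set(scores))
    impurities = [weighted_gini(scores, labels, t) for t in candidates]
    best = min(impurities)
    tied = [t for t, g in zip(candidates, impurities) if abs(g - best) < 1e-12]
    # Assumed tie-break: the smallest tied threshold, which favors sensitivity.
    return min(tied)


# Toy imbalanced data: predicted probabilities of Positive and true labels.
toy_scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45]
toy_labels = [0, 0, 0, 0, 0, 1, 1, 1]

chosen = select_threshold(toy_scores, toy_labels)
impurity_at_half = weighted_gini(toy_scores, toy_labels, 0.5)
```

In this toy data every predicted probability is below 0.5, so the default threshold labels all cases Negative and sensitivity is zero. The impurity-based choice falls at 0.35, which separates the two classes.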

An Improved Text Classification Method for Sentiment Classification

  • Wang, Guangxing;Shin, Seong Yoon
    • Journal of information and communication convergence engineering / v.17 no.1 / pp.41-48 / 2019
  • In recent years, sentiment analysis research has become popular. It has produced remarkable results in practical applications, such as Amazon's book recommendation system and the North American movie box office evaluation system. Analyzing big data based on user preferences and evaluations, and recommending best-selling books and highly rated movies to users in a targeted manner, greatly improves book sales and movie attendance [1, 2]. However, traditional machine learning-based sentiment analysis methods such as the Classification and Regression Tree (CART), Support Vector Machine (SVM), and k-nearest neighbor classification (kNN) have performed poorly in terms of accuracy. In this paper, an improved kNN classification method is proposed. Accuracy is improved through the improved method and normalization of the data. The three classification algorithms and the improved algorithm were then compared on experimental data. Experiments show that the improved method performs best in the kNN classification method, with an accuracy rate of 11.5% and a precision rate of 20.3%.
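Why normalization matters for kNN can be shown with a small sketch. This is plain kNN with min-max scaling, not the improved method proposed in the paper, whose details the abstract does not give. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def fit_min_max(rows: Sequence[Sequence[float]]) -> Callable[[Sequence[float]], List[float]]:
    """Return a function that rescales each feature to [0, 1] using the training range."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(row: Sequence[float]) -> List[float]:
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(row, lows, highs)]

    return scale


def knn_predict(train_x, train_y, query, k: int = 3):
    """Majority vote among the k training points closest to `query` (Euclidean)."""
    neighbours: List[Tuple[float, str]] = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Toy data: the second feature has a much larger numeric range than the first.
train_x = [[0.1, 1000.0], [0.2, 5000.0], [0.9, 1400.0], [0.8, 4800.0]]
train_y = ["neg", "neg", "pos", "pos"]
query = [0.85, 1100.0]

raw_prediction = knn_predict(train_x, train_y, query, k=1)

scale = fit_min_max(train_x)
scaled_prediction = knn_predict([scale(x) for x in train_x], train_y, scale(query), k=1)
```

Without scaling, the large-valued second feature dominates the distance and the query is assigned to "neg". After min-max scaling both features contribute comparably and the nearest neighbour is a "pos" point.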

A Comparative Study of Medical Data Classification Methods Based on Decision Tree and System Reconstruction Analysis

  • Tang, Tzung-I;Zheng, Gang;Huang, Yalou;Shu, Guangfu;Wang, Pengtao
    • Industrial Engineering and Management Systems / v.4 no.1 / pp.102-108 / 2005
  • This paper studies medical data classification methods, comparing decision trees and system reconstruction analysis as applied to heart disease medical data mining. The data we study were collected from patients with coronary heart disease and consist of 1,723 records of 71 attributes each. We use the system-reconstruction method to weight the data. We use decision tree algorithms such as induction of decision trees (ID3), C4.5, classification and regression tree (CART), Chi-square automatic interaction detector (CHAID), and exhaustive CHAID. We use the results to compare the correct classification rate, number of leaves, and tree depth of the different decision-tree algorithms. The experiments show that weighting the data can improve the correct classification rate on the coronary heart disease data but has little effect on tree depth and number of leaves.