• 제목/요약/키워드: decision tree(C4.5)

검색결과 84건 처리시간 0.025초

A study on data mining techniques for soil classification methods using cone penetration test results

  • Junghee Park;So-Hyun Cho;Jong-Sub Lee;Hyun-Ki Kim
    • Geomechanics and Engineering
    • /
    • 제35권1호
    • /
    • pp.67-80
    • /
    • 2023
  • Due to the nature of the conjunctive Cone Penetration Test(CPT), which does not verify the actual sample directly, geotechnical engineers commonly classify the underground geomaterials using CPT results with the classification diagrams proposed by various researchers. However, such classification diagrams may fail to reflect local geotechnical characteristics, potentially resulting in misclassification that does not align with the actual stratification in regions with strong local features. To address this, this paper presents an objective method for more accurate local CPT soil classification criteria, which utilizes C4.5 decision tree models trained with the CPT results from the clay-dominant southern coast of Korea and the sand-dominant region in South Carolina, USA. The results and analyses demonstrate that the C4.5 algorithm, in conjunction with oversampling, outlier removal, and pruning methods, can enhance and optimize the decision tree-based CPT soil classification model.

e-CRM에서 개인화 향상을 위한 의사결정나무 사용에 관한 연구 (Study on the Application of Decision Trees for Personalization based on e-CRM)

  • 양정희;한서정
    • 대한안전경영과학회지
    • /
    • 제5권3호
    • /
    • pp.107-119
    • /
    • 2003
  • Expectation and interest about e-CRM are rising for more efficient customer management in on-line including electronic commerce. The decision-making tree can be used usefully as the data mining technology for e-CRM. In this paper, the representative decision making techniques, CART, C4.5, CHAID analyzed the differences in personalization point of view with actuality customer data through an experiment. With these analysis data, it is proposed a new decision-making tree system that has big advantage in personalization techniques. Through new system, it can get following advantage. First, it can form superior model more qualitatively in personalization by adding individual's weight value. Second it can supply information personalized more to customer. Third, it can have high position about customer's loyalty than other site of similar types of business. Fourth, it can reduce expense that cost marketing and decision-making. Fifth, it becomes possible that know that customer through smooth communication with customer who use personalized service wants and make from goods or service's quality to more worth thing.

이미지 보간을 위한 의사결정나무 분류 기법의 적용 및 구현 (Adopting and Implementation of Decision Tree Classification Method for Image Interpolation)

  • 김동형
    • 디지털산업정보학회논문지
    • /
    • 제16권1호
    • /
    • pp.55-65
    • /
    • 2020
  • With the development of display hardware, image interpolation techniques have been used in various fields such as image zooming and medical imaging. Traditional image interpolation methods, such as bi-linear interpolation, bi-cubic interpolation and edge direction-based interpolation, perform interpolation in the spatial domain. Recently, interpolation techniques in the discrete cosine transform or wavelet domain are also proposed. Using these various existing interpolation methods and machine learning, we propose decision tree classification-based image interpolation methods. In other words, this paper is about the method of adaptively applying various existing interpolation methods, not the interpolation method itself. To obtain the decision model, we used Weka's J48 library with the C4.5 decision tree algorithm. The proposed method first constructs attribute set and select classes that means interpolation methods for classification model. And after training, interpolation is performed using different interpolation methods according to attributes characteristics. Simulation results show that the proposed method yields reasonable performance.

결정트리를 이용하는 불완전한 데이터 처리기법 (Incomplete data handling technique using decision trees)

  • 이종찬
    • 한국융합학회논문지
    • /
    • 제12권8호
    • /
    • pp.39-45
    • /
    • 2021
  • 본 논문은 손실값을 포함하는 불완전한 데이터를 처리하는 방법에 대해 논한다. 손실값을 최적으로 처리한다는 것은 학습 데이터가 가지고 있는 정보들에서 본래값과 가장 근사한 추정치를 구하고, 이 값으로 손실값을 대치하는 것이다. 이것을 실현하기 위한 방안으로 분류기가 정보를 분류하는 과정에서 완성되어가는 결정트리를 이용한다. 다시말해 이 결정트리는 전체 학습 데이터 중에서 손실값을 포함하지 않는 완전한 정보만을 C4.5 분류기에 입력하여 학습하는 과정에서 얻어진다. 이 결정트리의 노드들은 분류 변수의 정보를 가지는데, 루트에 가까운 상위 노드일수록 많은 정보를 포함하게 되고 말단 노드에서는 루트로부터의 경로를 통해 분류 영역을 형성하게 된다. 또한 각 영역에는 분류된 데이터 사건들의 평균이 기록된다. 손실값을 포함하는 사건들은 이러한 결정트리에 입력되어 각 노드의 정보에 따라 순회과정을 통해 사건과 가장 근접한 영역을 찾아가게 된다. 이 영역에 기록된 평균값을 손실값의 추정치로 간주하고, 보상 과정은 완성된다.

퍼지 결정트리를 이용한 패턴분류를 위한 데이터 마이닝 알고리즘 (Data Mining Algorithm Based on Fuzzy Decision Tree for Pattern Classification)

  • 이중근;김명원
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • 제26권11호
    • /
    • pp.1314-1323
    • /
    • 1999
  • 컴퓨터의 사용이 일반화됨에 따라 데이타를 생성하고 수집하는 것이 용이해졌다. 이에 따라 데이타로부터 자동적으로 유용한 지식을 얻는 기술이 필요하게 되었다. 데이타 마이닝에서 얻어진 지식은 정확성과 이해성을 충족해야 한다. 본 논문에서는 데이타 마이닝을 위하여 퍼지 결정트리에 기반한 효율적인 퍼지 규칙을 생성하는 알고리즘을 제안한다. 퍼지 결정트리는 ID3와 C4.5의 이해성과 퍼지이론의 추론과 표현력을 결합한 방법이다. 특히, 퍼지 규칙은 속성 축에 평행하게 판단 경계선을 결정하는 방법으로는 어려운 속성 축에 평행하지 않는 경계선을 갖는 패턴을 효율적으로 분류한다. 제안된 알고리즘은 첫째, 각 속성 데이타의 히스토그램 분석을 통해 적절한 소속함수를 생성한다. 둘째, 주어진 소속함수를 바탕으로 ID3와 C4.5와 유사한 방법으로 퍼지 결정트리를 생성한다. 또한, 유전자 알고리즘을 이용하여 소속함수를 조율한다. IRIS 데이타, Wisconsin breast cancer 데이타, credit screening 데이타 등 벤치마크 데이타들에 대한 실험 결과 제안된 방법이 C4.5 방법을 포함한 다른 방법보다 성능과 규칙의 이해성에서 보다 효율적임을 보인다.Abstract With an extended use of computers, we can easily generate and collect data. There is a need to acquire useful knowledge from data automatically. In data mining the acquired knowledge needs to be both accurate and comprehensible. In this paper, we propose an efficient fuzzy rule generation algorithm based on fuzzy decision tree for data mining. We combine the comprehensibility of rules generated based on decision tree such as ID3 and C4.5 and the expressive power of fuzzy sets. Particularly, fuzzy rules allow us to effectively classify patterns of non-axis-parallel decision boundaries, which are difficult to do using attribute-based classification methods.In our algorithm we first determine an appropriate set of membership functions for each attribute of data using histogram analysis. Given a set of membership functions then we construct a fuzzy decision tree in a similar way to that of ID3 and C4.5. We also apply genetic algorithm to tune the initial set of membership functions. We have experimented our algorithm with several benchmark data sets including the IRIS data, the Wisconsin breast cancer data, and the credit screening data. The experiment results show that our method is more efficient in performance and comprehensibility of rules compared with other methods including C4.5.

의사결정나무 기법을 이용한 노인들의 자살생각 예측모형 및 의사결정 규칙 개발 (A Development of Suicidal Ideation Prediction Model and Decision Rules for the Elderly: Decision Tree Approach)

  • 김덕현;유동희;정대율
    • 한국정보시스템학회지:정보시스템연구
    • /
    • 제28권3호
    • /
    • pp.249-276
    • /
    • 2019
  • Purpose The purpose of this study is to develop a prediction model and decision rules for the elderly's suicidal ideation based on the Korean Welfare Panel survey data. By utilizing this data, we obtained many decision rules to predict the elderly's suicide ideation. Design/methodology/approach This study used classification analysis to derive decision rules to predict on the basis of decision tree technique. Weka 3.8 is used as the data mining tool in this study. The decision tree algorithm uses J48, also known as C4.5. In addition, 66.6% of the total data was divided into learning data and verification data. We considered all possible variables based on previous studies in predicting suicidal ideation of the elderly. Finally, 99 variables including the target variable were used. Classification analysis was performed by introducing sampling technique through backward elimination and data balancing. Findings As a result, there were significant differences between the data sets. The selected data sets have different, various decision tree and several rules. Based on the decision tree method, we derived the rules for suicide prevention. The decision tree derives not only the rules for the suicidal ideation of the depressed group, but also the rules for the suicidal ideation of the non-depressed group. In addition, in developing the predictive model, the problem of over-fitting due to the data imbalance phenomenon was directly identified through the application of data balancing. We could conclude that it is necessary to balance the data on the target variables in order to perform the correct classification analysis without over-fitting. In addition, although data balancing is applied, it is shown that performance is not inferior in prediction rate when compared with a biased prediction model.

팔체질 진단을 위한 단계별 설문지 개발 연구 (A Study on Stage Classification of Eight Constitution Questionnaire)

  • 이주호;김민용;김희주;신용섭;오환섭;박영배;박영재
    • 대한한의진단학회지
    • /
    • 제16권2호
    • /
    • pp.59-70
    • /
    • 2012
  • Objectives : Pulse diagnosis by Expert is the only way to classify 8 Constitutions so the study to supplement classifying method by the questionnaire has developed and modified and ECM-32 System has designed in 2010. But analyzing with Decision tree had many nodes and 32 important questions omitted while processing the data. So this study was to classify the 8 constitution patients into 2 groups first and analyze its characters in consecutive order. Methods : The participants of this study were 1027 patients who classified into one of the 8 constitutions according to pulse diagnosis and answered 251 questionnaires in 2010. They were divided into sympathetic nerve acceleration constitution and parasympathetic nerve acceleration constitution and analyzed with decision tree. Results : The reponses of the questionnaire were analyzed with 4 methods of 5 scales interval method from 0 to 5, Na, Low(1,2), Medium(3), High(4,5), average value, Y/N dichotomy. Average Value had no significance. 1. From the 5 scale interval method 6 questionnaires with 7 nodes (F5e, B1d, F7f, F2a, F1b, C4L) were significant. The accuracy was 92.5%. 2. From L, M, H method 7 questionnaires with 7 nodes(F5e, B1d, F7f, F1a, B1c, C4L, P3d) were significant. The accuracy was 92.5%. 3. From Y/N dichotomy 9 questionnaires with 9 nodes( F5e, B1d, F7f, F1a, B1c, C4L, B1b, P1i, B2a) were significant. The accuracy was 93.18%. Conclusions : Based on this study, Yes or No dichotomy method was most significant and categorized among the 4 methods. Unlike previous studies which used interval scale method only, Y/N dichotomy method was more statistically significant with the questionnaire to supplement the method of pulse diagnosis. For further study by analyzing decision tree method in consecutive order, the patients can be divided into 8 Constitutions with higher significance with less questionnaires.

건설업의 산업재해 특성분석을 위한 의사결정나무 기법의 상용 최적 알고리즘 선정 (Selection of an Optimal Algorithm among Decision Tree Techniques for Feature Analysis of Industrial Accidents in Construction Industries)

  • 임영문;최요한
    • 대한안전경영과학회지
    • /
    • 제7권5호
    • /
    • pp.1-8
    • /
    • 2005
  • The consequences of rapid industrial advancement, diversified types of business and unexpected industrial accidents have caused a lot of damage to many unspecified persons both in a human way and a material way Although various previous studies have been analyzed to prevent industrial accidents, these studies only provide managerial and educational policies using frequency analysis and comparative analysis based on data from past industrial accidents. The main objective of this study is to find an optimal algorithm for data analysis of industrial accidents and this paper provides a comparative analysis of 4 kinds of algorithms including CHAID, CART, C4.5, and QUEST. Decision tree algorithm is utilized to predict results using objective and quantified data as a typical technique of data mining. Enterprise Miner of SAS and AnswerTree of SPSS will be used to evaluate the validity of the results of the four algorithms. The sample for this work chosen from 19,574 data related to construction industries during three years ($2002\sim2004$) in Korea.

하이브리드 데이터마이닝을 이용한 지능형 이상 진단 시스템 (Intelligent Fault Diagnosis System Using Hybrid Data Mining)

  • 백준걸;허준
    • 한국경영과학회:학술대회논문집
    • /
    • 한국경영과학회/대한산업공학회 2005년도 춘계공동학술대회 발표논문
    • /
    • pp.960-968
    • /
    • 2005
  • The high cost in maintaining complex manufacturing process makes it necessary to enhance an efficient maintenance system. For the effective maintenance of manufacturing process, precise fault diagnosis should be performed and an appropriate maintenance action should be executed. This paper suggests an intelligent fault diagnosis system using hybrid data mining. In this system, the rules for the fault diagnosis are generated by hybrid decision tree/genetic algorithm and the most effective maintenance action is selected by decision network and AHP. To verify the proposed intelligent fault diagnosis system, we compared the accuracy of the hybrid decision tree/genetic algorithm with one of the general decision tree learning algorithm(C4.5) by data collected from a coil-spring manufacturing process.

  • PDF

Naive Bayes 분석기법을 이용한 유방암 진단 (Breast Cancer Diagnosis using Naive Bayes Analysis Techniques)

  • 박나영;김장일;정용규
    • 서비스연구
    • /
    • 제3권1호
    • /
    • pp.87-93
    • /
    • 2013
  • 선진국형 질병으로만 알려져 있던 유방암이 우리나라 현대 여성들에게 발병률이 꾸준히 증가하고 있다. 유방암은 보통 50대 이상의 여성에서 발병하는 병으로 알려져 있지만 우리나라의 경우 40대의 서양보다 젊은 여성들에게 발병률이 꾸준히 증가하고 있다. 따라서 우리나라 성인여성을 기준으로 유방암에 대한 정확한 진단을 할 수 있는 매뉴얼을 구축하는 것이 시급한 과제이다. 본 논문에서는 데이터마이닝기법을 이용하여 유방암을 예측하는 방법을 제시한다. 데이터마이닝이란 데이터베이스 내에 숨어 있는 일정한 패턴이나 변수들 간의 관계를 정교한 분석모형을 이용하여 쉽게 드러나지 않은 유용한 정보를 찾아내는 과정을 말한다. 실험을 통하여 Deicion Tree와 Naive Bayes 분석기법을 사용하여 유방암을 진단하는 분석기법을 비교분석을 하였다. Deicison Tree는 C4.5 알고리즘을 적용하여 분석하였고 두 알고리즘이 상당히 좋은 분류 정확도를 나타냈다. 그러나 Naive Bayes 분류방법이 Decision Tree방법보다 더 상회하는 정확도를 보였고 이는 의료데이터의 특성에 많이 기인한다고 볼 수 있다.

  • PDF