• Title/Summary/Keyword: classification tree

A study on data mining techniques for soil classification methods using cone penetration test results

  • Junghee Park;So-Hyun Cho;Jong-Sub Lee;Hyun-Ki Kim
    • Geomechanics and Engineering / v.35 no.1 / pp.67-80 / 2023
  • Due to the nature of the conjunctive Cone Penetration Test (CPT), which does not verify the actual sample directly, geotechnical engineers commonly classify underground geomaterials from CPT results using classification diagrams proposed by various researchers. However, such diagrams may fail to reflect local geotechnical characteristics, potentially resulting in misclassification that does not align with the actual stratification in regions with strong local features. To address this, this paper presents an objective method for deriving more accurate local CPT soil classification criteria, using C4.5 decision tree models trained on CPT results from the clay-dominant southern coast of Korea and the sand-dominant region in South Carolina, USA. The results and analyses demonstrate that the C4.5 algorithm, in conjunction with oversampling, outlier removal, and pruning methods, can enhance and optimize the decision tree-based CPT soil classification model.
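
C4.5 chooses splits by gain ratio (information gain divided by split information). The sketch below illustrates that criterion for a single numeric feature. It is a simplified illustration, not the paper's model: the feature values, the soil labels, the function names, and the use of midpoints as candidate thresholds are all assumptions made for this example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    weights = [len(left) / n, len(right) / n]
    info_gain = entropy(labels) - (
        weights[0] * entropy(left) + weights[1] * entropy(right)
    )
    split_info = -sum(w * log2(w) for w in weights)
    return info_gain / split_info

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return (threshold, gain ratio)."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(((t, gain_ratio(values, labels, t)) for t in candidates),
               key=lambda pair: pair[1])

# Made-up readings of one CPT-style feature with made-up soil labels.
feature = [0.4, 0.6, 0.9, 5.0, 7.5, 9.0]
soil = ["clay", "clay", "clay", "sand", "sand", "sand"]
threshold, ratio = best_threshold(feature, soil)
```

On this toy data the selected threshold falls between the two groups, because that split separates the classes completely.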

Rule Selection Method in Decision Tree Models (의사결정나무 모델에서의 중요 룰 선택기법)

  • Son, Jieun;Kim, Seoung Bum
    • Journal of Korean Institute of Industrial Engineers / v.40 no.4 / pp.375-381 / 2014
  • Data mining is a process of discovering useful patterns or information from large amounts of data. The decision tree is a data mining algorithm that can be used for both classification and prediction and has been widely used in various applications because of its flexibility and interpretability. Decision trees for classification generally generate a number of rules, each belonging to one of the predefined categories, and several rules may belong to the same category. In this case, it is necessary to determine the significance of each rule so as to give users a priority ordering of the rules. The purpose of this paper is to propose a rule selection method for classification tree models that accounts for the number of observations, accuracy, and effectiveness of each rule. Our experiments demonstrate that the proposed method performs better than other existing rule selection methods.
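
To make the idea of scoring rules concrete, the sketch below computes two of the quantities the abstract mentions, the number of observations and the accuracy, for each rule of a small hand-written tree. The rules, the data, and the ranking key are hypothetical; the paper's actual selection method, including its effectiveness measure, is not reproduced here.

```python
def rule_stats(rules, rows):
    """For each rule, count the observations it covers and how many it labels correctly."""
    stats = []
    for name, condition, predicted in rules:
        covered = [row for row in rows if condition(row)]
        correct = sum(row[2] == predicted for row in covered)
        stats.append({
            "rule": name,
            "class": predicted,
            "n": len(covered),
            "accuracy": correct / len(covered) if covered else 0.0,
        })
    return stats

# Hypothetical leaf rules of a small tree over two features (x, y).
rules = [
    ("R1", lambda r: r[0] <= 3, "A"),
    ("R2", lambda r: r[0] > 3 and r[1] <= 5, "B"),
    ("R3", lambda r: r[0] > 3 and r[1] > 5, "A"),
]

# Made-up observations: (x, y, true class).
rows = [
    (1, 2, "A"), (2, 7, "A"), (3, 4, "B"), (5, 1, "B"),
    (6, 3, "B"), (7, 8, "A"), (8, 9, "A"), (9, 6, "A"),
]

stats = rule_stats(rules, rows)
# Illustrative ranking only: higher accuracy first, then more observations.
ranked = sorted(stats, key=lambda s: (s["accuracy"], s["n"]), reverse=True)
```

Rules R1 and R3 both predict class A, so a ranking of this kind is what lets a user decide which of the two to trust first.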

A review of tree-based Bayesian methods

  • Linero, Antonio R.
    • Communications for Statistical Applications and Methods / v.24 no.6 / pp.543-559 / 2017
  • Tree-based regression and classification ensembles form a standard part of the data-science toolkit. Many commonly used methods take an algorithmic view, proposing greedy methods for constructing decision trees; examples include the classification and regression trees algorithm, boosted decision trees, and random forests. Recent history has seen a surge of interest in Bayesian techniques for constructing decision tree ensembles, with these methods frequently outperforming their algorithmic counterparts. The goal of this article is to survey the landscape surrounding Bayesian decision tree methods, and to discuss recent modeling and computational developments. We provide connections between Bayesian tree-based methods and existing machine learning techniques, and outline several recent theoretical developments establishing frequentist consistency and rates of convergence for the posterior distribution. The methodology we present is applicable for a wide variety of statistical tasks including regression, classification, modeling of count data, and many others. We illustrate the methodology on both simulated and real datasets.

A study of constitution diagnosis using decision tree method (의사결정나무법을 이용한 체질진단에 관한 연구)

  • Lee, Yong-Seop;Park, Seong-Sik;Park, Eun-Kyung
    • Journal of Sasang Constitutional Medicine / v.13 no.2 / pp.144-155 / 2001
  • With growing interest in Sasang Constitutional Medicine, its practical use is considered very important in disease prevention and medical treatment. However, constitution classification depends on the doctor's clinical experience because objective test criteria are lacking. This study tries to improve the objectivity of diagnosis using a new statistical method, the decision tree. The decision tree method, a classification technique in statistical analysis, was used to analyze QSCCII results instead of discriminant analysis. As a result, 16 of the 121 QSCCII questions were selected as important, and 21 terminal nodes were built to classify the constitution. Using only the 16 questions identified by the decision tree, the constitution can be diagnosed and interpreted easily and effectively.

Analysis of PD Distribution Characteristics and Comparison of Classification Methods according to Electrical Tree Source in Power Cable (전력용 케이블 시편에서 전기트리 발생원에 따른 부분방전 분포 특성 및 발생원 분류기법 비교)

  • Park, Seong-Hee;Jeong, Hae-Eun;Lim, Kee-Joe;Kang, Seong-Hwa
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers / v.20 no.1 / pp.57-64 / 2007
  • Electrical treeing discharge is a well-known cause of insulation failure in power cables. It arises from continuous stress imposed on the cable and bears on safety, reliability, and maintenance. This paper analyzes the partial discharge (PD) distribution during electrical tree growth in order to characterize electrical treeing discharge according to the defect. A tree develops differently for each defect, and identifying these differences is the first purpose of this paper. To acquire PD data, three defective tree models were made, and their data are presented using the phase-resolved partial discharge (PRPD) method. The PRPD results show that each tree discharge source has its own characteristics, and that these characteristics differ markedly when other defects (a void or a metal particle) exist inside the power cable. This result is related to the time to breakdown, which matters for cable diagnosis. Methods for classifying PD sources were also studied. To select the most useful method for PD data classification, methods of different types were compared: a neural network (NN-BP), an adaptive neuro-fuzzy inference system (ANFIS), and PCA-LDA. ANFIS showed the highest rate, 98%. In general, PCA-LDA and ANFIS performed better than BP. Finally, tree progress was classified using ANFIS, with a rate of 92%.

Game Traffic Classification Using Statistical Characteristics at the Transport Layer

  • Han, Young-Tae;Park, Hong-Shik
    • ETRI Journal / v.32 no.1 / pp.22-32 / 2010
  • Pervasive game environments have driven explosive growth of the Internet over recent decades. Thus, understanding Internet traffic characteristics and classifying traffic precisely have become important issues in network management, resource provisioning, and game application development. Naturally, much attention has been given to analyzing and modeling game traffic. Little research, however, has been undertaken on the classification of game traffic. In this paper, we perform an interpretive traffic analysis of popular game applications at the transport layer and propose a new classification method based on a simple decision tree, called an alternative decision tree (ADT), which utilizes the statistical traffic characteristics of game applications. Experimental results show that ADT precisely distinguishes game traffic from other application traffic types using limited traffic features and a small number of packets, while maintaining low complexity by utilizing a simple decision tree.

Development of the forest type classification technique for the mixed forest with coniferous and broad-leaved species using the high resolution satellite data

  • Sasakawa, Hiroshi;Tsuyuki, Satoshi
    • Proceedings of the KSRS Conference / 2003.11a / pp.467-469 / 2003
  • This research aimed to develop a forest type classification technique for mixed forest with coniferous and broad-leaved species using high-resolution satellite data. QuickBird data were used as the satellite data. The method was to extract satellite data for every single tree crown using an image segmentation technique, and then to evaluate classification accuracy under different grouping criteria: tree species, families, coniferous or broad-leaved species, and timber prices. As a result, classification at the tree species and family levels was inaccurate, whereas classification at the coniferous or broad-leaved and timber price levels was highly accurate.

Decision Tree Classifier for Multiple Abstraction Levels of Data (다중 추상화 수준의 데이터를 위한 결정 트리 분류기)

  • Jeong, Min-A;Lee, Do-Heon
    • The KIPS Transactions:PartD / v.10D no.1 / pp.23-32 / 2003
  • Since data are collected from disparate sources in many actual data mining environments, it is common to have data values at different abstraction levels. This paper shows that such multiple abstraction levels can cause undesirable effects in decision tree classification. After explaining that forcibly equalizing abstraction levels cannot provide a satisfactory solution to this problem, it presents a method that uses the data as they are. The proposed method accommodates the generalization/specialization relationship between data values in both the construction and the class assignment phases of decision tree classification. The experimental results show that the proposed method reduces classification error rates significantly when multiple abstraction levels of data are involved.

Detection of Individual Tree Species Using Object-Based Classification Method with Unmanned Aerial Vehicle (UAV) Imagery

  • Park, Jeongmook;Sim, Woodam;Lee, Jungsoo
    • Journal of Forest and Environmental Science / v.35 no.3 / pp.181-188 / 2019
  • This study constructed tree species classification maps for three information types (spectral, texture, and combined spectral and texture) at three altitudes (30 m, 60 m, 90 m) using unmanned aerial vehicle images and an object-based classification method, and evaluated the concordance rate against field survey data. The optimal object-based weights for Scale were 176 for 30 m images, 111 for 60 m images, and 108 for 90 m images, while shape/color and compactness/smoothness were 0.4/0.6 and 0.5/0.5, respectively, regardless of altitude. In terms of overall accuracy by information type and altitude, the highest rates were about 88% for combined spectral and texture information at 30 m, and about 98% and about 86% for spectral information at 60 m and 90 m, respectively. The concordance rate with the field survey data per tree species was highest at about 92% for Pinus densiflora at 30 m, about 100% for Prunus sargentii Rehder at 60 m, and about 89% for Robinia pseudoacacia L. at 90 m.

Optimization of Decision Tree for Classification Using a Particle Swarm

  • Cho, Yun-Ju;Lee, Hye-Seon;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems / v.10 no.4 / pp.272-278 / 2011
  • The decision tree has been used successfully as a classification tool in many areas, such as medical diagnosis, customer churn prediction, and signal detection. The main advantage of decision tree classifiers is their capability to break down a complex structure into a collection of simpler structures, thus providing a solution that is easy to interpret. Since decision tree induction is a top-down, divide-and-conquer process, there is a risk of reaching a locally optimal solution. This paper proposes a procedure for optimally determining the thresholds of the chosen variables for a decision tree using adaptive particle swarm optimization (APSO). The proposed algorithm consists of two phases. First, we construct a decision tree and choose the relevant variables. Second, we find the optimum thresholds for those selected variables simultaneously using APSO. To validate the proposed algorithm, several artificial and real datasets are used. We compare our results with the original CART results and show that the proposed algorithm is promising for improving prediction accuracy.
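
The second phase, tuning all thresholds of an already chosen tree structure at once, can be sketched with a plain particle swarm. This is a minimal illustration under stated assumptions, not the paper's APSO: the inertia and acceleration coefficients are fixed rather than adaptive, the tree structure is hard-coded, and the data and function names are invented for the example.

```python
import random

def predict(row, t):
    """Fixed tree structure: split on feature 0 first, then on feature 1."""
    if row[0] <= t[0]:
        return 0
    return 1 if row[1] <= t[1] else 2

def accuracy(t, rows, labels):
    """Fraction of rows that the tree with thresholds t classifies correctly."""
    return sum(predict(r, t) == y for r, y in zip(rows, labels)) / len(rows)

def pso_thresholds(rows, labels, n_particles=15, n_iter=40, seed=0):
    """Search both thresholds simultaneously with a basic (non-adaptive) PSO."""
    rng = random.Random(seed)
    dims = 2
    lo = [min(r[d] for r in rows) for d in range(dims)]
    hi = [max(r[d] for r in rows) for d in range(dims)]
    pos = [[rng.uniform(lo[d], hi[d]) for d in range(dims)]
           for _ in range(n_particles)]
    vel = [[0.0] * dims for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                      # personal bests
    best_fit = [accuracy(p, rows, labels) for p in pos]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]          # global best
    w, c1, c2 = 0.7, 1.5, 1.5                           # assumed fixed coefficients
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dims):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_pos[d] - pos[i][d]))
                # Keep each threshold inside the observed feature range.
                pos[i][d] = min(hi[d], max(lo[d], pos[i][d] + vel[i][d]))
            fit = accuracy(pos[i], rows, labels)
            if fit > best_fit[i]:
                best_fit[i], best_pos[i] = fit, pos[i][:]
                if fit > g_fit:
                    g_fit, g_pos = fit, pos[i][:]
    return g_pos, g_fit

# Made-up two-feature data with three classes.
rows = [(1, 5), (2, 1), (3, 9), (6, 2), (7, 3), (8, 1), (6, 8), (7, 9), (9, 7)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
best_t, best_acc = pso_thresholds(rows, labels)
```

Because every particle carries a full vector of thresholds, a move that worsens one split can still be accepted when it improves the tree as a whole, which is the contrast with greedy, one-split-at-a-time induction.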