• Title/Summary/Keyword: classification trees

Classification of Ovarian Cancer Microarray Data based on Intelligent Systems with Marker gene (선별 시스템 기반 표지 유전자를 포함한 난소암 마이크로어레이 데이터 분류)

  • Park, Su-Young;Jung, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.15 no.3
    • /
    • pp.747-752
    • /
    • 2011
  • Microarray classification typically has two striking attributes: (1) classifier design and error estimation are based on remarkably small samples, and (2) cross-validation error estimation is employed in the majority of papers. Ovarian cancer microarray data consist of the expression levels of tens of thousands of genes, and there is no systematic procedure for analyzing this information instantaneously. In this paper, marker genes are selected by ranking genes according to statistics, and popular classification rules (linear discriminant analysis, k-nearest-neighbor, and decision trees) are applied to compare classification accuracy with and without marker-gene selection. Applying linear classification analysis to the microarray data set including marker genes selected by the ANOVA method gave the highest classification accuracy, 97.78%, and the lowest prediction error estimate.
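
The gene-ranking step described in this abstract can be illustrated with a minimal sketch. It assumes a one-way ANOVA F-statistic is computed per gene and the highest-scoring genes are kept as markers; the abstract does not give the exact procedure, and the names `anova_f`, `rank_genes`, `gene_a`, and `gene_b`, as well as the toy expression values, are hypothetical.

```python
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F-statistic for one gene (assumes at least two classes)."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Variation of class means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of samples around their own class mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_genes(expr, labels, top):
    """expr maps a gene name to its expression values, one per sample."""
    scores = {gene: anova_f(vals, labels) for gene, vals in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Toy data (illustrative only, not from the paper)
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
expr = {
    "gene_a": [5.1, 4.9, 5.0, 1.0, 1.2, 0.9],  # separates the classes well
    "gene_b": [2.0, 3.1, 1.9, 2.2, 3.0, 2.1],  # overlaps across classes
}
print(rank_genes(expr, labels, top=1))
```

In this toy data `gene_a` ranks first because its class means differ far more than its within-class spread. The selected genes would then be passed to a classifier such as those named in the abstract.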

The Modelling of Prosodic Phrasing and Segmental Duration using CART (CART를 이용한 운율구 추출 및 음소 지속 시간 모델링)

  • 이상호
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.06c
    • /
    • pp.135-138
    • /
    • 1998
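  • This paper uses CART (Classification And Regression Trees), one of the tree-based modeling techniques, to model prosodic phrase extraction, the pause duration between prosodic phrases, and phoneme duration. A corpus of 400 sentences (about 33 minutes) was collected; 240 sentences (about 20 minutes) were used to train the decision trees and regression trees, and the remaining 160 sentences (about 13 minutes) were used for testing. The decision tree that determines prosodic phrase boundaries had an error rate of 14.6%, and the regression trees that predict the pause duration between prosodic phrases and the phoneme duration had root mean squared errors (RMSE) of 132.61 msec and 21.97 msec, respectively.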

A research on the key factors for classification of diabetes based on random forest

  • Shin, Yong sub;Lee, Namju;Hwang, Chigon
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.12 no.3
    • /
    • pp.102-107
    • /
    • 2020
  • Recently, the number of people visiting hospitals because of diabetes has been increasing. According to the Korean Diabetes Association, 1 in 7 adults over the age of 30 suffers from diabetes, making it one of the most common diseases among modern people. In this paper, in addition to blood sugar, which is widely used to detect diabetes, we study BMI, which is known to be related to diabetes, and triglycerides and cholesterol, which cause various complications in diabetics, using random forests and decision trees, techniques known to be effective for classification. The importance of each factor was confirmed using the results and the feature importances derived from the two techniques. Through this, we studied how BMI, triglycerides, and cholesterol relate to diabetes alongside blood sugar, a factor to which diabetic patients should pay close attention.

An Application of Decision Tree Method for Fault Diagnosis of Induction Motors

  • Tran, Van Tung;Yang, Bo-Suk;Oh, Myung-Suck
    • Proceedings of the Korea Committee for Ocean Resources and Engineering Conference
    • /
    • 2006.11a
    • /
    • pp.54-59
    • /
    • 2006
  • The decision tree is one of the most effective and widely used methods for building classification models. Researchers from various disciplines such as statistics, machine learning, pattern recognition, and data mining have considered the decision tree method an effective solution to problems in their fields. In this paper, an application of the decision tree method to classifying faults of induction motors is proposed. Features are calculated from the original experimental data to obtain useful information as attributes. These data are then assigned classes based on our experience before being used as inputs to the decision tree. A total of 9 classes are defined. A decision tree implementation written in Matlab is applied to these data.

Splitting Decision Tree Nodes with Multiple Target Variables (의사결정나무에서 다중 목표변수를 고려한)

  • 김성준
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2003.05a
    • /
    • pp.243-246
    • /
    • 2003
  • Data mining is the process of discovering patterns useful for decision making from large amounts of data. It has recently received much attention in a wide range of business and engineering fields. Classifying a group into subgroups is one of the most important subjects in data mining. Tree-based methods, known as decision trees, provide an efficient way to find classification models. The primary concern in tree learning is to minimize node impurity, which is evaluated using a target variable in the data set. However, there are situations where multiple target variables should be taken into account, such as manufacturing process monitoring, marketing science, and clinical and health analysis. The purpose of this article is to present several methods for measuring node impurity that are applicable to data sets with multiple target variables. Numerical examples are given with discussion for illustration.
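
As a rough illustration of node impurity with multiple target variables, the sketch below averages the Gini impurity over all targets and weights child nodes by size when scoring a split. This is only one plausible measure, chosen here as an assumption; the abstract does not say which measures the article proposes, and the function names and toy labels are hypothetical.

```python
from collections import Counter

def gini(values):
    """Gini impurity of a single categorical target."""
    n = len(values)
    return 1.0 - sum((count / n) ** 2 for count in Counter(values).values())

def multi_target_gini(rows):
    """rows: list of tuples holding one class label per target variable.

    Assumption: impurity of the node is the mean Gini over the targets.
    """
    targets = list(zip(*rows))
    return sum(gini(t) for t in targets) / len(targets)

def split_impurity(left, right):
    """Size-weighted impurity of the two child nodes of a candidate split."""
    n = len(left) + len(right)
    return (len(left) * multi_target_gini(left)
            + len(right) * multi_target_gini(right)) / n

# Toy node with two targets per sample (illustrative values only)
node = [("ok", "low"), ("ok", "low"), ("bad", "high"), ("bad", "low")]
print(multi_target_gini(node))             # impurity before splitting
print(split_impurity(node[:2], node[2:]))  # impurity after a candidate split
```

A tree learner would evaluate `split_impurity` for each candidate split and keep the one that lowers impurity the most across all targets together.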

A methodology for Internet Customer segmentation using Decision Trees

  • Cho, Y.B.;Kim, S.H.
    • Proceedings of the Korea Intelligent Information System Society Conference
    • /
    • 2003.05a
    • /
    • pp.206-213
    • /
    • 2003
  • Application of existing decision tree algorithms for Internet retail customer classification is apt to construct a bushy tree due to imprecise source data. Even excessive analysis may not guarantee the effectiveness of the business although the results are derived from fully detailed segments. Thus, it is necessary to determine the appropriate number of segments with a certain level of abstraction. In this study, we developed a stopping rule that considers the total amount of information gained while generating a rule tree. In addition to forwarding from root to intermediate nodes with a certain level of abstraction, the decision tree is investigated by the backtracking pruning method with misclassification loss information.

An Approach to the Spectral Signature Analysis and Supervised Classification for Forest Damages - An Assessment of Low Altitude Airborne MSS Data -

  • Kim, Choen
    • Korean Journal of Remote Sensing
    • /
    • v.7 no.2
    • /
    • pp.149-163
    • /
    • 1991
  • This paper discusses the capability of airborne remotely sensed data to detect and classify forest damage. In this work, the AMS (Aircraft Multiband Scanner) was used to obtain digital imagery at 300 m altitude for a forest damage inventory in the Black Forest of Germany. MSS (Multispectral Scanner) digital numbers were converted to spectral emittance and radiance values in 8 spectral bands from the visible to the thermal infrared and submitted to a maximum-likelihood classification for (1) tree species and (2) damage classes. As expected, the MSS data, with a high spatial resolution of 0.75 m $\times$ 0.75 m, enabled the detection and identification of single trees with differing damage, and the results were nearly equivalent to the ground-checked truth data.

Ensemble Methods Applied to Classification Problem

  • Kim, ByungJoo
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.11 no.1
    • /
    • pp.47-53
    • /
    • 2019
  • The idea of ensemble learning is to train multiple models, each with the objective of predicting or classifying a set of results. Most of the error in a model's learning comes from three main factors: variance, noise, and bias. Ensemble methods increase the stability of the final model and reduce these errors. By combining many models, we can reduce the variance even when the individual models are not strong. In this paper, we propose an ensemble model and apply it to classification problems. On the iris, Pima Indians diabetes, and semiconductor fault detection problems, the proposed model classifies well compared with traditional single classifiers, namely logistic regression, SVM, and random forest.
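
The claim that combining imperfect models can still give a good result can be sketched with simple majority voting. The abstract does not state how the proposed ensemble combines its members, so majority voting is an assumption here, and the three toy prediction lists are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of predicted labels per model, all equal length."""
    # For each sample, take the label predicted by the most models
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

# Toy predictions (illustrative only): each model errs on different samples
truth   = [1, 0, 1, 1, 0, 1]
model_a = [1, 0, 0, 1, 0, 1]
model_b = [1, 1, 1, 1, 0, 0]
model_c = [0, 0, 1, 1, 0, 1]

ensemble = majority_vote([model_a, model_b, model_c])
for name, pred in [("a", model_a), ("b", model_b), ("c", model_c),
                   ("ensemble", ensemble)]:
    print(name, round(accuracy(pred, truth), 3))
```

Because the three models make their mistakes on different samples, the vote corrects every individual error in this toy case. With an odd number of binary classifiers no ties arise; other settings would need an explicit tie-breaking rule.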

Abnormality Detection Control System using Charging Data (충전데이터를 이용한 이상감지 제어시스템)

  • Moon, Sang-Ho
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.2
    • /
    • pp.313-316
    • /
    • 2022
  • In this paper, we implement a system that detects abnormalities in the charging data transmitted from the charger while an electric vehicle is charging and controls the charger remotely. To do this, an analysis model that judges the data received from the charger as normal or abnormal is created using classification algorithms such as logistic regression, KNN, SVM, and decision trees. In addition, a model is created to determine the cause of an abnormality using the existing charging data, based on an analysis of the types of charger abnormality. Finally, an unsupervised learning method is used to find new patterns of abnormal data.

Spectral Mixture Analysis Using Modified IEA Algorithm for Forest Classification (수정된 IEA 기반의 분광혼합분석 기법을 이용한 임상분류)

  • Song, Ahram;Han, Youkyung;Kim, Younghyun;Kim, Yongil
    • Korean Journal of Remote Sensing
    • /
    • v.30 no.2
    • /
    • pp.219-226
    • /
    • 2014
  • Fractional values resulting from spectral mixture analysis can be used to classify not only urban areas with various materials but also forest areas at a more detailed spatial scale. South Korea in particular consists largely of mixed forest, so spectral mixture analysis is suitable as a classification method. For successful classification using spectral mixture analysis, extraction of optimal endmembers is a prerequisite. Though geometric endmember selection has been widely used, it is barely suitable for forest areas. Therefore, in this study, we modified Iterative Error Analysis (IEA), one of the best-known image endmember selection algorithms, which extracts pure pixels directly from the image. The endmembers that represent deciduous and coniferous trees are automatically extracted. The experiments were carried out on two Compact Airborne Spectrographic Imager (CASI) sites, and the forest area was classified into two types. The accuracies of the classification results were 86% and 90%, which indicates that the proposed algorithm effectively extracted proper endmembers. For more accurate classification, other components such as forest gaps should be considered.