• Title/Abstract/Keywords: classification trees

Classification of Ovarian Cancer Microarray Data based on Intelligent Systems with Marker gene

  • 박수영;정채영
    • Journal of the Korea Institute of Information and Communication Engineering / Vol. 15, No. 3 / pp. 747-752 / 2011
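  • Microarray classification typically has two notable characteristics: classifier design and error estimation are based on remarkably small samples, and cross-validation error estimation is used in the majority of papers. Ovarian cancer microarray data consist of tens of thousands of gene expression values, and there is no systematic procedure for analyzing this information simultaneously. In this paper, marker genes were selected by ranking genes according to statistical tests, and widely used classification rules, namely linear classification analysis, 3-nearest-neighbor, and decision tree algorithms, were used to compare the classification accuracy of data with and without marker-gene selection. Applying the linear classification analysis rule to the microarray data set containing marker genes selected with ANOVA yielded the highest classification accuracy (97.78%) and the lowest estimated prediction error.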
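The abstract combines statistical gene ranking with cross-validated error estimation. The sketch below is a minimal pure-Python illustration, not the authors' code: it ranks genes by a one-way ANOVA F-statistic and scores a 3-nearest-neighbor classifier with leave-one-out cross-validation. The function names, the toy expression matrix, and the choice to re-select genes inside every fold (so the held-out sample cannot bias the selection) are assumptions.

```python
# Sketch: ANOVA-ranked marker genes + 3-NN, scored by leave-one-out CV.
# Gene selection is redone inside every fold so the held-out sample never
# influences which genes are chosen. All data and names are hypothetical.
import math
from collections import Counter


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one gene's expression across classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, g = len(values), len(groups)
    grand = sum(values) / n
    means = {y: sum(xs) / len(xs) for y, xs in groups.items()}
    ss_between = sum(len(xs) * (means[y] - grand) ** 2 for y, xs in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, xs in groups.items() for v in xs)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def top_k_genes(X, y, k):
    """Indices of the k genes (columns of X) with the largest F-statistic."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean)."""
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]


def loocv_accuracy(X, y, n_genes, k=3):
    """Leave-one-out accuracy with gene selection repeated inside each fold."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_k_genes(train_X, train_y, n_genes)
        reduced = [[row[j] for j in genes] for row in train_X]
        pred = knn_predict(reduced, train_y, [X[i][j] for j in genes], k)
        hits += pred == y[i]
    return hits / len(X)


# Toy data: 6 samples x 3 genes; only gene 0 separates the classes.
X = [[1.0, 5.0, 0.3], [1.2, 4.0, 0.1], [0.9, 6.0, 0.2],
     [3.0, 5.5, 0.25], [3.1, 4.5, 0.15], [2.9, 5.0, 0.35]]
y = ["normal"] * 3 + ["cancer"] * 3
print(top_k_genes(X, y, 1))                  # [0]
print(loocv_accuracy(X, y, n_genes=1, k=3))  # 1.0
```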

The Modelling of Prosodic Phrasing and Segmental Duration using CART

  • 이상호
    • Acoustical Society of Korea: Conference Proceedings / Proceedings of the 1998 Annual Conference of the Acoustical Society of Korea, Vol. 17, No. 1 / pp. 135-138 / 1998
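  • This paper uses CART (Classification And Regression Trees), one of the tree-based modeling techniques, to model prosodic phrasing, the pause duration between prosodic phrases, and segmental duration. A corpus of 400 sentences (about 33 minutes) was collected; 240 sentences (about 20 minutes) were used to train the decision and regression trees, and the remaining 160 sentences (about 13 minutes) were used for testing. The decision tree that determines prosodic phrase boundaries had an error rate of 14.6%, and the regression trees predicting the pause duration between prosodic phrases and segmental duration had root mean squared errors (RMSE) of 132.61 ms and 21.97 ms, respectively.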
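CART grows regression trees by choosing, at each node, the split that minimizes the squared error of the two child means, and the abstract reports its duration errors as RMSE. A minimal sketch of one such split plus the RMSE computation, assuming a single hypothetical numeric feature (position in the phrase) and made-up durations in milliseconds; a full tree would apply `best_split` recursively:

```python
# Sketch: one CART-style regression split minimizing total squared error,
# and the RMSE of the resulting predictions. Feature and data are hypothetical.
import math


def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) with the lowest total SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            thr = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, thr, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("feature has a single distinct value")
    return best[1:]


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


position = [1, 2, 3, 4, 5, 6]               # hypothetical feature
durations = [60, 62, 58, 100, 104, 102]     # hypothetical durations (ms)
thr, left_mean, right_mean = best_split(position, durations)
preds = [left_mean if x <= thr else right_mean for x in position]
print(thr, left_mean, right_mean)           # 3.5 60.0 102.0
print(round(rmse(durations, preds), 2))     # 1.63
```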

A research on the key factors for classification of diabetes based on random forest

  • Shin, Yong sub;Lee, Namju;Hwang, Chigon
    • International Journal of Internet, Broadcasting and Communication / Vol. 12, No. 3 / pp. 102-107 / 2020
  • Recently, the number of people visiting hospitals because of diabetes has been increasing. According to the Korean Diabetes Association, 1 in 7 adults over the age of 30 has diabetes, making it one of the most common diseases today. In this paper, in addition to blood sugar, which is widely used to recognize diabetes, we studied BMI, which is known to be related to diabetes, and triglycerides and cholesterol, which cause various complications in diabetic patients, using random forests and decision trees, techniques known to be effective for classification. The importance of each factor was confirmed from the classification results and the feature importances derived by the two techniques. Through this, we examined how BMI, triglycerides, and cholesterol, as well as blood sugar, relate to diabetes, factors to which diabetic patients should pay close attention.
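Decision trees and random forests derive feature importance from how much each feature reduces impurity when used in splits. The sketch below is a deliberately simplified, one-level proxy and not the importance computed in the paper: it scores each feature by the best weighted Gini decrease a single split on that feature can achieve. The feature names and patient records are hypothetical.

```python
# Hedged sketch: score each feature by the best weighted Gini decrease of a
# single split. A real tree or random forest sums such decreases over all of
# its splits (and trees); this one-level version is only an illustration.
# Feature names and patient records below are hypothetical.
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_decrease(values, labels):
    """Largest parent-minus-children Gini impurity over all thresholds."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini(labels)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - children)
    return best


def rank_features(rows, labels, names):
    """Feature names sorted by their single-split Gini decrease, largest first."""
    scores = {name: best_gini_decrease([r[j] for r in rows], labels)
              for j, name in enumerate(names)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


names = ["glucose", "bmi", "triglycerides", "cholesterol"]
rows = [
    [90, 24, 150, 180], [95, 31, 120, 220], [100, 22, 200, 190],
    [140, 30, 130, 210], [160, 23, 180, 185], [180, 32, 160, 230],
]
labels = [0, 0, 0, 1, 1, 1]  # 1 = diabetic (hypothetical)
ranking = rank_features(rows, labels, names)
print(ranking[0])  # ('glucose', 0.5)
```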

An Application of Decision Tree Method for Fault Diagnosis of Induction Motors

  • Tran, Van Tung;Yang, Bo-Suk;Oh, Myung-Suck
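    • Korean Society of Ocean Engineers: Conference Proceedings / 2006 Annual Conference and International Workshop Commemorating the 20th Anniversary of the Korean Society of Ocean Engineers / pp. 54-59 / 2006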
  • The decision tree is one of the most effective and widely used methods for building classification models. Researchers from various disciplines, such as statistics, machine learning, pattern recognition, and data mining, have considered the decision tree method an effective solution to problems in their fields. In this paper, the decision tree method is applied to classify faults of induction motors. Features are first calculated from the original experimental data to obtain useful attributes. These data are then assigned to classes based on our experience before being used as inputs to the decision tree; a total of 9 classes are defined. A decision tree implementation written in Matlab is used for these data.
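The abstract says features are calculated from the raw experimental data before tree induction but does not list them. As an assumption, the sketch below computes four time-domain statistics often used in vibration-based fault diagnosis (RMS, peak, crest factor, kurtosis); values like these could then serve as the decision tree's attributes. The function name and the choice of features are hypothetical.

```python
# Hedged sketch: time-domain features of a (hypothetical) vibration signal.
# The specific features are an assumption; the paper does not list its own.
# Assumes a non-constant signal so that rms and variance are non-zero.
import math


def time_domain_features(signal):
    """Mean, RMS, peak, crest factor, and (non-excess) kurtosis of a signal."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    var = sum((x - mean) ** 2 for x in signal) / n
    kurtosis = sum((x - mean) ** 4 for x in signal) / n / var ** 2
    return {"mean": mean, "rms": rms, "peak": peak,
            "crest_factor": peak / rms, "kurtosis": kurtosis}


# An impulsive signal has a higher crest factor and kurtosis than a smooth one.
print(time_domain_features([1, -1, 1, -1]))
print(time_domain_features([0, 0, 0, 4]))
```

Impulsive faults such as bearing defects tend to raise crest factor and kurtosis, which is why such statistics are common tree attributes; the tree itself then learns thresholds on them.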

Splitting Decision Tree Nodes with Multiple Target Variables

  • 김성준
    • Korean Institute of Intelligent Systems: Conference Proceedings / Proceedings of the 2003 Spring Conference of the Korea Fuzzy Logic and Intelligent Systems Society / pp. 243-246 / 2003
  • Data mining is the process of discovering patterns that are useful for decision making in large amounts of data. It has recently received much attention in a wide range of business and engineering fields. Classifying a group into subgroups is one of the most important subjects in data mining. Tree-based methods, known as decision trees, provide an efficient way to find classification models. The primary concern in tree learning is to minimize node impurity, which is evaluated using a target variable in the data set. However, there are situations where multiple target variables should be taken into account, such as manufacturing process monitoring, marketing science, and clinical and health analysis. The purpose of this article is to present several methods for measuring node impurity that are applicable to data sets with multiple target variables. Numerical examples are given with discussion for illustration.
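The abstract does not spell out its impurity measures. One simple option, shown here as an assumption and not as one of the paper's methods, averages the Gini impurity of each categorical target and scores a split by the weighted decrease of that average. The target values (a pass/fail outcome and a risk level) are hypothetical.

```python
# Hedged sketch: one possible multi-target node impurity, the mean of the
# per-target Gini impurities. Rows are tuples of target values; the targets
# and data are hypothetical, not taken from the paper.
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def multi_target_impurity(rows):
    """Average Gini impurity over all target variables."""
    n_targets = len(rows[0])
    return sum(gini([r[t] for r in rows]) for t in range(n_targets)) / n_targets


def split_gain(parent, left, right):
    """Parent impurity minus size-weighted impurity of the two children."""
    n = len(parent)
    children = (len(left) * multi_target_impurity(left)
                + len(right) * multi_target_impurity(right)) / n
    return multi_target_impurity(parent) - children


parent = [("ok", "low"), ("ok", "low"), ("fail", "high"), ("fail", "high")]
print(split_gain(parent, parent[:2], parent[2:]))                          # 0.5
print(split_gain(parent, [parent[0], parent[2]], [parent[1], parent[3]]))  # 0.0
```

Averaging treats all targets as equally important; a weighted average is an obvious variant when some targets matter more.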

A methodology for Internet Customer segmentation using Decision Trees

  • Cho, Y.B.;Kim, S.H.
    • Korea Intelligent Information Systems Society: Conference Proceedings / 2003 Spring Conference of the Korea Intelligent Information Systems Society / pp. 206-213 / 2003
  • Applying existing decision tree algorithms to Internet retail customer classification tends to produce a bushy tree because the source data are imprecise. Moreover, even exhaustive analysis does not guarantee business effectiveness, although its results are derived from fully detailed segments. It is therefore necessary to determine an appropriate number of segments at a certain level of abstraction. In this study, we developed a stopping rule that considers the total amount of information gained while generating a rule tree. In addition to growing the tree forward from the root to intermediate nodes at a certain level of abstraction, the decision tree is refined by a backtracking pruning method that uses misclassification loss information.
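Two mechanisms from the abstract, stopping on information gain and pruning on misclassification loss, can be sketched as follows. This is an illustration under assumptions: the per-split threshold `min_gain`, the function names, and the labels are hypothetical, and the paper's rule is stated over the total information gained, which may differ from this per-split test.

```python
# Hedged sketch: an information-gain stopping rule and a misclassification-
# loss pruning check. The threshold min_gain and the labels are hypothetical;
# the paper's exact rule (total information gained over the tree) may differ.
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)


def should_split(parent, children, min_gain=0.1):
    """Hypothetical stopping rule: split only if the gain reaches min_gain bits."""
    return information_gain(parent, children) >= min_gain


def misclassified(labels):
    """Errors made by a leaf that predicts the majority class."""
    return len(labels) - Counter(labels).most_common(1)[0][1]


def prune_subtree(node_labels, leaf_label_lists):
    """Collapse a subtree into one leaf if that does not increase misclassification."""
    return misclassified(node_labels) <= sum(misclassified(leaf) for leaf in leaf_label_lists)


parent = ["buy", "buy", "browse", "browse"]
print(should_split(parent, [["buy", "buy"], ["browse", "browse"]]))    # True
print(should_split(parent, [["buy", "browse"], ["buy", "browse"]]))    # False
print(prune_subtree(["a", "a", "a", "b"], [["a", "a"], ["a", "b"]]))   # True
```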

An Approach to the Spectral Signature Analysis and Supervised Classification for Forest Damages - An Assessment of Low Altitude Airborne MSS Data -

  • Kim, Choen
    • Korean Journal of Remote Sensing / Vol. 7, No. 2 / pp. 149-163 / 1991
  • This paper discusses the capability of airborne remotely sensed data to detect and classify forest damage. In this work, the AMS (Aircraft Multiband Scanner) was used to obtain digital imagery at 300 m altitude for a forest damage inventory in the Black Forest of Germany. MSS (Multispectral Scanner) digital numbers were converted to spectral emittance and radiance values in 8 spectral bands, from the visible to the thermal infrared, and submitted to a maximum-likelihood classification for (1) tree species and (2) damage classes. As expected, the MSS data, with a high spatial resolution of 0.75 m × 0.75 m, enabled the detection and identification of single trees with different damage classes, and the results were nearly equivalent to the ground-checked truth data.
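Maximum-likelihood classification assigns each pixel to the class whose Gaussian model of the band values gives it the highest likelihood. The sketch below simplifies that model by assuming independent bands (a diagonal covariance); standard maximum-likelihood classifiers use the full class covariance matrix. The class names, two-band training pixels, and function names are hypothetical.

```python
# Hedged sketch: Gaussian maximum-likelihood pixel classification with a
# diagonal covariance (independent bands), a simplification of the usual
# full-covariance classifier. Classes and pixel values are hypothetical.
# Each class needs at least two training pixels with non-constant bands.
import math


def fit_class_stats(samples):
    """Per-band mean and sample variance of one class's training pixels."""
    n, bands = len(samples), len(samples[0])
    means = [sum(s[b] for s in samples) / n for b in range(bands)]
    variances = [sum((s[b] - means[b]) ** 2 for s in samples) / (n - 1)
                 for b in range(bands)]
    return means, variances


def log_likelihood(pixel, means, variances):
    """Log-density of the pixel under independent per-band Gaussians."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
               for x, m, v in zip(pixel, means, variances))


def ml_classify(pixel, class_stats):
    """Class with the highest log-likelihood for the pixel."""
    return max(class_stats, key=lambda c: log_likelihood(pixel, *class_stats[c]))


training = {
    "healthy": [[10, 50], [12, 52], [11, 51]],
    "damaged": [[30, 20], [32, 22], [31, 21]],
}
class_stats = {c: fit_class_stats(px) for c, px in training.items()}
print(ml_classify([11, 50], class_stats))  # healthy
print(ml_classify([31, 22], class_stats))  # damaged
```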

Ensemble Methods Applied to Classification Problem

  • Kim, ByungJoo
    • International Journal of Internet, Broadcasting and Communication / Vol. 11, No. 1 / pp. 47-53 / 2019
  • The idea of ensemble learning is to train multiple models, each with the objective of predicting or classifying a set of results. Most of a model's errors come from three main factors: variance, noise, and bias. Ensemble methods can increase the stability of the final model and reduce these errors. By combining many models, we can reduce the variance even when the individual models are not very accurate. In this paper, we propose an ensemble model and apply it to classification problems. On the Iris, Pima Indians Diabetes, and semiconductor fault detection problems, the proposed model classifies well compared with conventional classifiers: logistic regression, SVM, and random forest.
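The abstract does not describe the proposed ensemble, so the sketch below shows one standard way of combining models to reduce variance: bagging, where each base model is trained on a bootstrap sample and predictions are combined by majority vote. The one-dimensional nearest-centroid base learner, the function names, and the data are hypothetical.

```python
# Hedged sketch of one common ensemble scheme, bagging with majority voting.
# The paper's own ensemble is not described in the abstract; the 1-D
# nearest-centroid base learner and the data are hypothetical.
import random
from collections import Counter


def majority_vote(predictions):
    """Most common label among the base models' predictions."""
    return Counter(predictions).most_common(1)[0][0]


def nearest_centroid(train):
    """Base learner: predict the class whose mean feature value is closest."""
    groups = {}
    for x, label in train:
        groups.setdefault(label, []).append(x)
    centroids = {label: sum(xs) / len(xs) for label, xs in groups.items()}
    return lambda x: min(centroids, key=lambda label: abs(x - centroids[label]))


def bagging(train, n_models=11, seed=0):
    """Fit each base model on a bootstrap sample and combine them by vote."""
    rng = random.Random(seed)
    models = [nearest_centroid([rng.choice(train) for _ in train])
              for _ in range(n_models)]
    return lambda x: majority_vote([m(x) for m in models])


train = [(1.0, "a"), (1.2, "a"), (0.8, "a"), (5.0, "b"), (5.2, "b"), (4.8, "b")]
ensemble = bagging(train)
print(ensemble(1.1), ensemble(4.9))
```

An odd `n_models` avoids ties in two-class votes; the seeded `random.Random` keeps the bootstrap samples reproducible.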

Abnormality Detection Control System using Charging Data

  • Moon, Sang-Ho
    • Journal of the Korea Institute of Information and Communication Engineering / Vol. 26, No. 2 / pp. 313-316 / 2022
  • In this paper, we implement a system that detects abnormalities in the charging data transmitted from the charger while an electric vehicle is charging and controls them remotely. To do this, an analysis model that classifies the data received from the charger as normal or abnormal is built with classification algorithms such as logistic regression, KNN, SVM, and decision trees. In addition, based on an analysis of the types of charger abnormality, a model that determines the cause of an abnormality is built from existing charging data. Finally, an unsupervised learning method is used to find new patterns of abnormal data.
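Logistic regression is one of the classifiers the abstract lists for labelling charging data as normal or abnormal. A minimal sketch, trained by batch gradient descent on the mean log-loss, assuming two hypothetical, already-scaled features (temperature and voltage deviation); the learning rate and epoch count are arbitrary choices, not values from the paper.

```python
# Hedged sketch: logistic regression trained by batch gradient descent to
# label charging records as normal (0) or abnormal (1). The two features
# (scaled temperature and voltage deviation) and all values are hypothetical.
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Minimize the mean log-loss with plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (abnormal) if the predicted probability reaches the threshold."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


X = [[0.1, 0.0], [0.0, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print(predict(w, b, [0.05, 0.1]), predict(w, b, [0.95, 0.95]))  # 0 1
```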

Spectral Mixture Analysis Using Modified IEA Algorithm for Forest Classification

  • 송아람;한유경;김용현;김용일
    • Korean Journal of Remote Sensing / Vol. 30, No. 2 / pp. 219-226 / 2014
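  • Using the fractional abundance of each material obtained from spectral mixture analysis enables more detailed classification. This makes spectral mixture analysis suitable not only for land-cover classification in complex urban areas but also for forest-type classification on the Korean Peninsula, where mixed forests are common. Effective forest-type classification first requires extracting appropriate endmembers, but the geometric endmember selection methods mainly used so far are not well suited to forest areas with similar spectral characteristics. In this study, IEA (Iterative Error Analysis), one of the techniques that extract pure pixels directly from the image, was combined with the spectral characteristics of coniferous and broadleaf trees to automatically extract an endmember representing each class in the study areas. Classification using spectral mixture analysis on two areas of CASI (Compact Airborne Spectrographic Imager) imagery achieved accuracies of 86% and 90%, respectively, indicating that the proposed technique appropriately extracted coniferous and broadleaf endmembers representative of the study areas. For more effective classification with spectral mixture analysis, further research that treats materials other than the target classes as endmembers appears necessary.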
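Spectral mixture analysis models each pixel as a mixture of endmember spectra. Assuming a linear mixing model and two endmembers that have already been extracted (the IEA extraction step itself is not shown), the abundance of the first endmember has a closed-form least-squares solution under the sum-to-one constraint, clipped to [0, 1] for non-negativity. The function name, three-band spectra, and pixel values are hypothetical.

```python
# Hedged sketch: linear spectral unmixing of one pixel into two endmembers
# (e.g. conifer and broadleaf) under sum-to-one and non-negativity
# constraints. Endmember spectra and the pixel are hypothetical values.


def unmix_two(pixel, e1, e2):
    """Least-squares abundances (a, 1 - a) for pixel ~ a*e1 + (1 - a)*e2."""
    d = [p - q for p, q in zip(e1, e2)]
    denom = sum(x * x for x in d)
    if denom == 0:
        raise ValueError("endmembers must differ")
    a = sum((p - q) * x for p, q, x in zip(pixel, e2, d)) / denom
    a = min(1.0, max(0.0, a))  # clipping is exact for this one-variable problem
    return a, 1.0 - a


conifer = [0.1, 0.4, 0.3]    # hypothetical 3-band reflectance
broadleaf = [0.2, 0.6, 0.5]
pixel = [0.175, 0.55, 0.45]  # 25% conifer + 75% broadleaf
a, b = unmix_two(pixel, conifer, broadleaf)
print(round(a, 3), round(b, 3))  # 0.25 0.75
```

With more than two endmembers the same idea becomes a constrained least-squares problem that no longer reduces to a single clipped ratio.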