• Title/Abstract/Keywords: classification trees

Search results: 313 items, processing time 0.024 s

Modeling of Environmental Survey by Decision Trees

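  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Korean Data and Information Science Society: Conference Proceedings
    • /
    • Korean Data and Information Science Society, 2004 Autumn Conference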
    • /
    • pp.63-75
    • /
    • 2004
  • The decision tree approach is most useful for classification problems and for dividing the search space into rectangular regions. Decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing, fraud detection, data reduction and variable screening, and category merging. We analyze Gyeongnam social indicator survey data using decision tree techniques to extract environmental information. The resulting decision tree outputs can be used for environmental preservation and improvement.


Modeling of Environmental Survey by Decision Trees

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 15, No. 4
    • /
    • pp.759-771
    • /
    • 2004
  • The decision tree approach is most useful for classification problems and for dividing the search space into rectangular regions. Decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing, fraud detection, data reduction and variable screening, and category merging. We analyze Gyeongnam social indicator survey data using decision tree techniques to extract environmental information. The resulting decision tree outputs can be used for environmental preservation and improvement.


Classification Accuracy Improvement for Decision Tree

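  • Mehari Marta Rezene;Park, Sang-Hyun
    • Korea Information Processing Society: Conference Proceedings
    • /
    • Korea Information Processing Society, 2017 Spring Conference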
    • /
    • pp.787-790
    • /
    • 2017
  • Data quality is a central issue in classification problems: noisy instances in the training dataset generally prevent robust classification performance. Such instances may cause the generated decision tree to over-fit, which can reduce its accuracy. Decision trees are useful, efficient, and commonly used for solving various real-world classification problems in data mining. In this paper, we introduce a preprocessing technique to improve the classification accuracy of the C4.5 decision tree algorithm. In the proposed preprocessing method, we apply the naive Bayes classifier to remove noisy instances from the training dataset. We applied the proposed method to a real e-commerce sales dataset and compared its performance against the existing C4.5 decision tree classifier. In the experiments, the proposed method improved the classification accuracy by 8.5% and 14.32% when evaluated on the training dataset and with 10-fold cross-validation, respectively.
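
The preprocessing step described in this abstract, filtering the training set with a naive Bayes classifier before growing the tree, might look roughly like the sketch below. It is only an illustration under stated assumptions, not the authors' implementation: it assumes categorical features, Laplace smoothing, and that an instance counts as "noisy" when a naive Bayes model trained on the full set disagrees with its label. The function names and the toy data are hypothetical, and the C4.5 training step itself is left out.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the label with the highest Laplace-smoothed log posterior."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def remove_noisy_instances(rows, labels):
    """Keep only instances whose label agrees with the naive Bayes prediction."""
    model = train_naive_bayes(rows, labels)
    kept = [(r, y) for r, y in zip(rows, labels) if predict(model, r) == y]
    return [r for r, _ in kept], [y for _, y in kept]


# Hypothetical toy data: the last instance contradicts the dominant pattern.
rows = [("high", "yes"), ("high", "yes"), ("high", "no"),
        ("low", "no"), ("low", "no"), ("low", "yes"),
        ("high", "yes")]
labels = ["buy", "buy", "buy", "skip", "skip", "skip", "skip"]

clean_rows, clean_labels = remove_noisy_instances(rows, labels)
print(len(rows), "->", len(clean_rows), "instances after filtering")
```

The filtered `clean_rows` and `clean_labels` would then be passed to whichever decision tree learner is in use. Filtering with a model trained on the same data is the simplest variant; a cross-validated filter is a common alternative when the concern is that the filter itself over-fits.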

Study on Development of Classification Model and Implementation for Diagnosis System of Sasang Constitution

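  • Beom, Su-Gyun;Jeon, Mi-Ran;Oh, Am-Suk
    • Korea Institute of Information and Communication Engineering: Conference Proceedings
    • /
    • Korea Institute of Maritime Information and Communication Sciences, 2008 Conference on Intelligent Information and Applications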
    • /
    • pp.155-159
    • /
    • 2008
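  • To develop a Sasang constitution classification model that improves diagnostic accuracy when Sasang constitution is diagnosed with the Sasang constitution classification questionnaire, this paper uses a range of classification models that are major data mining techniques, including discriminant analysis, decision tree analysis, neural network analysis, logistic regression analysis, and clustering analysis. We develop Sasang constitution classification models based on the discriminant analysis model and the decision tree model, which offer comparatively good classification accuracy and have the particular advantages of being easy to understand, explain, and implement, and we implement a Sasang constitution diagnosis system that applies the two models.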


Estimation Carbon Storage of Urban Street trees Using UAV Imagery and SfM Technique

  • Kim, Da-Seul;Lee, Dong-Kun;Heo, Han-Kyul
    • Journal of the Korean Society of Environmental Restoration Technology
    • /
    • Vol. 22, No. 6
    • /
    • pp.1-14
    • /
    • 2019
  • Carbon storage is one of the regulating ecosystem services provided by urban street trees, and it is important to evaluate the economic value of ecosystem services accurately. The carbon storage of street trees has traditionally been calculated by measuring morphological parameters in the field. Because that method is labor-intensive and time-consuming for macro-scale research, remote sensing has become more widely used. Airborne Light Detection And Ranging (LiDAR) is used to obtain point cloud data of densely planted areas and to extract individual trees for carbon storage estimation. However, LiDAR has limitations such as high cost and complicated operation. In addition, because trees change over time, they need to be surveyed frequently. Structure from Motion (SfM) photogrammetry with an Unmanned Aerial Vehicle (UAV) is therefore a more suitable method for obtaining point cloud data. In this paper, a UAV carrying a digital camera was used to take oblique aerial images for generating a point cloud of street trees. We extracted the diameter at breast height (DBH) from the generated point cloud data to calculate the carbon storage. We compared the DBH calculated from UAV data with field measurements in the selected area. The calculated DBH was used to estimate the carbon storage of street trees in the study area using a regression model. The results demonstrate the feasibility and effectiveness of applying UAV imagery and the SfM technique to the carbon storage estimation of street trees. The technique can contribute to efficiently building inventories of the carbon storage of street trees in urban areas.

Ensemble Learning for Underwater Target Classification

  • Seok, Jong-Won
    • Journal of Korea Multimedia Society
    • /
    • Vol. 18, No. 11
    • /
    • pp.1261-1267
    • /
    • 2015
  • The problem of underwater target detection and classification has attracted substantial attention and has been studied by many researchers for both military and non-military purposes. The problem is complicated by various environmental conditions. In this paper, we study classifier ensemble methods for active sonar target classification to improve classification performance. In general, classifier ensemble methods are useful for classifiers whose variance is relatively large, such as decision trees and neural networks. Bagging, random selection of samples, random subspace, and rotation forest are selected as the classifier ensemble methods. Classification tests were carried out using the four ensemble methods, each based on 31 neural network classifiers, and their performances were compared.
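
Two of the ensemble methods named in this abstract, bagging and random subspace, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's setup: a nearest-centroid classifier stands in for the neural network base classifiers, rotation forest is not shown, and all function names, class labels, and the toy feature vectors are hypothetical. Only the ensemble size of 31 is taken from the abstract.

```python
import random
from collections import Counter


def fit_centroids(X, y, features):
    """Stand-in base learner: per-class mean over the chosen feature subset."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += row[f]
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
    return features, centroids


def predict_one(model, row):
    """Assign the class whose centroid is nearest in squared distance."""
    features, centroids = model

    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(features, centroids[c]))

    return min(centroids, key=dist)


def build_ensemble(X, y, n_members=31, subspace_size=None, seed=0):
    """Bagging (bootstrap rows) with an optional random feature subspace."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
        if subspace_size is None:
            feats = list(range(d))                  # plain bagging: all features
        else:
            feats = sorted(rng.sample(range(d), subspace_size))
        members.append(fit_centroids([X[i] for i in idx],
                                     [y[i] for i in idx], feats))
    return members


def vote(members, row):
    """Majority vote over all ensemble members."""
    return Counter(predict_one(m, row) for m in members).most_common(1)[0][0]


# Hypothetical toy features: two well-separated classes, third feature constant.
X = ([(0.1 * i, 0.2 * i, 1.0) for i in range(10)]
     + [(5 + 0.1 * i, 5 + 0.2 * i, 1.0) for i in range(10)])
y = ["clutter"] * 10 + ["target"] * 10

ensemble = build_ensemble(X, y, n_members=31, subspace_size=2)
print(vote(ensemble, (0.5, 0.5, 1.0)), vote(ensemble, (5.5, 5.5, 1.0)))
```

An odd number of members avoids tied votes in a two-class problem. Averaging over resampled training sets is what reduces the variance of unstable base learners, which is why the abstract singles out decision trees and neural networks.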

Tree size determination for classification ensemble

  • Choi, Sung Hoon;Kim, Hyunjoong
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 27, No. 1
    • /
    • pp.255-264
    • /
    • 2016
  • Classification is a predictive modeling task for a categorical target variable. Classification ensemble methods, which achieve better accuracy by combining multiple classifiers, have become a powerful machine learning and data mining paradigm. Well-known classification ensemble methodologies are boosting, bagging, and random forest. In this article, we assume that decision trees are used as the classifiers in the ensemble, and we hypothesize that tree size affects classification accuracy. To study how tree size influences accuracy, we performed experiments using twenty-eight data sets. We then compare the performance of four ensemble algorithms (bagging, double-bagging, boosting, and random forest) with different tree sizes.

Effectiveness of Repeated Examination to Diagnose Enterobiasis in Nursery School Groups

  • Remm, Mare;Remm, Kalle
    • Parasites, Hosts and Diseases
    • /
    • Vol. 47, No. 3
    • /
    • pp.235-241
    • /
    • 2009
  • The aim of this study was to estimate the benefit from repeated examinations in the diagnosis of enterobiasis in nursery school groups, and to test the effectiveness of individual-based risk predictions using different methods. A total of 604 children were examined using double, and 96 using triple, anal swab examinations. The questionnaires for parents, structured observations, and interviews with supervisors were used to identify factors of possible infection risk. In order to model the risk of enterobiasis at individual level, a similarity-based machine learning and prediction software Constud was compared with data mining methods in the Statistica 8 Data Miner software package. Prevalence according to a single examination was 22.5%; the increase as a result of double examinations was 8.2%. Single swabs resulted in an estimated prevalence of 20.1% among children examined 3 times; double swabs increased this by 10.1%, and triple swabs by 7.3%. Random forest classification, boosting classification trees, and Constud correctly predicted about 2/3 of the results of the second examination. Constud estimated a mean prevalence of 31.5% in groups. Constud was able to yield the highest overall fit of individual-based predictions while boosting classification tree and random forest models were more effective in recognizing Enterobius positive persons. As a rule, the actual prevalence of enterobiasis is higher than indicated by a single examination. We suggest using either the values of the mean increase in prevalence after double examinations compared to single examinations or group estimations deduced from individual-level modelled risk predictions.

Land Cover Classification over East Asian Region Using Recent MODIS NDVI Data (2006-2008)

  • Kang, Jeon-Ho;Suh, Myoung-Seok;Kwak, Chong-Heum
    • Atmosphere
    • /
    • Vol. 20, No. 4
    • /
    • pp.415-426
    • /
    • 2010
  • A land cover map over the East Asian region (Kongju National University Land Cover map: KLC) is classified using a support vector machine (SVM) and evaluated with ground truth data. The basic input data are the most recent three years (2006-2008) of MODIS (MODerate resolution Imaging Spectroradiometer) NDVI (normalized difference vegetation index) data. The spatial resolution and temporal frequency of MODIS NDVI are 1 km and 16 days, respectively. To minimize the number of cloud-contaminated pixels in the MODIS NDVI data, the maximum value composite is applied to the 16-day data. A correction of cloud-contaminated pixels based on the spatiotemporal continuity assumption is then applied to the monthly NDVI data. To reduce the dataset and improve the classification quality, nine phenological variables, such as NDVI maximum, amplitude, and average, are derived from the corrected monthly NDVI data. Three land cover maps (International Geosphere Biosphere Programme: IGBP, University of Maryland: UMd, and MODIS) were used to build a "quasi" ground truth data set composed of pixels that all three maps assign to the same land cover type. The classification results show that the fractions of broadleaf trees and grasslands are greater, and those of croplands and needleleaf trees smaller, than in the IGBP or UMd maps. Validation against an in-situ observation database shows that the percentages of pixels in agreement with the observations are 80%, 77%, 63%, and 57% for the MODIS, KLC, IGBP, and UMd land cover data, respectively. The significant differences in land cover types among MODIS, IGBP, UMd, and KLC occur mainly in southern China and Manchuria, where most pixels are contaminated by cloud in summer and by snow in winter, respectively. This shows that the quality of the raw data is one of the most important factors in land cover classification.
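
Three of the data-preparation steps mentioned in this abstract can be sketched in a few lines: the maximum value composite, the phenological summaries, and the "quasi" ground truth built from pixels where three maps agree. The sketch is illustrative only and rests on assumptions not stated in the abstract: grids are flattened to plain lists, amplitude is taken as maximum minus minimum, and the three maps are assumed to have been mapped to a common class legend beforehand. All function names and values are hypothetical.

```python
def maximum_value_composite(composites):
    """Per-pixel maximum over several NDVI grids of equal size.

    Clouds depress NDVI, so the maximum favours the clearest observation.
    """
    return [max(values) for values in zip(*composites)]


def phenology_features(monthly_series):
    """Summaries of one pixel's monthly NDVI series.

    Amplitude is assumed here to mean maximum minus minimum.
    """
    highest, lowest = max(monthly_series), min(monthly_series)
    return {
        "maximum": highest,
        "amplitude": highest - lowest,
        "average": sum(monthly_series) / len(monthly_series),
    }


def consensus_pixels(igbp, umd, modis):
    """Pixel index -> class, for pixels where all three maps agree."""
    return {i: a for i, (a, b, c) in enumerate(zip(igbp, umd, modis))
            if a == b == c}


# Hypothetical values for a two-pixel grid and a three-pixel map.
print(maximum_value_composite([[0.25, 0.5], [0.75, 0.125]]))
print(phenology_features([0.25, 0.5, 0.75]))
print(consensus_pixels(["forest", "crop", "grass"],
                       ["forest", "grass", "grass"],
                       ["forest", "crop", "grass"]))
```

Only the consensus pixels would serve as training labels for the classifier, which trades coverage for label reliability.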

Bayesian Texture Segmentation Using Multi-layer Perceptron and Markov Random Field Model

  • Kim, Tae-Hyung;Eom, Il-Kyu;Kim, Yoo-Shin
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • Vol. 44, No. 1
    • /
    • pp.40-48
    • /
    • 2007
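  • This paper proposes a new texture segmentation method that uses a multi-layer perceptron and a Markov random field from a multiscale Bayesian viewpoint. Because the output of a multi-layer perceptron models the posterior probability, multiscale wavelet coefficients are used as the input to the multi-layer perceptron. Texture classification is performed at each scale using the posterior probabilities obtained from the multi-layer perceptron and MAP (maximum a posteriori) classification. To obtain an improved segmentation at the finest scale, the MAP classification results at all scales are fused sequentially from the coarsest scale to the finest. This is done by performing MAP classification with the classification information at one scale together with the a priori contextual information obtained from the adjacent coarser scale. In this fusion process, the MRF (Markov random field) prior model acts as a smoothness constraint, and the Gibbs sampler acts as the MAP classifier. The proposed segmentation method outperforms texture segmentation based on the HMT (Hidden Markov Trees) model and the HMTseg algorithm.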
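
The per-scale MAP decision and the idea of letting the coarser scale inform the finer one can be illustrated with a deliberately simplified sketch. It is not the method of the paper: the MRF prior and the Gibbs sampler are replaced here by a single multiplicative weight that favours the label chosen at the parent (coarser) scale, and the function names, the weight value, and the class names are all hypothetical.

```python
def map_label(posterior):
    """MAP decision: the class with the largest posterior probability."""
    return max(posterior, key=posterior.get)


def fuse_with_parent(posterior, parent_label, agreement_weight=2.0):
    """Reweight a fine-scale posterior toward the coarser-scale label.

    A crude stand-in for a context prior: the parent's class is multiplied
    by agreement_weight, then the result is renormalized to sum to 1.
    """
    weighted = {c: p * (agreement_weight if c == parent_label else 1.0)
                for c, p in posterior.items()}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}


# Hypothetical posterior for one fine-scale block, as a classifier might output.
fine = {"grass": 0.45, "brick": 0.55}

print(map_label(fine))                             # decision from this scale alone
print(map_label(fuse_with_parent(fine, "grass")))  # decision with coarse context
```

In this toy case the fine-scale evidence alone is nearly ambiguous, so the context from the coarser scale changes the decision, which is the smoothing effect that the fusion step is meant to provide.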