• Title/Summary/Keyword: Classification trees

Effectiveness of Repeated Examination to Diagnose Enterobiasis in Nursery School Groups

  • Remm, Mare;Remm, Kalle
    • Parasites, Hosts and Diseases
    • /
    • v.47 no.3
    • /
    • pp.235-241
    • /
    • 2009
  • The aim of this study was to estimate the benefit from repeated examinations in the diagnosis of enterobiasis in nursery school groups, and to test the effectiveness of individual-based risk predictions using different methods. A total of 604 children were examined using double, and 96 using triple, anal swab examinations. The questionnaires for parents, structured observations, and interviews with supervisors were used to identify factors of possible infection risk. In order to model the risk of enterobiasis at individual level, a similarity-based machine learning and prediction software Constud was compared with data mining methods in the Statistica 8 Data Miner software package. Prevalence according to a single examination was 22.5%; the increase as a result of double examinations was 8.2%. Single swabs resulted in an estimated prevalence of 20.1% among children examined 3 times; double swabs increased this by 10.1%, and triple swabs by 7.3%. Random forest classification, boosting classification trees, and Constud correctly predicted about 2/3 of the results of the second examination. Constud estimated a mean prevalence of 31.5% in groups. Constud was able to yield the highest overall fit of individual-based predictions while boosting classification tree and random forest models were more effective in recognizing Enterobius positive persons. As a rule, the actual prevalence of enterobiasis is higher than indicated by a single examination. We suggest using either the values of the mean increase in prevalence after double examinations compared to single examinations or group estimations deduced from individual-level modelled risk predictions.

Land Cover Classification over East Asian Region Using Recent MODIS NDVI Data (2006-2008) (최근 MODIS 식생지수 자료(2006-2008)를 이용한 동아시아 지역 지면피복 분류)

  • Kang, Jeon-Ho;Suh, Myoung-Seok;Kwak, Chong-Heum
    • Atmosphere
    • /
    • v.20 no.4
    • /
    • pp.415-426
    • /
    • 2010
  • A land cover map over the East Asian region (Kongju National University Land Cover map: KLC) is produced using a support vector machine (SVM) and evaluated with ground truth data. The basic input data are the most recent three years (2006-2008) of MODIS (Moderate Resolution Imaging Spectroradiometer) NDVI (normalized difference vegetation index) data. The spatial resolution and temporal frequency of MODIS NDVI are 1 km and 16 days, respectively. To minimize the number of cloud-contaminated pixels in the MODIS NDVI data, maximum value compositing is applied to the 16-day data. A correction of cloud-contaminated pixels based on the spatiotemporal continuity assumption is then applied to the monthly NDVI data. To reduce the size of the dataset and improve the classification quality, nine phenological variables, such as NDVI maximum, amplitude, and average, are derived from the corrected monthly NDVI data. Three land cover maps (International Geosphere Biosphere Programme: IGBP, University of Maryland: UMd, and MODIS) were used to build a "quasi" ground truth data set composed of the pixels that all three maps assign to the same land cover type. The classification results show that the fractions of broadleaf trees and grasslands are greater, and those of croplands and needleleaf trees smaller, than in the IGBP or UMd maps. Validation against an in-situ observation database shows that the percentages of pixels in agreement with the observations are 80%, 77%, 63%, and 57% for the MODIS, KLC, IGBP, and UMd land cover data, respectively. The significant differences in land cover types among MODIS, IGBP, UMd, and KLC occur mainly in southern China and Manchuria, where most pixels are contaminated by cloud in summer and by snow in winter, respectively. This shows that the quality of the raw data is one of the most important factors in land cover classification.
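The preprocessing steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the flat-list image layout are assumptions made for illustration, and "amplitude" is assumed here to mean maximum minus minimum, which the abstract does not define.

```python
from typing import Dict, List, Optional


def max_value_composite(images: List[List[float]]) -> List[float]:
    """Per-pixel maximum NDVI over the images in one compositing window.

    Each image is a flat list of pixel values (an assumed layout).
    Cloud tends to lower NDVI, so the maximum favors clear observations.
    """
    return [max(pixel_series) for pixel_series in zip(*images)]


def phenology(monthly_ndvi: List[float]) -> Dict[str, float]:
    """Three of the phenological variables named in the abstract.

    Amplitude is ASSUMED to be maximum minus minimum.
    """
    highest, lowest = max(monthly_ndvi), min(monthly_ndvi)
    return {
        "maximum": highest,
        "amplitude": highest - lowest,
        "average": sum(monthly_ndvi) / len(monthly_ndvi),
    }


def quasi_ground_truth(igbp: List[str], umd: List[str],
                       modis: List[str]) -> List[Optional[str]]:
    """Keep a pixel's label only where all three maps agree, else None."""
    return [a if a == b == c else None for a, b, c in zip(igbp, umd, modis)]
```

For example, `quasi_ground_truth(["forest", "crop"], ["forest", "crop"], ["forest", "grass"])` keeps only the first pixel, because the three maps disagree on the second.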

Bayesian Texture Segmentation Using Multi-layer Perceptron and Markov Random Field Model (다층 퍼셉트론과 마코프 랜덤 필드 모델을 이용한 베이지안 결 분할)

  • Kim, Tae-Hyung;Eom, Il-Kyu;Kim, Yoo-Shin
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.44 no.1
    • /
    • pp.40-48
    • /
    • 2007
  • This paper presents a novel texture segmentation method using multilayer perceptron (MLP) networks and Markov random fields in a multiscale Bayesian framework. Multiscale wavelet coefficients are used as input to the neural networks, and the output of each network is modeled as a posterior probability. Texture classification at each scale is performed by MAP (maximum a posteriori) classification using the posterior probabilities from the MLP networks. Then, to obtain an improved segmentation result at the finest scale, the proposed method fuses the multiscale MAP classifications sequentially from coarse to fine scales. This is done by computing the MAP classification given the classification at one scale and a priori knowledge of contextual information extracted from the adjacent coarser-scale classification. In this fusion process, an MRF (Markov random field) prior distribution and a Gibbs sampler are used, where the MRF model serves as the smoothness constraint and the Gibbs sampler acts as the MAP classifier. The proposed segmentation method performs better than texture segmentation using the HMT (hidden Markov tree) model and HMTseg.
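The MAP decision at a single pixel can be sketched as follows. This is a simplified illustration only, not the paper's MRF and Gibbs-sampler fusion: the function names are hypothetical, and the contextual prior is assumed to be a given per-class weight that is simply multiplied into the MLP posterior.

```python
from typing import Sequence


def map_label(posteriors: Sequence[float]) -> int:
    """Index of the class with the largest posterior probability."""
    return max(range(len(posteriors)), key=lambda k: posteriors[k])


def map_with_context(posteriors: Sequence[float],
                     context_prior: Sequence[float]) -> int:
    """MAP label after weighting each class by a contextual prior.

    ASSUMPTION: the prior from the coarser scale is supplied as one
    weight per class; the paper derives it from an MRF model instead.
    """
    scores = [p * q for p, q in zip(posteriors, context_prior)]
    return max(range(len(scores)), key=lambda k: scores[k])
```

With posteriors `[0.2, 0.5, 0.3]` the plain MAP label is class 1, but a contextual prior of `[0.1, 0.2, 0.7]` moves the decision to class 2, which is the kind of smoothing effect a coarse-scale context is meant to provide.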

A Study on the Link Between Knowledge and Classification (지식과 분류의 연관성에 관한 연구)

  • 정연경
    • Journal of the Korean BIBLIA Society for Library and Information Science
    • /
    • v.11 no.2
    • /
    • pp.5-23
    • /
    • 2000
  • This study explores the relationships between knowledge and classification. Classification schemes represent entities and relationships in structures that reflect the knowledge being classified. Four representative classification methods that can bring new knowledge, i.e., hierarchies, trees, paradigms, and faceted analysis, are analyzed, and their strengths and weaknesses are described. Based on this analysis, the links between knowledge and classification are verified. Finally, a better way of representing knowledge structure through classification schemes in the future is suggested.

A Comparative Study of Medical Data Classification Methods Based on Decision Tree and System Reconstruction Analysis

  • Tang, Tzung-I;Zheng, Gang;Huang, Yalou;Shu, Guangfu;Wang, Pengtao
    • Industrial Engineering and Management Systems
    • /
    • v.4 no.1
    • /
    • pp.102-108
    • /
    • 2005
  • This paper studies medical data classification methods, comparing decision trees and system reconstruction analysis as applied to heart disease medical data mining. The data were collected from patients with coronary heart disease and comprise 1,723 records of 71 attributes each. We use the system reconstruction method to weight the data. We use decision tree algorithms such as induction of decision trees (ID3), C4.5, classification and regression tree (CART), chi-square automatic interaction detector (CHAID), and exhaustive CHAID. We use the results to compare the correct classification rate, number of leaves, and tree depth of the different decision tree algorithms. The experiments show that weighting the data can improve the correct classification rate on the coronary heart disease data but has little effect on tree depth and number of leaves.
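As background for the algorithms compared here, the following minimal sketch shows the information gain criterion that ID3 uses to choose a split on a categorical attribute. The function names and the toy values are illustrative assumptions and are not taken from the paper's data.

```python
import math
from collections import Counter
from typing import Hashable, List


def entropy(labels: List[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: List[Hashable], labels: List[Hashable]) -> float:
    """Entropy reduction from splitting the labels by attribute value."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

An attribute that separates the classes perfectly has a gain equal to the entropy of the labels, while an attribute that leaves each branch as mixed as the parent has a gain of zero.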

Application of Random Forests to Assessment of Importance of Variables in Multi-sensor Data Fusion for Land-cover Classification

  • Park, No-Wook;Chi, Kwang-Hoon
    • Korean Journal of Remote Sensing
    • /
    • v.22 no.3
    • /
    • pp.211-219
    • /
    • 2006
  • A random forests classifier is applied to multi-sensor data fusion for supervised land-cover classification in order to assess variable importance. The random forests approach is a non-parametric ensemble classifier based on CART-like trees. Its distinguishing feature is that the importance of a variable can be estimated by randomly permuting the variable of interest in all the out-of-bag samples for each classifier. Two different multi-sensor data sets for supervised classification were used to illustrate the applicability of random forests: one with optical and polarimetric SAR data and the other with multi-temporal Radarsat-1 and ENVISAT ASAR data. In the experiments, the random forests approach extracted the important variables or bands for land-cover discrimination and showed reasonably good classification accuracy.
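The permutation idea described in this abstract can be sketched in plain Python for a single classifier and a single held-out sample. This is an illustration under assumptions, not the paper's implementation: `predict` stands for any fitted classifier, the function names are hypothetical, and a random forest would repeat the computation on each tree's own out-of-bag samples and average the results.

```python
import random
from typing import Callable, List


def accuracy(predict: Callable[[List[float]], int],
             rows: List[List[float]], labels: List[int]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict: Callable[[List[float]], int],
                           rows: List[List[float]], labels: List[int],
                           column: int, seed: int = 0) -> float:
    """Drop in accuracy after shuffling one variable across the rows."""
    rng = random.Random(seed)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    # Rebuild each row with the shuffled value in the chosen column.
    permuted = [r[:column] + [v] + r[column + 1:]
                for r, v in zip(rows, shuffled)]
    return accuracy(predict, rows, labels) - accuracy(predict, permuted, labels)
```

A variable the classifier never uses gets an importance of exactly zero, because shuffling it cannot change any prediction.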

Individual-based Competition Analysis for Secondary Forest in Northeast China

  • Li, Fengri;Chen, Dongsheng;Lu, Jun
    • Journal of Korean Society of Forest Science
    • /
    • v.97 no.5
    • /
    • pp.501-507
    • /
    • 2008
  • Crown width in four directions, DBH, tree height, and coordinates of sample trees were collected from 30 permanent sample plots in the secondary forest of the Maoershan Experimental Forestry Farm, Northeast China. This paper discusses competition among individual trees in secondary forest stands using the iterative Hegyi competition index and a crown overlap index, which represent the competitive and cooperative interactions among neighboring trees. Active competitors of the subject tree within the competition zone were selected to calculate the iterative competition index. Using the results of crown classification based on equal crown projection area, a new distance-dependent competition index, the crown overlap index (COI), was developed for secondary forest. The COI described crown competition better than the crown competition factor (CCF). The individual-based competition indices discussed in this paper will allow more precise individual tree growth models to be developed for secondary forest, and they can also be used to adjust stand structure for spatially optimal management.
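For reference, the basic Hegyi competition index sums, over the competitors of a subject tree, the ratio of competitor DBH to subject DBH divided by the distance between the two trees. The sketch below computes that basic form only. The tuple layout and function name are assumptions, and the list of competitors is taken as given rather than selected by the paper's iterative procedure.

```python
import math
from typing import List, Tuple

Tree = Tuple[float, float, float]  # (x in m, y in m, DBH in cm), assumed layout


def hegyi_index(subject: Tree, competitors: List[Tree]) -> float:
    """Basic Hegyi index: sum of (DBH_j / DBH_i) / distance_ij."""
    x, y, dbh = subject
    total = 0.0
    for cx, cy, cdbh in competitors:
        distance = math.hypot(cx - x, cy - y)
        total += (cdbh / dbh) / distance
    return total
```

A subject tree with a DBH of 20 cm, a 10 cm neighbor 5 m away, and a 40 cm neighbor 2 m away has an index of 0.5/5 + 2/2 = 1.1, so the larger, closer neighbor dominates the result.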

Stream-based Biomedical Classification Algorithms for Analyzing Biosignals

  • Fong, Simon;Hang, Yang;Mohammed, Sabah;Fiaidhi, Jinan
    • Journal of Information Processing Systems
    • /
    • v.7 no.4
    • /
    • pp.717-732
    • /
    • 2011
  • Classification in biomedical applications is an important task that predicts or classifies an outcome based on a given set of input variables, such as diagnostic tests or the symptoms of a patient. Traditionally, classification algorithms have had to digest a stationary set of historical data to train a decision-tree model, and the learned model could then be used for testing new samples. However, a new breed of classification called stream-based classification can handle continuous data streams that are ever-evolving, unbounded, and unstructured, for instance live biosignal feeds. These emerging algorithms can potentially be used for real-time classification over biosignal data streams such as EEG and ECG. This paper presents a pioneering effort to study the feasibility of classification algorithms for analyzing biosignals in the form of infinite data streams. First, a performance comparison is made between traditional and stream-based classification. The results show that accuracy declines intermittently for traditional classification because the model must be re-learned as new data arrive. Second, we show by simulation that biosignal data streams can be processed with a satisfactory level of performance in terms of accuracy, memory requirement, and speed by using a collection of stream-mining algorithms called Optimized Very Fast Decision Trees. The algorithms can effectively serve as a cornerstone technology for real-time classification in future biomedical applications.
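Very Fast Decision Trees decide when to split a leaf by means of the Hoeffding bound, which limits how far an average observed over n stream samples can lie from its true mean. The sketch below shows that split test in its plain form. The function names are hypothetical, and the refinements of the optimized variants named in the abstract are not reproduced.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean lies
    within epsilon of the mean observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split once the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

Because the bound shrinks as n grows, a leaf that cannot yet separate two close candidate attributes simply waits for more samples from the stream instead of storing the whole data set.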

A Decision Tree Algorithm using Genetic Programming

  • Park, Chongsun;Ko, Young Kyong
    • Communications for Statistical Applications and Methods
    • /
    • v.10 no.3
    • /
    • pp.845-857
    • /
    • 2003
  • We explore the use of genetic programming to evolve decision trees directly for classification problems with both discrete and continuous predictors. We demonstrate that the hypotheses derived by standard algorithms can deviate substantially from the optimum. This deviation is partly due to their top-down procedures. The performance of the system is measured on a set of real and simulated data sets and compared with that of well-known algorithms such as CHAID, CART, C5.0, and QUEST. The proposed algorithm seems to be effective in handling problems caused by the top-down procedures of existing algorithms.

Bias Reduction in Split Variable Selection in C4.5

  • Shin, Sung-Chul;Jeong, Yeon-Joo;Song, Moon Sup
    • Communications for Statistical Applications and Methods
    • /
    • v.10 no.3
    • /
    • pp.627-635
    • /
    • 2003
  • In this short communication we discuss the bias problem of C4.5 in split variable selection and suggest a method to reduce the variable selection bias among categorical predictor variables. A penalty proportional to the number of categories is applied to the splitting criterion gain of C4.5. The results of empirical comparisons show that the proposed modification of C4.5 reduces the size of classification trees.
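The bias discussed in this abstract can be illustrated with a short sketch: an attribute with many categories can reach a high information gain simply by fragmenting the data. The penalty below, a constant times the number of categories subtracted from the gain, is a hypothetical form chosen for illustration. The abstract states only that the penalty is proportional to the number of categories, so the exact formula and the `penalty` value are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, List


def entropy(labels: List[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values: List[Hashable], labels: List[Hashable]) -> float:
    """Entropy reduction from splitting the labels by attribute value."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())


def penalized_gain(values: List[Hashable], labels: List[Hashable],
                   penalty: float = 0.1) -> float:
    """Gain minus a penalty proportional to the number of categories.

    HYPOTHETICAL form and constant, for illustration only.
    """
    return info_gain(values, labels) - penalty * len(set(values))
```

On four records with two classes, a two-category attribute and a four-category identifier-like attribute both reach a gain of 1.0, but the penalized criterion prefers the two-category attribute (0.8 against 0.6).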