• Title/Summary/Keyword: classification trees

Classification of Class-Imbalanced Data: Effect of Over-sampling and Under-sampling of Training Data (계급불균형자료의 분류: 훈련표본 구성방법에 따른 효과)

  • 김지현;정종빈
    • The Korean Journal of Applied Statistics / v.17 no.3 / pp.445-457 / 2004
  • Given class-imbalanced data in a two-class classification problem, the training data are often over-sampled and/or under-sampled to make them balanced. We investigate the validity of this practice and study its effect on boosting of classification trees. Experiments on twelve real datasets show that keeping the natural distribution of the training data is the best approach when boosting methods are to be applied to class-imbalanced data.
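
The over-sampling and under-sampling discussed in this abstract can be illustrated with a minimal Python sketch. It assumes plain random sampling (with replacement for over-sampling, without replacement for under-sampling); the abstract does not say which sampling scheme the authors used, and the function name `rebalance` and its parameters are hypothetical.

```python
import random
from collections import Counter


def rebalance(rows, labels, mode="over", seed=0):
    """Randomly over- or under-sample a two-class training set.

    Assumption: simple random sampling; this is an illustration,
    not the procedure used in the paper.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    if len(by_class) != 2:
        raise ValueError("expected exactly two classes")

    # Order the two classes by size: minority first, majority second.
    (minority, min_rows), (majority, maj_rows) = sorted(
        by_class.items(), key=lambda kv: len(kv[1])
    )

    if mode == "over":
        # Duplicate randomly chosen minority rows until the classes match.
        extra = [rng.choice(min_rows) for _ in range(len(maj_rows) - len(min_rows))]
        min_rows = min_rows + extra
    elif mode == "under":
        # Keep a random subset of the majority class of minority size.
        maj_rows = rng.sample(maj_rows, len(min_rows))
    else:
        raise ValueError("mode must be 'over' or 'under'")

    out_rows = min_rows + maj_rows
    out_labels = [minority] * len(min_rows) + [majority] * len(maj_rows)
    return out_rows, out_labels


# Toy data: 2 positive rows and 8 negative rows.
rows = list(range(10))
labels = [1, 1] + [0] * 8
print(Counter(rebalance(rows, labels, mode="over")[1]))
print(Counter(rebalance(rows, labels, mode="under")[1]))
```

Leaving the training set untouched corresponds to the "natural distribution" option that the abstract reports as best for boosting.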

An Estimation on the Annual Growth in Diameter of Pitch Pine (Pinus rigida Mill.) Stand (리기다 소나무림(林)의 직경연년성장량(直徑連年成長量) 추정(推定))

  • Lee, Yeo Ha
    • Journal of Korean Society of Forest Science / v.17 no.1 / pp.23-28 / 1973
  • In this survey, diameter growth was estimated in order to estimate the volume growth of a pitch pine (Pinus rigida Mill.) stand. Of 223 sample trees, 12 (about 5 percent) were rejected. The stand showed uniform growth, and the rejected trees included insect-damaged trees. Compared with reports made on a forest-classification basis, pitch pine as a single species showed faster growth. The minimum, average, and maximum values of D.B.H. and mean annual diameter growth of the 16-year-old pitch pine stand were as follows. The correlation of each factor, estimated with a 95% confidence interval, is shown in the correlation table below.

Fast Scene Understanding in Urban Environments for an Autonomous Vehicle equipped with 2D Laser Scanners (무인 자동차의 2차원 레이저 거리 센서를 이용한 도시 환경에서의 빠른 주변 환경 인식 방법)

  • Ahn, Seung-Uk;Choe, Yun-Geun;Chung, Myung-Jin
    • The Journal of Korea Robotics Society / v.7 no.2 / pp.92-100 / 2012
  • A map of a complex environment can be generated by a robot carrying sensors. However, a representation built directly from integrated sensor data conveys only spatial existence. To execute high-level applications, robots need semantic knowledge of the environment. This research investigates the design of a system for recognizing objects in 3D point clouds of urban environments. The proposed system is decomposed into five steps: sequential LIDAR scanning, point classification, ground detection and elimination, segmentation, and object classification. The method can classify various objects in urban environments, such as cars, trees, buildings, and posts. Simple methods that minimize time-consuming processing were developed to guarantee real-time performance and to classify data on the fly as it is acquired. To evaluate the proposed methods, computation time and recognition rate are analyzed. Experimental results demonstrate that the proposed algorithm quickly and efficiently extracts semantic knowledge of a dynamic urban environment.

A Machine learning Approach for Knowledge Base Construction Incorporating GIS Data for land Cover Classification of Landsat ETM+ Image (지식 기반 시스템에서 GIS 자료를 활용하기 위한 기계 학습 기법에 관한 연구 - Landsat ETM+ 영상의 토지 피복 분류를 사례로)

  • Kim, Hwa-Hwan;Ku, Cha-Yang
    • Journal of the Korean Geographical Society / v.43 no.5 / pp.761-774 / 2008
  • Integrating GIS data and human expert knowledge into digital image processing has long been acknowledged as necessary for improving remote sensing image analysis. We propose an inductive machine learning algorithm for GIS data integration and a rule-based method for land cover classification. The proposed method is tested on a land cover classification of a Landsat ETM+ multispectral image together with GIS data layers including elevation, aspect, slope, distance to water bodies, distance to the road network, and population density. Decision trees and production rules for land cover classification are generated by the C5.0 inductive machine learning algorithm from 350 stratified random point samples. The production rules are then used for land cover classification in combination with unsupervised ISODATA classification. The results show that GIS data layers such as elevation, distance to water bodies, and population density can be effectively integrated into rule-based image classification. The production rules generated by inductive machine learning are intuitive and easy to understand. The proposed method demonstrates how various GIS data layers can be integrated with remotely sensed imagery in a knowledge base construction framework to improve land cover classification.

Development of medical/electrical convergence software for classification between normal and pathological voices (장애 음성 판별을 위한 의료/전자 융복합 소프트웨어 개발)

  • Moon, Ji-Hye;Lee, JiYeoun
    • Journal of Digital Convergence / v.13 no.12 / pp.187-192 / 2015
  • Software that analyzes speech disorders would be widely applicable across converged fields. This paper implements a user-friendly program based on CART (classification and regression trees) analysis that distinguishes between normal and pathological voices using a combination of acoustical and HOS (higher-order statistics) parameters, bringing together medical information and signal processing. The acoustical parameters are Jitter(%) and Shimmer(%). The proposed HOS parameters are the means and variances of skewness (MOS and VOS) and kurtosis (MOK and VOK). The database consists of 53 normal and 173 pathological voices distributed by Kay Elemetrics. When the acoustical and proposed parameters are used together to generate the decision tree, the average accuracy is 83.11%. Finally, we developed a program with a more user-friendly interface and frameworks.
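
As a rough illustration of the HOS parameters named in this abstract, the Python sketch below computes the mean and variance of per-frame skewness (MOS, VOS) and kurtosis (MOK, VOK). Several details are assumptions not stated in the abstract: non-overlapping frames, the frame length, population (biased) moment estimators, non-excess kurtosis, and returning 0.0 for constant frames. The function names are hypothetical.

```python
from statistics import fmean, pvariance


def _central_moment(frame, order):
    """Population central moment of the given order."""
    m = fmean(frame)
    return fmean([(v - m) ** order for v in frame])


def skewness(frame):
    var = _central_moment(frame, 2)
    # Assumption: a constant frame is given skewness 0.0.
    return 0.0 if var == 0 else _central_moment(frame, 3) / var ** 1.5


def kurtosis(frame):
    var = _central_moment(frame, 2)
    # Non-excess kurtosis (a normal distribution gives 3); 0.0 for constant frames.
    return 0.0 if var == 0 else _central_moment(frame, 4) / var ** 2


def hos_parameters(signal, frame_len):
    """Return MOS, VOS, MOK, VOK over non-overlapping frames (assumed framing)."""
    frames = [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    skews = [skewness(f) for f in frames]
    kurts = [kurtosis(f) for f in frames]
    return {
        "MOS": fmean(skews), "VOS": pvariance(skews),
        "MOK": fmean(kurts), "VOK": pvariance(kurts),
    }


# Toy signal; a real input would be sampled voice data.
print(hos_parameters([0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1], frame_len=4))
```

The resulting four values, together with Jitter(%) and Shimmer(%), would form the feature vector passed to a decision tree learner.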

Development of Predictive Model of Social Activity for the Elderly in Korea using CRT Algorithm (CRT 알고리즘을 이용한 우리나라 노인의 사회활동 영향요인 예측 모형 개발)

  • Byeon, Haewon
    • Journal of the Korea Convergence Society / v.9 no.10 / pp.243-248 / 2018
  • The social activities of the elderly are important for achieving successful aging because they provide opportunities for social interaction that enhance life satisfaction. The purpose of this study is to identify factors related to the social activities of the elderly and to build a statistical classification model that predicts social activity. The subjects were 1,864 elderly people (829 males, 1,035 females) who completed the community health survey in 2015. The outcome variable was defined as the experience of social activity during the past month (yes, no). The prediction model was constructed using a decision tree based on the Classification and Regression Trees (CRT) algorithm. The results showed that subjective health, frequency of meeting with neighbors, frequency of meeting with relatives, and living with a spouse were significant variables of social participation. The most prevalent predictor was the subjective health level. Based on these results, social attention to and support for the social activities of the elderly are required to prepare for successful aging in a super-aged society.

Asian Ethnic Group Classification Model Using Data Mining (데이터마이닝 방법을 이용한 아시아 민족 분류 모형 구축)

  • Kim, Yoon Geon;Lee, Ji Hyun;Cho, Sohee;Kim, Moon Young;Lee, Soong Deok;Ha, Eun Ho;Ahn, Jae Joon
    • The Korean Journal of Legal Medicine / v.41 no.2 / pp.32-40 / 2017
  • In addition to identifying genetic differences between target populations, it is also important to determine the impact of genetic differences with regard to the respective target populations. In recent years, the number of cases requiring this approach has increased, and thus various statistical methods must be considered. In this study, genetic data from populations of Southeast and Southwest Asia were collected, and several statistical approaches were evaluated on the Y-chromosome short tandem repeat data. To develop a more accurate and practical classification model, we applied gradient boosting and ensemble techniques. In distinguishing between the Southeast and Southwest Asian populations, the overall performance of these classification models was better than that of the decision trees and regression models used in the past. In conclusion, this study suggests that additional statistical approaches, such as data mining techniques, could provide more useful interpretations for forensic analyses. These trials are expected to be the basis for further studies extending from the target regions to the entire continent of Asia, as well as for the use of additional genes such as mitochondrial genes.

Regression Trees with Unbiased Variable Selection (변수선택 편향이 없는 회귀나무를 만들기 위한 알고리즘)

  • 김진흠;김민호
    • The Korean Journal of Applied Statistics / v.17 no.3 / pp.459-473 / 2004
  • It is well known that the exhaustive search algorithm suggested by Breiman et al. (1984) tends to select variables with relatively many possible splits as the splitting rule. We propose an algorithm that overcomes this variable selection bias and construct unbiased regression trees based on it. The proposed algorithm runs in two steps: selecting a split variable, and then determining a split rule for a binary split on that variable. Simulation studies were performed to compare the proposed algorithm with the CART (Classification and Regression Tree) method of Breiman et al. (1984) in terms of degree of variable selection bias, variable selection power, and MSE (Mean Squared Error). We also illustrate the proposed algorithm with real data sets.
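
The exhaustive search that this abstract criticizes can be sketched as follows. This is a simplified Python illustration of a CART-style search for one regression split, not the authors' proposed two-step algorithm, which the abstract does not describe in detail. The function names are hypothetical, and midpoints between adjacent distinct values are assumed as candidate thresholds.

```python
def sse(ys):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def exhaustive_best_split(X, y):
    """Try every candidate threshold on every variable.

    Returns ((variable index, threshold, total SSE), candidates per variable).
    The first element is None if no variable has two distinct values.
    """
    best = None
    n_candidates = []
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between adjacent distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
        n_candidates.append(len(thresholds))
        for t in thresholds:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (j, t, total)
    return best, n_candidates


# Variable 0 has four distinct values, variable 1 only two.
X = [[1, 0], [2, 0], [3, 1], [4, 1]]
y = [1, 1, 5, 5]
print(exhaustive_best_split(X, y))
```

Because a variable with more distinct values contributes more candidate thresholds, it gets more chances to achieve the smallest error by chance, which is the source of the selection bias the abstract describes.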

Analysis of effects of burning in grasslands with quantifying succession stages by life-history traits in Kirigamine, central Japan

  • Kato, Jun;Kawakami, Mihoko
    • Journal of Ecology and Environment / v.36 no.1 / pp.101-112 / 2013
  • To quantitatively analyze the effects of burning, we conducted a vegetation survey in the grasslands in Kirigamine, central Japan. We classified each species into stages of succession based on the life-history traits of the species and defined the score of the species in each stand based on the classification. We weighted the scores with a v-value, the product of coverage and height in the quadrat, and summed them to calculate the index of dynamic status. With these indices, we were able to quantitatively compare the stands in the study area and discern minute differences between the stands with different lengths of restoration periods since the disturbance of burning. These indices correlated with the v-value of trees, suggesting that the disturbance of burning seemed to affect the trees in the stand. We then calculated the growth of the tree species Pinus densiflora to evaluate its contribution to the index of dynamic status.
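
The index calculation described in this abstract can be written out as a short Python sketch. It follows only what the abstract states: each species score is weighted by its v-value (coverage times height in the quadrat) and the weighted scores are summed. Whether the authors normalize the sum, and what units or score scale they use, is not stated, so the plain sum and the sample numbers below are assumptions.

```python
def v_value(coverage, height):
    """v-value as defined in the abstract: coverage times height."""
    return coverage * height


def dynamic_status_index(records):
    """Sum of succession-stage scores weighted by v-value.

    records: iterable of (score, coverage, height) tuples, one per species.
    Assumption: no normalization is applied.
    """
    return sum(score * v_value(cov, h) for score, cov, h in records)


# Hypothetical quadrat with two species.
quadrat = [(1, 0.5, 2.0), (3, 0.25, 4.0)]
print(dynamic_status_index(quadrat))
```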

Current Status of Phytoplasmas and their Related Diseases in Korea

  • Jung, Hee-Young;Win, Nang Kyu Kyu;Kim, Young-Hwan
    • The Plant Pathology Journal / v.28 no.3 / pp.239-247 / 2012
  • Phytoplasmas have been associated with more than 46 plant species in Korea. Several vegetables, ornamentals, fruit trees and other crop species are affected by phytoplasma diseases. Six 16Sr groups of phytoplasmas have been identified, and these phytoplasmas are associated with 63 phytoplasma diseases. Aster yellows phytoplasmas are the most prevalent group and have been associated with more than 25 diseases in Korea. Jujube witches' broom, paulownia witches' broom and mulberry dwarf diseases cause economic losses to host trees throughout the country. So far, Korean phytoplasmas belong to six species of 'Candidatus Phytoplasma': 'Ca. P. asteris', 'Ca. P. pruni*', 'Ca. P. ziziphi', 'Ca. P. trifolii', 'Ca. P. solani*' and 'Ca. P. castaneae'. The diseases are distributed throughout the country, and most of them were observed in Gyeongbuk and Chonbuk provinces. At least four insect vectors (Cyrtopeltis tenuis, Hishimonus sellatus, Macrosteles striifrons and Ophiola flavopicta) have been identified for phytoplasma transmission.