• Title/Summary/Keyword: Decision-trees


Analysis of Leaf Node Ranking Methods for Spatial Event Prediction (의사결정트리에서 공간사건 예측을 위한 리프노드 등급 결정 방법 분석)

  • Yeon, Young-Kwang
    • Journal of the Korean Association of Geographic Information Studies, v.17 no.4, pp.101-111, 2014
  • Spatial events can be predicted with data mining classification algorithms, and decision trees are among the most representative of these. Decision trees have normally been used for classification tasks with labeled class values, but with rule ranking methods they have also been applied to spatial prediction problems. This paper compares rule ranking methods for spatial prediction using a decision tree. For the comparison experiment, the C4.5 decision tree algorithm and the Laplace, M-estimate, and m-branch rule ranking methods were implemented. Landslides, a representative spatial event occurring in the natural environment, served as the case study. In the accuracy evaluation, m-branch was more accurate than the other rule ranking methods. However, m-branch and M-estimate required an additional, time-consuming search for optimal parameter values, so the methods can be chosen selectively according to the application area. Spatial prediction using a decision tree can be used not only for prediction but also for causal analysis at the locations where specific events occur.
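
As a rough illustration of how leaf nodes can be ranked, the sketch below scores leaves with the standard Laplace and m-estimate corrections. The leaf counts, the prior, and the value of m are hypothetical values chosen for illustration, not data from the paper, and the m-branch method is not shown.

```python
def laplace_score(positives: int, total: int, n_classes: int = 2) -> float:
    """Laplace-corrected probability that a case in this leaf is an event."""
    return (positives + 1) / (total + n_classes)


def m_estimate_score(positives: int, total: int, prior: float, m: float) -> float:
    """m-estimate: shrinks the leaf frequency toward the prior event rate."""
    return (positives + m * prior) / (total + m)


# Hypothetical leaves: name -> (event cases in leaf, all cases in leaf).
leaves = {"leaf_a": (3, 3), "leaf_b": (40, 50), "leaf_c": (1, 20)}
prior = 0.1  # assumed overall event rate (illustrative)
m = 10.0     # assumed smoothing parameter; it would normally be tuned

ranked_laplace = sorted(leaves, key=lambda k: laplace_score(*leaves[k]), reverse=True)
ranked_m = sorted(
    leaves, key=lambda k: m_estimate_score(*leaves[k], prior, m), reverse=True
)

print("Laplace ranking:   ", ranked_laplace)
print("m-estimate ranking:", ranked_m)
```

With these toy numbers the two methods order the small, pure leaf and the large, mostly positive leaf differently, which is why the choice of ranking method and its parameters can affect prediction accuracy.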

MRI Predictors of Malignant Transformation in Patients with Inverted Papilloma: A Decision Tree Analysis Using Conventional Imaging Features and Histogram Analysis of Apparent Diffusion Coefficients

  • Chong Hyun Suh;Jeong Hyun Lee;Mi Sun Chung;Xiao Quan Xu;Yu Sub Sung;Sae Rom Chung;Young Jun Choi;Jung Hwan Baek
    • Korean Journal of Radiology, v.22 no.5, pp.751-758, 2021
  • Objective: Preoperative differentiation between inverted papilloma (IP) and its malignant transformation to squamous cell carcinoma (IP-SCC) is critical for patient management. We aimed to determine the diagnostic accuracy of conventional imaging features and histogram parameters obtained from whole tumor apparent diffusion coefficient (ADC) values to predict IP-SCC in patients with IP, using decision tree analysis. Materials and Methods: In this retrospective study, we analyzed data generated from the records of 180 consecutive patients with histopathologically diagnosed IP or IP-SCC who underwent head and neck magnetic resonance imaging, including diffusion-weighted imaging; 62 patients were included in the study. To obtain whole tumor ADC values, the region of interest was placed to cover the entire volume of the tumor. Classification and regression tree analyses were performed to determine the most significant predictors of IP-SCC among multiple covariates. The final tree was selected by cross-validation pruning based on minimal error. Results: Of 62 patients with IP, 21 (34%) had IP-SCC. The decision tree analysis revealed that the loss of convoluted cerebriform pattern and the 20th percentile cutoff of ADC were the most significant predictors of IP-SCC. With these decision trees, the sensitivity, specificity, accuracy, and C-statistic were 86% (18 out of 21; 95% confidence interval [CI], 65-95%), 100% (41 out of 41; 95% CI, 91-100%), 95% (59 out of 62; 95% CI, 87-98%), and 0.966 (95% CI, 0.912-1.000), respectively. Conclusion: Decision tree analysis using conventional imaging features and histogram analysis of whole volume ADC could predict IP-SCC in patients with IP with high diagnostic accuracy.
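
The reported proportions can be checked from the counts in the abstract. The sketch below recomputes sensitivity, specificity, and accuracy, taking accuracy as the 18 + 41 = 59 correct classifications over all 21 + 41 = 62 patients. The Wilson score interval is an assumption here, since the abstract does not say which confidence interval method was used.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


# Counts from the abstract: 21 patients with IP-SCC, 41 with IP only.
true_pos, n_pos = 18, 21
true_neg, n_neg = 41, 41

metrics = {
    "sensitivity": (true_pos, n_pos),
    "specificity": (true_neg, n_neg),
    "accuracy": (true_pos + true_neg, n_pos + n_neg),
}
for name, (k, n) in metrics.items():
    lo, hi = wilson_interval(k, n)
    print(f"{name}: {k}/{n} = {k / n:.0%} (95% CI {lo:.0%}-{hi:.0%})")
```

Under this assumption, the rounded bounds agree with the intervals quoted in the abstract.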

Satellite-based Hybrid Drought Assessment using Vegetation Drought Response Index in South Korea (VegDRI-SKorea) (식생가뭄반응지수 (VegDRI)를 활용한 위성영상 기반 가뭄 평가)

  • Nam, Won-Ho;Tadesse, Tsegaye;Wardlow, Brian D.;Jang, Min-Won;Hong, Suk-Young
    • Journal of The Korean Society of Agricultural Engineers, v.57 no.4, pp.1-9, 2015
  • Developing a drought index that provides drought information at detailed spatial resolution is essential for improving drought planning and preparedness. The objective of this study was to develop a satellite-based hybrid drought index, the Vegetation Drought Response Index in South Korea (VegDRI-SKorea), that could improve spatial resolution for monitoring local and regional drought. VegDRI-SKorea was developed using the Classification And Regression Trees (CART) algorithm based on remote sensing data such as the Normalized Difference Vegetation Index (NDVI) from MODIS satellite images, climate drought indices such as the Self-Calibrating Palmer Drought Severity Index (SC-PDSI) and the Standardized Precipitation Index (SPI), and biophysical data such as land cover, ecoregion, and soil available water capacity. A case study of the 2012 drought was conducted to evaluate the VegDRI-SKorea model for South Korea. VegDRI-SKorea captured the drought areas from the end of May through the severe drought at the end of June. The results show that integrating satellite imagery and associated data through a data mining technique yields drought information that is improved both spatially and temporally, along with a better understanding of drought conditions. In addition, VegDRI-SKorea is expected to help monitor current drought conditions, supporting local and regional drought risk assessment and drought-related decision making.

Development of Predictive Model of Social Activity for the Elderly in Korea using CRT Algorithm (CRT 알고리즘을 이용한 우리나라 노인의 사회활동 영향요인 예측 모형 개발)

  • Byeon, Haewon
    • Journal of the Korea Convergence Society, v.9 no.10, pp.243-248, 2018
  • Social activities are important for successful aging because they give the elderly opportunities for social interaction that enhance life satisfaction. The purpose of this study is to identify factors related to the social activities of the elderly and to build a statistical classification model that predicts social activity. The subjects were 1,864 elderly people (829 males, 1,035 females) who completed the community health survey in 2015. The outcome variable was defined as experience of social activity during the past month (yes, no). The prediction model was a decision tree built with the Classification and Regression Trees (CRT) algorithm. Subjective health, frequency of meeting with neighbors, frequency of meeting with relatives, and living with a spouse were significant variables for social participation. The most prominent predictor was subjective health level. To prepare for successful aging in a super-aged society, these results indicate that social attention and support for the social activities of the elderly are required.
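
To illustrate how a CART-style tree might pick a split for a yes/no outcome, the sketch below computes the decrease in Gini impurity for one candidate split. Gini impurity is a common CART criterion, but the abstract does not state which criterion was used, and the toy labels are hypothetical rather than survey data.

```python
from collections import Counter


def gini(labels: list[str]) -> float:
    """Gini impurity of a set of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent: list[str], left: list[str], right: list[str]) -> float:
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Hypothetical toy data: social activity in the past month (yes/no),
# split on an assumed binary "subjective health" variable.
good_health = ["yes"] * 5 + ["no"] * 1
poor_health = ["yes"] * 1 + ["no"] * 3
parent = good_health + poor_health

print(f"parent impurity: {gini(parent):.3f}")
print(f"impurity decrease: {gini_decrease(parent, good_health, poor_health):.3f}")
```

A tree builder would evaluate this decrease for every candidate variable and threshold and split on the largest one, so a variable chosen near the root tends to be an influential predictor.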

Analysis of Traffic Accidents Injury Severity in Seoul using Decision Trees and Spatiotemporal Data Visualization (의사결정나무와 시공간 시각화를 통한 서울시 교통사고 심각도 요인 분석)

  • Kang, Youngok;Son, Serin;Cho, Nahye
    • Journal of Cadastre & Land InformatiX, v.47 no.2, pp.233-254, 2017
  • The purpose of this study is to analyze the main factors influencing the severity of traffic accidents and to visualize the spatiotemporal characteristics of traffic accidents in Seoul. To do this, we collected data on traffic accidents that occurred in Seoul over the four years from 2012 to 2015 and classified them by severity as slight, serious, or fatal. The spatiotemporal characteristics of traffic accidents were analyzed by kernel density analysis, hotspot analysis, space time cube analysis, and Emerging HotSpot Analysis. The factors affecting the severity of traffic accidents were analyzed using a decision tree model. The results show that traffic accidents in Seoul are more frequent in suburbs than in central areas. In particular, traffic accidents were concentrated in some commercial and entertainment areas in Seocho and Gangnam, and they intensified over time. For fatal traffic accidents, there were statistically significant hotspot areas in Yeongdeungpo-gu, Guro-gu, Jongno-gu, Jung-gu and Seongbuk. However, hotspots of fatal traffic accidents showed different patterns by time period. In terms of traffic accident severity, the type of accident is the most important factor. The type of road, the type of vehicle, the time of the accident, and the type of regulation violated followed in order of importance. Regarding the decision rules for serious traffic accidents, for vans and trucks there is a high probability that a serious accident will occur where the road is wide and the vehicle speed is high. For bicycles, cars, motorcycles, and other vehicles, there is a high probability that a serious accident will occur under the same circumstances at dawn.

Data Cube Index to Support Integrated Multi-dimensional Concept Hierarchies in Spatial Data Warehouse (공간 데이터웨어하우스에서 통합된 다차원 개념 계층 지원을 위한 데이터 큐브 색인)

  • Lee, Dong-Wook;Baek, Sung-Ha;Kim, Gyoung-Bae;Bae, Hae-Young
    • Journal of Korea Multimedia Society, v.12 no.10, pp.1386-1396, 2009
  • Most decision support functions of a spatial data warehouse rely on OLAP operations over a spatial cube. Performance is improved by indexing the cube, which stores a huge amount of pre-aggregated information. Hierarchical Dwarf was proposed as a solution; it can be regarded as an extension of the Dwarf, a compressed index for cube structures. However, it does not consider the spatial dimension and even aggregates incorrectly if there are redundant values at the lower levels. OLAP-favored Searching was proposed as a spatial-hierarchy-based OLAP operation that exploits the advantages of the R-tree. Although it supports aggregate functions over specified areas well, it ignores operations on the spatial dimensions. This paper proposes an indexing approach that uses the concept hierarchy of the spatial cube for decision support. The index consists of concept hierarchy trees for all dimensions, which are linked according to the tuples stored in the fact table. It saves storage cost by preventing identical trees from being created redundantly. It also reduces OLAP operation cost by integrating the spatial and aspatial dimensions in a virtual concept hierarchy.


Artificial Intelligence Techniques for Predicting Online Peer-to-Peer(P2P) Loan Default (인공지능기법을 이용한 온라인 P2P 대출거래의 채무불이행 예측에 관한 실증연구)

  • Bae, Jae Kwon;Lee, Seung Yeon;Seo, Hee Jin
    • The Journal of Society for e-Business Studies, v.23 no.3, pp.207-224, 2018
  • In this article, we conduct an empirical study using a public dataset from Lending Club Corporation, the largest online peer-to-peer (P2P) lending platform in the world. We explore the predictor variables related to P2P lending default and find that housing situation, length of employment, average current balance, debt-to-income ratio, loan amount, loan purpose, interest rate, public records, number of finance trades, total credit/credit limit, number of delinquent accounts, number of mortgage accounts, and number of bank card accounts are significant factors for loans funded successfully on the Lending Club platform. We developed online P2P lending default prediction models using discriminant analysis, logistic regression, neural networks, and decision trees (i.e., CART and C5.0). To verify the feasibility and effectiveness of the prediction models, borrower loan data and credit data were used. Empirical results indicated that neural networks outperform the other classifiers, namely discriminant analysis, logistic regression, CART, and C5.0. In these experiments, neural networks consistently outperformed the other classifiers in P2P loan default prediction.

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

  • Kim, Myeong-Kyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems, v.18 no.4, pp.59-77, 2012
  • This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies, and investors, perform a variety of analyses of firms' growth, profitability, stability, activity, productivity, etc., and regularly report the firms' financial analysis indices. In this paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. We build the predictive models from two analytical perspectives. The first is the analysis period: we divide it into the periods before and after the IMF financial crisis and examine whether the two differ. The second is the prediction time: to predict when firms increase capital by issuing new stocks, the prediction time is categorized as one, two, or three years later. In total, six prediction models are therefore developed and analyzed. We employ the decision tree technique to build the prediction models for rights issues. The decision tree is the most widely used prediction method, which builds decision trees to label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression, and SVM, decision tree techniques are well suited for high-dimensional applications and have strong explanation capabilities. Well-known decision tree induction algorithms include CHAID, CART, QUEST, and C5.0. Among them, we use the C5.0 algorithm, which is the most recently developed and yields better performance than the other algorithms. We obtained the rights issue and financial analysis data from TS2000 of the Korea Listed Companies Association. Each financial analysis record consists of 89 variables, which include 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices, and 8 productivity indices. For model building and testing, we used 10,925 financial analysis records from a total of 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. A total of 84 variables from the financial analysis data were selected as the input variables of each model, and the rights issue status (issued or not issued) was defined as the output variable. To develop prediction models using the C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of the data for model building and 40% for model testing. The experimental results show that the prediction accuracies for data after the IMF financial crisis (68.78% to 71.41%) are about 10 percentage points higher than those before the crisis (59.04% to 60.43%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and firms' intentions regarding rights issues have become more evident. The results also show that stability-related indices have a major impact on rights issues in short-term prediction. In contrast, long-term prediction of rights issues is affected by financial analysis indices on profitability, stability, activity, and productivity. All the prediction models include the industry code as one of the significant variables, which means that companies in different industries show different patterns of rights issues. We conclude that stakeholders should take into account stability-related indices for short-term prediction and a wider range of financial analysis indices for long-term prediction. The current study has several limitations. First, we need to compare the differences in accuracy obtained with different data mining techniques such as neural networks, logistic regression, and SVM. Second, we need to develop and evaluate new prediction models that include variables that research on capital structure theory has identified as relevant to rights issues.
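
The experimental design described above, two analysis periods crossed with three prediction horizons and a 60/40 build/test split, can be sketched as follows. This is an illustrative outline only: the random shuffle, the seed, and the use of integer record IDs are assumptions, and the abstract does not say how the 10,925 records were divided among the six models.

```python
import itertools
import random

# Two analysis periods x three prediction horizons = six models.
periods = ["before IMF crisis", "after IMF crisis"]
horizons_years = [1, 2, 3]
model_grid = list(itertools.product(periods, horizons_years))


def split_60_40(records: list, seed: int = 0) -> tuple[list, list]:
    """Shuffle a copy of the records and split it 60% build / 40% holdout."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary assumption
    cut = len(shuffled) * 60 // 100        # integer arithmetic avoids rounding drift
    return shuffled[:cut], shuffled[cut:]


# Stand-in record IDs; the real inputs would be the financial analysis records.
records = list(range(10_925))
build, holdout = split_60_40(records)

for period, years in model_grid:
    print(f"model: {period}, rights issue {years} year(s) ahead")
print(f"build: {len(build)} records, holdout: {len(holdout)} records")
```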

Development of Thinning Effect Analysis Model (TEAM) Using Individual-Tree Distance-Independent Growth Model of Pinus koraiensis Stands (잣나무 임분의 개체목 거리독립생장모델을 이용한 간벌효과 분석모델 개발)

  • Kwon, Soonduk;Kim, Seonyoung;Chung, Joosang;Kim, Hyung-Ho
    • Journal of Korean Society of Forest Science, v.96 no.6, pp.742-749, 2007
  • The objective of this study was to develop a thinning effect analysis model (TEAM) using an individual-tree distance-independent growth model of Pinus koraiensis stands. TEAM was designed to analyze thinning effects associated with thinning prescriptions such as the number, timing, intensity, and method of thinnings. To test the application of TEAM, stand growth effects were compared across seven scenarios based on thinning prescription plans. The model could estimate the number of trees, height, and volume by diameter (DBH) class of individual trees, as well as average diameter growth, height growth, number of trees, and volume growth per ha of stands. In a sensitivity analysis of one Pinus koraiensis stand, it was uncertain whether stand density control through thinning would yield much more volume at rotation age. With thinning, total yield volume was 40 to 75 m³ per ha greater than without thinning as stand age increased, while differences in average diameter growth and average height growth stayed within 5 cm and 1 m, respectively. As a decision support system, TEAM can be used to select trial thinning prescriptions and to choose among thinning prescription plans in different site-specific stand environments.

Geospatial Assessment of Frost and Freeze Risk in 'Changhowon Hwangdo' Peach (Prunus persica) Trees as Affected by the Projected Winter Warming in South Korea: III. Identifying Freeze Risk Zones in the Future Using High-Definition Climate Scenarios (겨울기온 상승에 따른 복숭아 나무 '장호원황도' 품종의 결과지에 대한 동상해위험 공간분석: III. 고해상도 기후시나리오에 근거한 동해위험의 미래분포)

  • Chung, U-Ran;Kim, Jin-Hee;Kim, Soo-Ock;Seo, Hee-Cheol;Yun, Jin-I.
    • Korean Journal of Agricultural and Forest Meteorology, v.11 no.4, pp.221-232, 2009
  • The geographical distribution of freeze risk determines the latitudinal and altitudinal limits and the maximum acreage suitable for fruit production. Any change in its pattern can affect climate change adaptation policy in the fruit industry. High-definition digital maps for such applications are not yet available because of uncertainty in the combined responses of temperature and dormancy depth under future climate scenarios. We applied an empirical freeze risk index, derived from the combination of dormancy depth and the threshold temperature that induces freeze damage to dormant buds of 'Changhowon Hwangdo' peach trees, to high-definition digital climate maps prepared for the current (1971-2000), near future (2011-2040), and far future (2071-2100) climate scenarios. According to the geospatial analysis at a landscape scale, both the safe and the risky areas will expand in the future, and some of the major peach cultivation areas may have difficulty overwintering safely because insufficient chilling weakens cold tolerance. Our test of this method for the two counties representing the major peach cultivation areas in South Korea demonstrated that the migration of risky areas could be detected at a sub-grid scale. The method presented in this study can contribute significantly to climate change adaptation planning in agriculture as a decision aid tool.