• Title/Summary/Keyword: decision trees

Development of Predictive Model of Social Activity for the Elderly in Korea using CRT Algorithm (CRT 알고리즘을 이용한 우리나라 노인의 사회활동 영향요인 예측 모형 개발)

  • Byeon, Haewon
    • Journal of the Korea Convergence Society / v.9 no.10 / pp.243-248 / 2018
  • Social activities are important for successful aging among the elderly because they provide opportunities for social interaction that enhance life satisfaction. The purpose of this study is to identify the factors related to the social activities of the elderly and to build a statistical classification model that predicts social activity. The subjects were 1,864 elderly people (829 males, 1,035 females) who completed the community health survey in 2015. The outcome variable was defined as the experience of social activity during the past month (yes, no). The prediction model was constructed as a decision tree based on the Classification and Regression Trees (CRT) algorithm. The results showed that subjective health, frequency of meeting with neighbors, frequency of meeting with relatives, and living with a spouse were significant predictors of social participation. The most important predictor was the subjective health level. Based on these results, social attention and support for the social activities of the elderly are required to prepare for successful aging in a super-aged society.
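
As a rough illustration of how a CRT (CART) style tree ranks candidate predictors, the sketch below computes the Gini impurity decrease for two binary variables. The records, the variable names (`good_health`, `lives_with_spouse`, `active`) and the helper functions are all made up for this example and are not the study's data or code.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gain(rows, feature, target):
    """Impurity decrease obtained by splitting rows on a binary feature."""
    left = [r[target] for r in rows if r[feature]]
    right = [r[target] for r in rows if not r[feature]]
    parent = [r[target] for r in rows]
    n = len(rows)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted

# Hypothetical toy records (assumption: invented values, not survey data).
rows = [
    {"good_health": True,  "lives_with_spouse": True,  "active": "yes"},
    {"good_health": True,  "lives_with_spouse": False, "active": "yes"},
    {"good_health": True,  "lives_with_spouse": True,  "active": "yes"},
    {"good_health": True,  "lives_with_spouse": True,  "active": "no"},
    {"good_health": False, "lives_with_spouse": True,  "active": "no"},
    {"good_health": False, "lives_with_spouse": False, "active": "no"},
    {"good_health": False, "lives_with_spouse": False, "active": "yes"},
    {"good_health": False, "lives_with_spouse": False, "active": "no"},
]

gains = {f: split_gain(rows, f, "active")
         for f in ("good_health", "lives_with_spouse")}
best = max(gains, key=gains.get)  # the feature a CART-style tree would split on first
print(gains, best)
```

On this toy data the tree would split on `good_health` first, since the other variable does not reduce impurity at all.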

Analysis of Traffic Accidents Injury Severity in Seoul using Decision Trees and Spatiotemporal Data Visualization (의사결정나무와 시공간 시각화를 통한 서울시 교통사고 심각도 요인 분석)

  • Kang, Youngok;Son, Serin;Cho, Nahye
    • Journal of Cadastre & Land InformatiX / v.47 no.2 / pp.233-254 / 2017
  • The purpose of this study is to analyze the main factors influencing the severity of traffic accidents and to visualize the spatiotemporal characteristics of traffic accidents in Seoul. To do this, we collected data on traffic accidents that occurred in Seoul over the four years from 2012 to 2015 and classified them as slight, serious, or fatal according to severity. The spatiotemporal characteristics were analyzed using kernel density analysis, hotspot analysis, space time cube analysis, and Emerging HotSpot Analysis. The factors affecting severity were analyzed using a decision tree model. The results show that traffic accidents in Seoul are more frequent in suburbs than in central areas. In particular, traffic accidents were concentrated in some commercial and entertainment areas in Seocho and Gangnam, and this concentration intensified over time. For fatal traffic accidents, there were statistically significant hotspots in Yeongdeungpo-gu, Guro-gu, Jongno-gu, Jung-gu and Seongbuk. However, the hotspots of fatal accidents showed different patterns depending on the time of day. In terms of severity, the type of accident is the most important factor, followed in order of importance by the type of road, the type of vehicle, the time of the accident, and the type of regulation violated. Regarding the decision rules for serious traffic accidents, for vans and trucks there is a high probability of a serious accident where the road is wide and vehicle speed is high. For bicycles, cars, motorcycles and other vehicles, there is a high probability of a serious accident under the same circumstances at dawn.

Data Cube Index to Support Integrated Multi-dimensional Concept Hierarchies in Spatial Data Warehouse (공간 데이터웨어하우스에서 통합된 다차원 개념 계층 지원을 위한 데이터 큐브 색인)

  • Lee, Dong-Wook;Baek, Sung-Ha;Kim, Gyoung-Bae;Bae, Hae-Young
    • Journal of Korea Multimedia Society / v.12 no.10 / pp.1386-1396 / 2009
  • Most decision support functions of a spatial data warehouse rely on OLAP operations over a spatial cube. Higher performance is obtained by indexing the cube, which stores a huge amount of pre-aggregated information. Hierarchical Dwarf was proposed as a solution; it can be regarded as an extension of Dwarf, a compressed index for cube structures. However, it does not consider the spatial dimension and even aggregates incorrectly if there are redundant values at the lower levels. OLAP-favored Searching was proposed as a spatial-hierarchy-based OLAP operation that exploits the advantages of the R-tree. Although it supports aggregate functions well over specified areas, it ignores operations on the spatial dimensions. This paper proposes an indexing approach that uses the concept hierarchy of the spatial cube for decision support. The index consists of concept hierarchy trees for all dimensions, which are linked according to the tuples stored in the fact table. It saves storage cost by preventing identical trees from being created redundantly. It also reduces the cost of OLAP operations by integrating the spatial and aspatial dimensions in a virtual concept hierarchy.

Artificial Intelligence Techniques for Predicting Online Peer-to-Peer(P2P) Loan Default (인공지능기법을 이용한 온라인 P2P 대출거래의 채무불이행 예측에 관한 실증연구)

  • Bae, Jae Kwon;Lee, Seung Yeon;Seo, Hee Jin
    • The Journal of Society for e-Business Studies / v.23 no.3 / pp.207-224 / 2018
  • In this article, an empirical study was conducted using the public dataset from Lending Club Corporation, the largest online peer-to-peer (P2P) lending platform in the world. We explore predictor variables related to P2P lending default and find that housing situation, length of employment, average current balance, debt-to-income ratio, loan amount, loan purpose, interest rate, public records, number of finance trades, total credit/credit limit, number of delinquent accounts, number of mortgage accounts, and number of bank card accounts are significant factors in whether a loan is funded successfully on the Lending Club platform. We developed online P2P lending default prediction models using discriminant analysis, logistic regression, neural networks, and decision trees (i.e., CART and C5.0). To verify the feasibility and effectiveness of the prediction models, borrower loan data and credit data were used. Empirical results indicated that neural networks outperformed the other classifiers (discriminant analysis, logistic regression, CART, and C5.0) in P2P loan default prediction.

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

  • Kim, Myeong-Kyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.18 no.4 / pp.59-77 / 2012
  • This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies and investors, perform a variety of analyses of firms' growth, profitability, stability, activity, productivity, etc., and regularly report the firms' financial analysis indices. In this paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. We build the predictive models from two analytical perspectives. The first is the analysis period: we divide it into the periods before and after the IMF financial crisis and examine whether the two differ. The second is the prediction time: to predict when firms will increase capital by issuing new stocks, the prediction time is categorized as one, two, or three years later. A total of six prediction models are therefore developed and analyzed. We employ the decision tree technique to build the prediction models for rights issues. The decision tree is a widely used prediction method that labels or categorizes cases into a set of known classes. In contrast to neural networks, logistic regression and SVM, decision tree techniques are well suited for high-dimensional applications and have strong explanation capabilities. Well-known decision tree induction algorithms include CHAID, CART, QUEST, and C5.0. Among them, we use the C5.0 algorithm, which is the most recently developed and yields better performance than the other algorithms. We obtained the rights issue and financial analysis data from TS2000 of the Korea Listed Companies Association. Each financial analysis record consists of 89 variables, which include 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices and 8 productivity indices.
For model building and testing, we used 10,925 financial analysis records from a total of 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. A total of 84 variables from the financial analysis data were selected as the input variables of each model, and the rights issue status (issued or not issued) was defined as the output variable. To develop the prediction models with the C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of the data for model building and 40% for model testing. The experimental results show that the prediction accuracies for data after the IMF financial crisis (68.78% to 71.41%) are about 10 percentage points higher than those before the crisis (59.04% to 60.43%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and firms' intentions regarding rights issues have become more evident. The results also show that stability-related indices have a major impact on rights issues in short-term prediction. In long-term prediction, by contrast, rights issues are affected by financial analysis indices on profitability, stability, activity and productivity. All the prediction models include the industry code as one of the significant variables, which means that companies in different industries show different patterns of rights issues. We conclude that stakeholders should take into account stability-related indices for short-term prediction and a wider variety of financial analysis indices for long-term prediction. The current study has several limitations. First, accuracy should be compared against other data mining techniques such as neural networks, logistic regression and SVM.
Second, new prediction models should be developed and evaluated that include variables which capital structure theory identifies as relevant to rights issues.
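
The 60%/40% build-and-test procedure described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the records, the single `debt_ratio` feature and the hand-written `rule` are hypothetical stand-ins for the 84 input variables and the C5.0 rule set, neither of which is reproduced here.

```python
import random

def split_60_40(records, seed=0):
    """Shuffle the records and return (60% for building, 40% for testing)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 6 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

def accuracy(predict, labelled):
    """Share of (features, label) pairs whose label the predictor reproduces."""
    hits = sum(1 for features, label in labelled if predict(features) == label)
    return hits / len(labelled)

# Hypothetical records: (features, rights issue status). Values are invented.
records = [
    ({"debt_ratio": 0.2}, "not issued"),
    ({"debt_ratio": 0.4}, "not issued"),
    ({"debt_ratio": 0.6}, "issued"),
    ({"debt_ratio": 0.8}, "not issued"),
    ({"debt_ratio": 1.1}, "issued"),
    ({"debt_ratio": 1.3}, "issued"),
    ({"debt_ratio": 1.5}, "not issued"),
    ({"debt_ratio": 1.7}, "issued"),
    ({"debt_ratio": 1.9}, "issued"),
    ({"debt_ratio": 2.1}, "issued"),
]

def rule(features):
    """Stand-in for a learned rule set (assumption: a single fixed threshold)."""
    return "issued" if features["debt_ratio"] > 1.0 else "not issued"

train, held_out = split_60_40(records)
acc = accuracy(rule, held_out)
print(len(train), len(held_out), acc)
```

In a real experiment the rule would be induced from `train` only, and `held_out` would be used solely to report accuracy.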

Development of Thinning Effect Analysis Model (TEAM) Using Individual-Tree Distance-Independent Growth Model of Pinus koraiensis Stands (잣나무 임분의 개체목 거리독립생장모델을 이용한 간벌효과 분석모델 개발)

  • Kwon, Soonduk;Kim, Seonyoung;Chung, Joosang;Kim, Hyung-Ho
    • Journal of Korean Society of Forest Science / v.96 no.6 / pp.742-749 / 2007
  • The objective of this study was to develop a thinning effect analysis model (TEAM) using an individual-tree distance-independent growth model of Pinus koraiensis stands. TEAM was designed to analyze thinning effects associated with thinning prescriptions such as the number, timing, intensity, and method of thinnings. To test the application of TEAM, stand growth effects were compared across seven scenarios defined by thinning prescription plans. The model made it possible to estimate the number of trees, height, and volume by diameter (DBH) class of individual trees, as well as average diameter growth, height growth, the number of trees and volume growth per ha of stands. In the sensitivity analysis on one Pinus koraiensis stand, it was uncertain whether stand density control through thinning would produce much more volume at the rotation age. With thinning, total yield volume was 40 to 75 m³ per ha greater than without thinning, while average diameter growth and average height growth differed by less than 5 cm and 1 m, respectively, as stand age increased. As a decision support system, TEAM can be used to select trial thinning prescriptions and to choose among thinning prescription plans in different site-specific stand environments.

Geospatial Assessment of Frost and Freeze Risk in 'Changhowon Hwangdo' Peach (Prunus persica) Trees as Affected by the Projected Winter Warming in South Korea: III. Identifying Freeze Risk Zones in the Future Using High-Definition Climate Scenarios (겨울기온 상승에 따른 복숭아 나무 '장호원황도' 품종의 결과지에 대한 동상해위험 공간분석: III. 고해상도 기후시나리오에 근거한 동해위험의 미래분포)

  • Chung, U-Ran;Kim, Jin-Hee;Kim, Soo-Ock;Seo, Hee-Cheol;Yun, Jin-I.
    • Korean Journal of Agricultural and Forest Meteorology / v.11 no.4 / pp.221-232 / 2009
  • The geographical distribution of freeze risk determines the latitudinal and altitudinal limits and the maximum acreage suitable for fruit production. Any change in its pattern can affect climate change adaptation policy in the fruit industry. High-definition digital maps for such applications are not yet available because of uncertainty in the combined responses of temperature and dormancy depth under future climate scenarios. We applied an empirical freeze risk index, derived from the combination of dormancy depth and the threshold temperature that induces freeze damage to dormant buds of 'Changhowon Hwangdo' peach trees, to high-definition digital climate maps prepared for the current (1971-2000), near future (2011-2040) and far future (2071-2100) climate scenarios. According to the geospatial analysis at a landscape scale, both the safe and the risky areas will expand in the future, and some of the major peach cultivation areas may have difficulty overwintering safely because insufficient chilling weakens cold tolerance. Our test of this method for two counties representing the major peach cultivation areas in South Korea demonstrated that the migration of risky areas could be detected at a sub-grid scale. The method presented in this study can contribute significantly to climate change adaptation planning in agriculture as a decision aid tool.

Effective Normalization Method for Fraud Detection Using a Decision Tree (의사결정나무를 이용한 이상금융거래 탐지 정규화 방법에 관한 연구)

  • Park, Jae Hoon;Kim, Huy Kang;Kim, Eunjin
    • Journal of the Korea Institute of Information Security & Cryptology / v.25 no.1 / pp.133-146 / 2015
  • Increasingly sophisticated e-finance fraud techniques have led to a growing number of reported phishing incidents. In response, financial authorities have recommended enhancing the existing Fraud Detection Systems (FDS) of banks and other financial institutions. FDSs are systems designed to prevent e-finance accidents through real-time access and validity checks on client transactions. The effectiveness of an FDS depends largely on how fast it can analyze and detect abnormalities in large amounts of customer transaction data. In this study we detect fraudulent transaction patterns and establish detection rules by analyzing e-finance accident data. Abnormalities are flagged by comparing individual client transaction patterns with client profiles, using the ruleset. We propose an effective flagging method that uses decision trees to normalize detection rules. As a demonstration, we extracted customer usage patterns, customer profile information and detection rules from the e-finance accident data of an actual domestic (Korean) bank. We then compared the results of our decision tree-normalized detection rules with the results of sequential detection and confirmed the efficiency of our method.

Decision Tree Induction with Imbalanced Data Set: A Case of Health Insurance Bill Audit in a General Hospital (불균형 데이터 집합에서의 의사결정나무 추론: 종합 병원의 건강 보험료 청구 심사 사례)

  • Hur, Joon;Kim, Jong-Woo
    • Information Systems Review / v.9 no.1 / pp.45-65 / 2007
  • In the medical industry, the health insurance bill audit is a unique and essential process in general hospitals. The process is very important not only for a hospital's profit but also for its reputation. At large general hospitals in particular, many workers, including analysts and nurses, are engaged in it. This paper introduces a case in which decision tree induction techniques were used at a large general hospital in Korea to find health insurance bills likely to be reduced in audit. When supervised learning methods were applied, one of the major problems was the imbalance in the bill audit data: there were many normal (passing) cases and a relatively small number of reduction cases. To resolve the problem, well-known methods for imbalanced data sets, including oversampling of rare cases, undersampling of major cases, and adjusting the misclassification cost, are combined in several ways to find decision trees that satisfy the conditions required in the health insurance bill audit setting.
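
The three imbalance remedies named in this abstract can be sketched in isolation as follows. This is a minimal illustration, assuming a hypothetical audit set with 95 normal and 5 reduction cases; the function names and the inverse-frequency weighting are choices made for this example, and the way the paper combines the methods is not reproduced.

```python
import random

def oversample(minority, majority, rng):
    """Duplicate random minority cases until both classes have the same size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority

def undersample(minority, majority, rng):
    """Keep a random subset of majority cases, as many as there are minority cases."""
    return minority, rng.sample(majority, len(minority))

def cost_weights(minority, majority):
    """Misclassification weights inversely proportional to class frequency."""
    total = len(minority) + len(majority)
    return {
        "reduction": total / (2 * len(minority)),
        "normal": total / (2 * len(majority)),
    }

# Hypothetical bill identifiers (assumption: invented, not hospital data).
normal = [f"bill-{i}" for i in range(95)]
reduction = [f"bill-{i}" for i in range(95, 100)]

rng = random.Random(0)
over_min, over_maj = oversample(reduction, normal, rng)
under_min, under_maj = undersample(reduction, normal, rng)
weights = cost_weights(reduction, normal)
print(len(over_min), len(over_maj), len(under_min), len(under_maj), weights)
```

Oversampling keeps every normal case but repeats rare ones, undersampling discards most normal cases, and cost weighting leaves the data unchanged while penalizing errors on the rare class more heavily.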

Refining Rules of Decision Tree Using Extended Data Expression (확장형 데이터 표현을 이용하는 이진트리의 룰 개선)

  • Jeon, Hae Sook;Lee, Won Don
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.6 / pp.1283-1293 / 2014
  • In a ubiquitous environment, data change rapidly and new data arrive as time passes. Sometimes all of the past data will be lost if there is not sufficient space in memory. Therefore, there is a need to extract rules and combine them with new data, so as not to lose all the past data or to deal with large amounts of data. In building decision trees and extracting rules, the weight of each rule is generally determined by the total number of instances of the class at the leaf. The computational problem of finding a minimum finite state acceptor compatible with given data is NP-hard. We assume that the extracted rules are not entirely correct and may have lost some information. Given this assumption, this paper presents a new approach for refining rules that controls the weights of rules derived from previous knowledge or data. To refine rules, the paper generates a variety of rules using a pruning method with majority and minority properties, controls the weight of each rule, and observes the change in performance. The decision tree classifier with extended data expression having static weight is used in this study. Experiments show that performance may improve under the new rule refinement policy.