• Title/Summary/Keyword: Decision trees

Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

  • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.4
    • /
    • pp.123-132
    • /
    • 2013
  • As smartphones are equipped with various sensors such as the accelerometer, GPS, gravity sensor, gyroscope, ambient light sensor, and proximity sensor, much research has sought to use these sensors to create valuable applications. Human activity recognition is one such application, motivated by welfare applications such as support for the elderly, measurement of calorie consumption, and analysis of lifestyles and exercise patterns. One challenge in using smartphone sensors for activity recognition is that the number of sensors used should be minimized to save battery power. When the number of sensors used is restricted, it is difficult to build a highly accurate activity recognizer, or classifier, because subtly different activities are hard to distinguish from limited information. The difficulty becomes especially severe when the number of activity classes to be distinguished is very large. In this paper, we show that a fairly accurate classifier that distinguishes ten different activities can be built using data from only a single sensor, the smartphone accelerometer. Our approach to this ten-class problem is the ensemble of nested dichotomies (END) method, which transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers in a nested fashion using a binary tree. At the root of the binary tree, the set of all classes is split into two subsets of classes by a binary classifier. At a child node of the tree, a subset of classes is again split into two smaller subsets by another binary classifier. Continuing in this way, we obtain a binary tree in which each leaf node contains a single class. This binary tree can be viewed as a nested dichotomy that can make multi-class predictions.
Depending on how a set of classes is split into two subsets at each node, the final tree can differ. Since some classes may be correlated, a particular tree may perform better than the others. However, we can hardly identify the best tree without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning and then combining the predictions made by each tree during classification. The END method is generally known to perform well even when the base learner is unable to model complex decision boundaries. As the base classifier at each node of the dichotomy, we have used another ensemble classifier, the random forest. A random forest is built by repeatedly generating decision trees, each from a bootstrap sample and with a different random subset of features. By combining bagging with random feature subset selection, a random forest has more diverse ensemble members than simple bagging. Overall, our ensemble of nested dichotomies can be seen as a committee of committees of decision trees that can deal with a multi-class problem with high accuracy. The ten classes of activities that we distinguish in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'. The features used for classifying these activities include not only the magnitude of the acceleration vector at each time point but also the maximum, the minimum, and the standard deviation of the vector magnitude within a time window of the last 2 seconds, among others. For experiments comparing the performance of END with that of other methods, accelerometer data were collected every 0.1 second for 2 minutes for each activity from 5 volunteers.
Among these 5,900 ($=5{\times}(60{\times}2-2)/0.1$) data points collected for each activity (the data for the first 2 seconds are discarded because they lack a full time window), 4,700 have been used for training and the rest for testing. Although 'Walking Uphill' is often confused with some other similar activities, END classified all ten activities with a fairly high accuracy of 98.4%. By comparison, the accuracies achieved by a decision tree, a k-nearest neighbor classifier, and a one-versus-rest support vector machine were 97.6%, 96.5%, and 97.6%, respectively.
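The nested-dichotomy idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract uses a random forest at every node, whereas `size_based_stub` below is a hypothetical placeholder that ignores the sample entirely, and the random splitting scheme (shuffle, then cut) is a simplifying assumption. Only the ten class names come from the abstract.

```python
import random

ACTIVITIES = [
    "Sitting", "Standing", "Walking", "Running", "Walking Uphill",
    "Walking Downhill", "Running Uphill", "Running Downhill",
    "Falling", "Hobbling",
]

def build_dichotomy(classes, rng):
    """Randomly split class labels into a nested binary tree.

    Leaves are labels (str); internal nodes are (left, right) tuples.
    """
    if len(classes) == 1:
        return classes[0]
    shuffled = list(classes)
    rng.shuffle(shuffled)
    cut = rng.randint(1, len(shuffled) - 1)  # keep both subsets non-empty
    return (build_dichotomy(shuffled[:cut], rng),
            build_dichotomy(shuffled[cut:], rng))

def leaves(tree):
    """Return the class labels found under a node."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]

def class_probabilities(tree, p_left, prob=1.0):
    """Multiply branch probabilities from the root down to each leaf."""
    if not isinstance(tree, tuple):
        return {tree: prob}
    left, right = tree
    p = p_left(leaves(left), leaves(right))
    result = class_probabilities(left, p_left, prob * p)
    result.update(class_probabilities(right, p_left, prob * (1.0 - p)))
    return result

def ensemble_probabilities(trees, p_left):
    """Average the per-class probabilities over all dichotomy trees."""
    totals = {}
    for tree in trees:
        for label, p in class_probabilities(tree, p_left).items():
            totals[label] = totals.get(label, 0.0) + p
    return {label: total / len(trees) for label, total in totals.items()}

def size_based_stub(left_classes, right_classes):
    # Hypothetical stand-in for a trained binary classifier at a node.
    # A real node model would use the sample's accelerometer features.
    return len(left_classes) / (len(left_classes) + len(right_classes))

rng = random.Random(0)
trees = [build_dichotomy(ACTIVITIES, rng) for _ in range(5)]
probs = ensemble_probabilities(trees, size_based_stub)
print(round(sum(probs.values()), 6))
```

Replacing the stub with a trained two-class model per node, fed with the window features the abstract lists, would turn this skeleton into an actual classifier.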

Development of a Detection Model for the Companies Designated as Administrative Issue in KOSDAQ Market (KOSDAQ 시장의 관리종목 지정 탐지 모형 개발)

  • Shin, Dong-In;Kwahk, Kee-Young
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.157-176
    • /
    • 2018
  • The purpose of this research is to develop a detection model for companies designated as administrative issues in the KOSDAQ market using financial data. Administrative issue designation identifies companies with a high potential for delisting and gives them time to resolve the grounds for delisting, subject to certain restrictions of the Korean stock market. It acts as an alarm that informs investors and market participants which companies are likely to be delisted and warns them to invest safely. Despite this importance, there are relatively few studies on models predicting administrative issue designation compared with the many studies on bankruptcy prediction models. Therefore, this study develops and verifies a detection model for companies designated as administrative issues using financial data of KOSDAQ companies. In this study, logistic regression and decision tree are proposed as the data mining models for detecting administrative issues. According to the results of the analysis, the logistic regression model predicted the companies designated as administrative issues using three variables: ROE (Earnings before tax), Cash flows/Shareholder's equity, and Asset turnover ratio. Its overall accuracy was 86% for the validation dataset. The decision tree (Classification and Regression Trees, CART) model applied classification rules using Cash flows/Total assets and ROA (Net income), and its overall accuracy reached 87%. The implications of the financial indicators selected in our logistic regression and decision tree models are as follows. First, ROE (Earnings before tax) in the logistic detection model shows the profit and loss of the business segments that will continue, excluding the revenue and expenses of discontinued businesses. Therefore, a weakening of this variable means that the competitiveness of the core business has weakened.
If a large part of the profits is generated from one-off profit, it is very likely that the deterioration of business management will intensify further. As the ROE of a KOSDAQ company decreases significantly, it becomes highly likely that the company will be delisted. Second, cash flows to shareholder's equity represents the firm's ability to generate cash flow, excluding the financial condition of subsidiaries. In other words, a weakening of the parent company's management capacity, excluding the subsidiaries' competence, can be a main reason for an increased possibility of administrative issue designation. Third, a low asset turnover ratio means that current and non-current assets are used ineffectively by the corporation, or that its asset investment is excessive. If the asset turnover ratio of a KOSDAQ-listed company decreases, it is necessary to examine its corporate activities in detail from various perspectives, such as weakening sales or changes in inventories. Cash flow/total assets, a variable selected by the decision tree detection model, is a key indicator of the company's cash condition and its ability to generate cash from operating activities. Cash flow indicates whether a firm can perform its main activities (maintaining its operating ability, repaying debts, paying dividends, and making new investments) without relying on external financial resources. Therefore, if the value of the variable is negative (-), it indicates the possibility that a company has serious problems in business activities. If the cash flow from operating activities of a specific company is smaller than the net profit, it means that the net profit has not been converted into cash, indicating that there is a serious problem in managing the trade receivables and inventory assets of the company.
Therefore, it can be understood that as cash flows/total assets decreases, the probability of administrative issue designation and the probability of delisting increase. In summary, the logistic regression-based detection model in this study was found to be driven by the company's financial activities, including ROE (Earnings before tax), whereas the decision tree-based detection model predicts the designation based on the cash flows of the company.
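To illustrate how a CART-style tree might pick a cut-off on a ratio such as cash flows/total assets, here is a minimal sketch of a Gini-impurity threshold search. The six data points and the function names are invented for illustration and are not taken from the study. The abstract does not state which splitting criterion the authors used, so Gini impurity is an assumption here.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = designated, 0 = not)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best single split."""
    pairs = sorted(zip(values, labels))
    best_thr, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between identical values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for val, lab in pairs if val <= thr]
        right = [lab for val, lab in pairs if val > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score

# Hypothetical toy data: cash flows / total assets and designation flag.
cash_flow_to_assets = [-0.20, -0.10, -0.05, 0.02, 0.08, 0.15]
designated = [1, 1, 1, 0, 0, 0]

threshold, impurity = best_threshold(cash_flow_to_assets, designated)
print(threshold, impurity)
```

On this toy data the search places the cut between the negative and positive ratios, in line with the abstract's reading that a negative value signals trouble.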

A Study on the System and Process of the Construction and Management for the Royal Garden and Landscape in the Late Choson Dynasty (조선 후기 원유의 영선체제와 과정에 관한 연구)

  • 전영옥
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.26 no.2
    • /
    • pp.73-90
    • /
    • 1998
  • The construction and management of the royal gardens and landscapes were the most significant projects of the Choson Dynasty. The royal gardens and landscapes included the rear garden of the palace and the groves of the royal shrine and orchard, among others. As important national projects, these constructions were controlled by an administrative system that was not divided into the fields of engineering, building, and landscaping. The purpose of this study is to investigate this administrative system. In particular, the study focuses on the construction and management of the royal gardens and landscapes in Hanyang from the 18th century to the late 19th century. It is based on an analysis of historic documents and a survey of the relics. The results are summarized as follows: 1) The administrative system for the construction and management of the royal gardens and landscapes was composed of government offices under the Industry Board as permanent organizations (Yongjosa, Santaeksa, Chunchonsa, Songonggam, and Changwonso) and Togam as a temporary organization. In addition, the Revenue Board, the Ceremony Board, and the Military Board served as supporting organizations. Control of the construction and management was held by decision makers, executors of works, and managers. 2) The general process of construction and management consisted of three steps. The first step comprised Sangji and Kyuhoek: for buildings and facilities, most of the planning and design was already fixed according to former examples and drawings, while for landscape, construction was planned to follow the existing lie of the land. The second step was the works, which were divided into the construction of facilities and planting. The construction of facilities was done by Togam and Songonggam.
High-cost works were carried out through Togam, and ordinary repair works were completed by Songonggam. Planting was carried out through Chunchonsa and the military. The third step was management, which, like the works, was divided into two parts. Facilities were managed by the officers of Pongshim. Groves of newly planted trees were managed by Tongsanbyonlgam and Tongsanjik, who served as experts in the cultivation and harvest of fruit trees.

Spatial Dispersion and Sampling of Adults of Citrus Red Mite, Panonychus citri(McGregor) (Acari: Tetranychidae) in Citrus Orchard in Autumn Season (감귤원에서 가을철 귤응애 성충의 공간분포와 표본조사)

  • 송정흡;김수남;류기중
    • Korean journal of applied entomology
    • /
    • v.42 no.1
    • /
    • pp.29-34
    • /
    • 2003
  • The dispersion pattern of adult citrus red mite (CRM), Panonychus citri (McGregor), was determined using Taylor's power law (TPL) and Iwao's patchiness regression (IPR) to develop a monitoring method for citrus orchards on Jeju in the autumn seasons of 2001 and 2002. The CRM population was sampled by collecting leaves and fruits. The relationships of CRM adults between leaf and fruit were analyzed by season. The regression equation for CRM adults between leaf (X) and fruit (Y) was $\ln(Y+1) = 1.029 \ln(X+1)$ ($r^2 = 0.80$). The density of CRM was higher on fruit than on leaf, depending on the fruit maturity level. TPL described the mean-variance relationship of the dispersion indices better than IPR. Slopes and intercepts of TPL from leaf and fruit samples did not differ between sample units or surveyed years. Fixed-precision levels (D) of a sequential sampling plan were developed using Taylor's power law parameters generated from adults of CRM in leaf samples. Sequential sampling plans for adults of CRM were developed for decision making on CRM population levels based on different action threshold levels (2.0, 2.5, and 3.0 mites per leaf) with 0.25 precision. The maximum number of trees and the required number of trees sampled in the fixed sample size plan at the 2.0, 2.5, and 3.0 thresholds with a 0.25 precision level were 19, 16, and 15, and their critical values $T_{critical}$ were 554, 609, and 659, respectively.
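The two relationships mentioned in this abstract can be sketched as follows. Taylor's power law models the variance as $s^2 = a m^b$, and if precision D is defined as the standard error divided by the mean, the required sample size is $n = a m^{b-2} / D^2$. The abstract does not report the fitted `a` and `b`, so the values used below are hypothetical and will not reproduce the reported sample sizes. Only the leaf-to-fruit regression coefficient 1.029 comes from the abstract.

```python
def taylor_variance(mean, a, b):
    """Variance predicted by Taylor's power law: s^2 = a * mean^b."""
    return a * mean ** b

def fixed_precision_sample_size(mean, a, b, precision):
    """Sample size giving SE/mean == precision under Taylor's power law."""
    return a * mean ** (b - 2) / precision ** 2

def fruit_density_from_leaf(leaf_density):
    """Invert ln(Y + 1) = 1.029 ln(X + 1) to get the fruit density Y."""
    return (leaf_density + 1.0) ** 1.029 - 1.0

# Hypothetical TPL parameters, NOT the values fitted in the study.
A_HYPOTHETICAL, B_HYPOTHETICAL = 2.0, 1.5

n_required = fixed_precision_sample_size(4.0, A_HYPOTHETICAL, B_HYPOTHETICAL, 0.25)
print(n_required)
print(fruit_density_from_leaf(2.0))
```

Whenever b < 2, the required sample size falls as mean density rises, which matches the direction of the reported sample sizes (19, 16, and 15 at thresholds of 2.0, 2.5, and 3.0).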

Analysis of Survivability for Combatants during Offensive Operations at the Tactical Level (전술제대 공격작전간 전투원 생존성에 관한 연구)

  • Kim, Jaeoh;Cho, HyungJun;Kim, GakGyu
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.5
    • /
    • pp.921-932
    • /
    • 2015
  • This study analyzed the survivability of military personnel during offensive operations using scientific military training data of a reinforced infantry battalion. Scientific battle training was conducted at the Korea Combat Training Center (KCTC) training facility and used scientific military training equipment, including MILES and the main exercise control system. The training audience freely engaged an OPFOR that is expert in tactics and weapon systems. Because the scientific battle training system saves all training zone data for analysis and after-action review and offers training control during the training period, this study can provide a statistical analysis of data from state-of-the-art military training. The methods used were Cox PH modeling (which does not require parametric distribution assumptions) and decision tree modeling for survival data, such as CART, GUIDE, and CTREE, for richer and easier interpretation. The variables that violated the PH assumption were stratified and analyzed. Since the effect of the period of service was not easy to interpret from the Cox PH model result, additional interpretation was attempted through univariate local regression. CART, GUIDE, and CTREE formed different tree models, which allow for various interpretations.
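For reference, the standard form of the Cox proportional hazards model is shown below. The notation is generic and not taken from the paper: $h_0(t)$ is the unspecified baseline hazard, $x$ the covariate vector, and $\beta$ the coefficients.

$$h(t \mid x) = h_0(t)\,\exp(\beta^{\top} x)$$

Because $h_0(t)$ is left unspecified, no parametric distribution is assumed for survival time. The PH assumption is that the hazard ratio $\exp(\beta^{\top}(x_1 - x_2))$ between two covariate profiles does not depend on $t$. Stratifying on a variable that violates this assumption gives each stratum its own baseline hazard.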

Relationship Between Above-and Below-Ground Biomass for Norway Spruce (Picea abies) : Estimating Root System Biomass from Breast Height Diameter (독일가문비나무(Picea abies [L.] Karst)의 지상부(地上部)와 지하부(地下部) 생체량(生體量)에 관(關)한 연구(硏究) : 흉고직경(胸高直徑)에 의한 뿌리생체량(生體量) 추정(推定))

  • Lee, Do-Hyung
    • Journal of Korean Society of Forest Science
    • /
    • v.90 no.3
    • /
    • pp.338-345
    • /
    • 2001
  • This study was conducted to elucidate the relationship between the root structure and the crown structure of Norway spruce (Picea abies [L.] Karst) and then to obtain regression equations for estimating relative root and needle biomass from tree height and diameter at breast height (DBH), without measuring root and needle biomass. The study site was the Barbis stands in the Harz region in the central part of Germany. Five dominant and three co-dominant Norway spruce trees, 30 to 40 years old, were selected. For the above-ground part, tree height, diameter at breast height, clear bole length, total needle and branch weight, and cross-sectional and sapwood area at breast height were measured; for the below-ground part, root length, number, weight, cross-sectional area, and other variables were measured separately for horizontal and vertical roots. Significant correlations were found between most above-ground variables and the below-ground variables. For total root weight (Y) as a function of diameter at breast height (X), the regression equation was Y = 3.56X - 45.94, and the coefficient of determination was 0.96, showing a high correlation. The weight of total branches and needles, tree height, and other above-ground variables showed a highly positive relationship with below-ground biomass. The results of this study can be used to estimate below-ground biomass from above-ground variables such as DBH in 30- to 40-year-old Norway spruce stands.
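As a small worked example, the reported regression can be wrapped in a function. The abstract does not state the units of X and Y, so none are assumed here. The function name is illustrative, and the sketch reads Y as total root weight and X as DBH. Since the line was fitted on eight trees aged 30 to 40 years, applying it outside that range is extrapolation.

```python
def estimate_total_root_weight(dbh):
    """Apply the reported regression Y = 3.56 X - 45.94 (units as in the paper)."""
    return 3.56 * dbh - 45.94

# DBH at which the fitted line crosses zero; below this the linear
# model returns a negative weight and should not be used.
zero_crossing_dbh = 45.94 / 3.56

print(estimate_total_root_weight(20.0))
print(zero_crossing_dbh)
```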

Landslide Susceptibility Mapping by Comparing GIS-based Spatial Models in the Java, Indonesia (GIS 기반 공간예측모델 비교를 통한 인도네시아 자바지역 산사태 취약지도 제작)

  • Kim, Mi-Kyeong;Kim, Sangpil;Nho, Hyunju;Sohn, Hong-Gyoo
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.37 no.5
    • /
    • pp.927-940
    • /
    • 2017
  • Landslides have been a major disaster in Indonesia, and recent climate change and indiscriminate urban development around the mountains have increased landslide risks. Java Island, where more than half of Indonesia's population lives, is experiencing a great deal of damage due to frequent landslides. Even in such a dangerous situation, the number of inhabitants residing in landslide-prone areas increases year by year, so it is necessary to develop a technique for analyzing landslide-hazardous and vulnerable areas. In this regard, this study aims to evaluate the landslide susceptibility of Java by using GIS-based spatial prediction models. We constructed a geospatial database including landslide locations, topography, hydrology, soil type, and land cover over the study area and created spatial prediction models by applying Weight of Evidence (WoE), a decision tree algorithm, and an artificial neural network. The three models showed prediction accuracies of 66.95%, 67.04%, and 69.67%, respectively. The results of the study are expected to be useful for preventing future landslide damage and for landslide disaster management policies in Indonesia.
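Of the three methods named, Weight of Evidence is the simplest to show in a few lines. The sketch below uses the usual definitions: $W^+ = \ln\frac{P(B \mid L)}{P(B \mid \bar{L})}$ and $W^- = \ln\frac{P(\bar{B} \mid L)}{P(\bar{B} \mid \bar{L})}$, for a factor class $B$ (for example one land-cover type) and landslide occurrence $L$. The cell counts are hypothetical and not from the study, and the function assumes every count is strictly between zero and its total so that no logarithm is undefined.

```python
import math

def weights_of_evidence(class_slide_cells, class_cells, slide_cells, total_cells):
    """Return (W+, W-, contrast) for one factor class from raster cell counts."""
    p_b_given_l = class_slide_cells / slide_cells
    p_b_given_not_l = (class_cells - class_slide_cells) / (total_cells - slide_cells)
    w_plus = math.log(p_b_given_l / p_b_given_not_l)
    w_minus = math.log((1.0 - p_b_given_l) / (1.0 - p_b_given_not_l))
    return w_plus, w_minus, w_plus - w_minus

# Hypothetical counts: 10,000 cells, 100 with landslides; the factor
# class covers 2,000 cells and contains 50 of the landslide cells.
w_plus, w_minus, contrast = weights_of_evidence(50, 2000, 100, 10000)
print(round(w_plus, 3), round(w_minus, 3), round(contrast, 3))
```

A positive W+ with a negative W- means landslides are over-represented in the class. Susceptibility maps are typically built by summing such weights across factor layers.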

Severity-Adjusted LOS Model of AMI patients based on the Korean National Hospital Discharge in-depth Injury Survey Data (퇴원손상심층조사 자료를 기반으로 한 급성심근경색환자 재원일수의 중증도 보정 모형 개발)

  • Kim, Won-Joong;Kim, Sung-Soo;Kim, Eun-Ju;Kang, Sung-Hong
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.14 no.10
    • /
    • pp.4910-4918
    • /
    • 2013
  • This study aims to design a Severity-Adjusted LOS (Length of Stay) Model in order to efficiently manage the LOS of AMI (Acute Myocardial Infarction) patients. We designed the model using data-mining methods (multiple regression analysis, decision trees, and neural networks) on 6,074 AMI patients with a diagnosis of I21 from the 2004-2009 Korean National Hospital Discharge In-depth Injury Survey. A decision tree model, which produced superior results, was chosen as the final model. This study found that performance of CABG, status at discharge (alive or dead), comorbidity index, and other variables were major factors affecting the severity adjustment of LOS of AMI patients. The difference between actual LOS and adjusted LOS resulted from hospital location and bed size. Efficient management of the LOS of AMI patients requires various activities after identifying differentiating factors. These factors can be identified by applying each hospital's data to the newly designed Severity-Adjusted LOS Model.

Dynamic Growth Model for Pinus densiflora Stands in Anmyun-Island (안면도(安眠島) 소나무 임분(林分)의 동적(動的) 생장(生長)모델)

  • Seo, Jeong-Ho;Lee, Woo-Kyun;Son, Yowhan;Ham, Bo-Young
    • Journal of Korean Society of Forest Science
    • /
    • v.90 no.6
    • /
    • pp.725-733
    • /
    • 2001
  • In this study, the relationships between growth factors for Pinus densiflora stands in Anmyun-Island were analyzed, and a dynamic growth model was prepared. A total of 96 sample plots were investigated, in which the dbh and height of individual trees were measured. From these plot data, quadratic mean dbh, mean height, dominant tree height, stem number per ha, basal area per ha, and volume per ha were estimated. Several regression equations between growth factors were derived using the NLIN and REG procedures of SAS. A dynamic growth model, in which the equations were interactively linked, was then prepared for predicting stand growth and yield under different management regimes. The predictions of the dynamic growth model were found to be consistent with general growth principles. The dynamic growth model was considered adequate for predicting the growth and yield of Pinus densiflora stands in Anmyun-Island. In practice, the dynamic growth model can be applied to predict the growth and development of stands under various forest treatments and to support decision-making in forest management.
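Some of the stand-level quantities listed in this abstract follow directly from per-tree dbh measurements. The sketch below computes stem number per ha, quadratic mean dbh, and basal area per ha using the standard definitions. The plot size and dbh values are hypothetical, and the function name is illustrative only.

```python
import math

def stand_summary(dbh_cm, plot_area_ha):
    """Stand-level attributes from the dbh (cm) of each tree in one plot."""
    n = len(dbh_cm)
    # Quadratic mean diameter: root of the mean squared dbh.
    qmd_cm = math.sqrt(sum(d * d for d in dbh_cm) / n)
    # Cross-sectional area per tree in m^2 (radius in m = dbh_cm / 200).
    basal_area_m2 = sum(math.pi * (d / 200.0) ** 2 for d in dbh_cm)
    return {
        "stems_per_ha": n / plot_area_ha,
        "qmd_cm": qmd_cm,
        "basal_area_m2_per_ha": basal_area_m2 / plot_area_ha,
    }

# Hypothetical plot: four trees on 0.04 ha.
summary = stand_summary([10.0, 20.0, 20.0, 30.0], 0.04)
print(summary)
```

The quadratic mean is used instead of the arithmetic mean so that basal area per ha equals stems per ha times the cross-sectional area of a tree with the mean diameter.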

Convergence Analysis of Risk factors for Readmission in Cardiovascular Disease: A Machine Learning Approach (의사결정나무분석을 이용한 심혈관질환자의 재입원 위험 요인에 대한 융합적 분석)

  • Kim, Hyun-Su
    • Journal of Convergence for Information Technology
    • /
    • v.9 no.12
    • /
    • pp.115-123
    • /
    • 2019
  • This descriptive study is a secondary analysis of KNHANES IV-VI data on risk factors for readmission among patients with cardiovascular disease. Of the total 65,973 adults, 1,037 with angina or myocardial infarction were analyzed. The analysis was conducted using SPSS 21 for Windows, and a CHAID decision tree was used for the classification analysis. The root node was economic activity (χ2=12.063, p=.001); the child nodes were personal income (χ2=6.575, p=.031), weight change (χ2=12.758, p=.001), residential area (χ2=4.025, p=.045), direct smoking (χ2=3.884, p=.049), and level of education (χ2=9.630, p=.024). The terminal nodes were hypertension (χ2=3.854, p=.050), diabetes mellitus (χ2=6.056, p=.014), and occupation type (χ2=7.799, p=.037). We suggest that programs taking an integrated approach to these various factors need to be developed and operated for the readmission management of cardiovascular patients.
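CHAID chooses splits by chi-square tests of independence between a predictor and the outcome. The sketch below computes the Pearson chi-square statistic for a 2x2 table and the p-value for one degree of freedom. The table is hypothetical and not from the study. The abstract does not report degrees of freedom, so the one-degree-of-freedom check is an assumption, and CHAID implementations may also apply multiplicity adjustments that are not modelled here.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical table: rows = predictor category, columns = readmitted yes/no.
observed = [[30, 70], [50, 50]]
statistic = chi_square_2x2(observed)
print(round(statistic, 3), round(p_value_df1(statistic), 4))

# Under the 1-df assumption, the reported statistic 3.884 gives p near .049.
print(round(p_value_df1(3.884), 3))
```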