• Title/Summary/Keyword: Decision tree method

A Study for Feature Selection in the Intrusion Detection System (침입탐지시스템에서의 특징 선택에 대한 연구)

  • Han, Myung-Mook
    • Convergence Security Journal / v.6 no.3 / pp.87-95 / 2006
  • An intrusion can be defined as any set of actions that attempt to compromise the integrity, confidentiality, or availability of computer resources and violate the security policy of a computer system. An Intrusion Detection System consists of data collection, data reduction, analysis and detection, and reporting and response. Because an Intrusion Detection System collects a large volume of data, feature selection is important for detecting intrusions efficiently. In this paper, a feature selection method using a Genetic Algorithm and a Decision Tree is proposed, and the method is verified by simulation with the KDD data.
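
The abstract does not specify the genetic algorithm's encoding, operators, or fitness function, so the following is only a minimal sketch under common assumptions: each individual is a bit mask over features (1 = keep), with binary tournament selection, one-point crossover, bit-flip mutation, and elitism. In the paper the fitness would come from a decision tree (for example, its detection accuracy on the KDD data using the selected features); here a hypothetical `toy_fitness` that rewards three assumed-relevant features and penalises mask size stands in for it.

```python
import random


def ga_select_features(n_features, fitness, pop_size=20, generations=30,
                       crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search over feature bit masks (1 = keep feature) with a simple GA."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)[:]
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]

        def tournament():
            # Binary tournament: the fitter of two random individuals wins.
            (fa, a), (fb, b) = rng.sample(scored, 2)
            return a if fa >= fb else b

        children = [best[:]]  # elitism: carry the best mask found so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation on each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate[:]
    return best


# Hypothetical stand-in for "accuracy of a decision tree trained on the
# selected features": reward three assumed-relevant features, penalise size.
RELEVANT = {0, 2, 5}


def toy_fitness(mask):
    return sum(mask[i] for i in RELEVANT) - 0.1 * sum(mask)


best_mask = ga_select_features(8, toy_fitness)
```

Swapping `toy_fitness` for a function that trains and scores a decision tree on the masked columns would turn this into a wrapper-style selector; the GA loop itself would not need to change.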

EEG Classification for depression patients using decision tree and possibilistic support vector machines (뇌파의 의사 결정 트리 분석과 가능성 기반 서포트 벡터 머신 분석을 통한 우울증 환자의 분류)

  • Sim, Woo-Hyeon;Lee, Gi-Yeong;Chae, Jeong-Ho;Jeong, Jae-Seung;Lee, Do-Heon
    • Bioinformatics and Biosystems / v.1 no.2 / pp.134-138 / 2006
  • Depression is the most common and widespread mood disorder. About 20% of the population may suffer a major, incapacitating episode of depression during their lifetime. The disorder can be classified into two types: major depressive disorder and bipolar disorder. Because pharmaceutical treatments differ by type, fast and correct classification is critical for depression patients. However, classical methods such as the Minnesota Multiphasic Personality Inventory (MMPI) are difficult to apply to depression patients because the patients have trouble concentrating. We therefore used an electroencephalogram (EEG) analysis method to classify depression. We extracted the nonlinearity of information flows between channels and estimated the approximate entropy (ApEn) of the EEG at each channel. Using these attributes, we applied two data mining classification methods: a decision tree and possibilistic support vector machines (PSVM). In classifying 54 depression patients (30 with major depressive disorder and 24 with bipolar disorder), the decision tree achieved 85.19% accuracy and the PSVM 77.78%.
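
Approximate entropy (ApEn), one of the EEG attributes mentioned above, measures how often patterns of length m that are close (within tolerance r) stay close at length m + 1. The sketch below follows the standard definition from Pincus (1991): ApEn = Φ^m(r) − Φ^(m+1)(r), with Chebyshev distance and self-matches counted. The parameters m = 2 and r = 0.2 × standard deviation are common defaults assumed here; the abstract does not say which values the authors used.

```python
import math
import random
import statistics


def approximate_entropy(x, m=2, r=None):
    """Approximate entropy ApEn(m, r) of a 1-D series (Pincus, 1991).

    r is an absolute tolerance; if omitted, 0.2 * population std of x is used,
    a common convention (assumed here, not taken from the paper).
    """
    n = len(x)
    if r is None:
        r = 0.2 * statistics.pstdev(x)

    def phi(k):
        count = n - k + 1
        templates = [x[i:i + k] for i in range(count)]
        total = 0.0
        for t_i in templates:
            # C_i: share of templates within Chebyshev distance r of t_i.
            # Self-matches are counted, so C_i > 0 and log() is defined.
            matches = sum(
                1 for t_j in templates
                if max(abs(a - b) for a, b in zip(t_i, t_j)) <= r
            )
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)


flat = [1.0] * 50                 # perfectly regular: ApEn is 0
periodic = [0.0, 1.0] * 50        # highly predictable: ApEn near 0
rng = random.Random(1)
noisy = [rng.random() for _ in range(200)]  # irregular: larger ApEn
```

Lower ApEn means a more regular signal; in an EEG setting the value would be computed per channel and fed to the classifier as one attribute.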

An Investigation of Factors Affecting Management Efficiency in Korean General Hospitals Using DEA Model (DEA모형을 이용한 종합병원의 효율성 측정과 영향요인)

  • Ahn, In-Whan;Yang, Dong-Hyun
    • Korea Journal of Hospital Management / v.10 no.1 / pp.71-92 / 2005
  • The purpose of this study is to analyze the management efficiency of general hospitals and to investigate the major factors affecting it. Specifically, the management of each general hospital is evaluated with Data Envelopment Analysis (DEA), a nonparametric technique for measuring efficiency. The influencing factors are then investigated with a Decision-Tree Model and Tobit Regression. Of a total of 276 general hospitals, the study targeted those with 200 to 500 beds. Financial-indicator data were collected from 48 hospitals and analyzed with two statistical models. For Model I, three input and two output variables were used for efficiency evaluation: the inputs were the number of medical doctors, the number of paramedical personnel, and the bed size, and the outputs were the annual numbers of inpatients and outpatients, adjusted by bed size. The DEA results showed that only seven of the 48 hospitals (15%) were efficient. The decision-tree analysis identified six significant influencing factors for Model I: Bed Occupancy Rate, Cost per Adjusted Inpatient, New Visit Ratio of Outpatients, Retired Ratio, Net Profit to Gross Revenues, and Net Profit to Total Assets. In addition, a Tobit regression using the factors derived from the decision-tree analysis as independent variables showed that hospital management efficiency increases as profit- and patient-related indicators increase and as cost-related indicators decrease. This study may contribute to the development of analytic methodology for hospital management efficiency, in that it proposes synthetic measures based on the DEA model rather than simple ratio analysis.
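
To illustrate what "efficient" means in DEA, the sketch below covers only the special case of one input and one output per hospital under constant returns to scale (the CCR model). In that case the DEA linear program reduces to each unit's output/input ratio divided by the best ratio in the sample, and units scoring 1.0 lie on the efficient frontier. The study's Model I has three inputs and two outputs, which requires solving one linear program per hospital and is not reproduced here; the hospital figures below are hypothetical.

```python
def ccr_efficiency_1x1(inputs, outputs):
    """CCR (constant returns to scale) DEA efficiency for the special case of
    one input and one output per unit: each unit's output/input ratio divided
    by the best ratio in the sample. A score of 1.0 marks an efficient unit."""
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical data for four hospitals: doctors (input), adjusted inpatients (output).
doctors = [10, 20, 30, 40]
inpatients = [500, 800, 1500, 1600]
scores = ccr_efficiency_1x1(doctors, inpatients)
efficient = [i for i, s in enumerate(scores) if s == 1.0]
```

Here hospitals 0 and 2 treat 50 inpatients per doctor and form the frontier, while the other two reach 80% of that productivity.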

A Study on the Crash Severity of Expressway Work Zones Using Decision Tree (의사결정나무를 이용한 고속도로 공사구간 사고 심각도에 관한 연구)

  • PARK, Yong Woo;BACK, Sehum;PARK, Shin Hyoung;KWON, Oh Hoon
    • Journal of Korean Society of Transportation / v.34 no.6 / pp.535-547 / 2016
  • This study aims to identify factors that affect injury severity in traffic crashes in work zones on Korean expressways. To this end, a decision tree method was applied to identify factors influencing injury severity and to compare their characteristics between work zones and non-work zones. The comparison shows that the risk of severe injury was low when traffic volume and the heavy-vehicle ratio were high, because these factors lower the overall section speed. Conversely, when traffic volume and the heavy-vehicle ratio were low, section speed increased and a tendency toward higher injury severity was confirmed. These findings are expected to help transportation planners and engineers understand which risk factors contribute most to severe injury in work zones, so that they can prepare and implement effective safety countermeasures.
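
The abstract does not name the tree-growing algorithm, so the sketch below assumes a CART-style criterion: at each node, choose the threshold on a numeric variable that minimises the size-weighted Gini impurity of the two child nodes. The crash records are hypothetical and are only meant to mirror the reported pattern (lower traffic volume, more severe injuries).

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_c^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Find the split "value <= t" that minimises the size-weighted Gini
    impurity of the two child nodes; returns (t, weighted_impurity)."""
    n = len(values)
    best_t, best_score = None, float("inf")
    # Skip the largest value: splitting there would leave the right child empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical crashes: hourly traffic volume and injury severity.
volume = [200, 300, 400, 800, 900, 1000]
severity = ["severe", "severe", "severe", "minor", "minor", "minor"]
```

`best_threshold(volume, severity)` returns `(400, 0.0)`: splitting at 400 vehicles separates the two severity classes perfectly. A full tree repeats this search recursively over every candidate variable in each child node.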

Study on Detection Technique for Cochlodinium polykrikoides Red tide using Logistic Regression Model and Decision Tree Model (로지스틱 회귀모형과 의사결정나무 모형을 이용한 Cochlodinium polykrikoides 적조 탐지 기법 연구)

  • Bak, Su-Ho;Kim, Heung-Min;Kim, Bum-Kyu;Hwang, Do-Hyun;Unuzaya, Enkhjargal;Yoon, Hong-Joo
    • The Journal of the Korea institute of electronic communication sciences / v.13 no.4 / pp.777-786 / 2018
  • This study proposes a new method for detecting Cochlodinium polykrikoides on satellite images using logistic regression and a decision tree. As training data, we used 918 spectral profiles extracted from red tide, clear water, and turbid water. Of the entire data set, 70% was used for model training and the remaining 30% was used to evaluate the model's classification accuracy. In the accuracy evaluation, the logistic regression model achieved about 97% classification accuracy and the decision tree model about 86%.
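
The 70/30 hold-out evaluation described above can be sketched as follows. This assumes a plain random split with a fixed seed; the abstract does not say whether the authors shuffled or stratified by class (red tide, clear water, turbid water), and the profiles here are represented only by hypothetical indices.

```python
import random


def train_test_split(items, train_frac=0.7, seed=42):
    """Shuffle a copy of items with a fixed seed and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Stand-ins for the 918 labelled spectral profiles (indices only).
profiles = list(range(918))
train, test = train_test_split(profiles)
```

With 918 profiles this yields 642 training and 276 test profiles; each model is fitted on `train` and its `accuracy` is reported on `test` only.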

Bayesian Network-based Probabilistic Safety Assessment for Multi-Hazard of Earthquake-Induced Fire and Explosion (베이지안 네트워크를 이용한 지진 유발 화재・폭발 복합재해 확률론적 안전성 평가)

  • Se-Hyeok Lee;Uichan Seok;Junho Song
    • Journal of the Computational Structural Engineering Institute of Korea / v.37 no.3 / pp.205-216 / 2024
  • Recently, seismic Probabilistic Safety Assessment (PSA) methods have been developed for process plants such as gas plants, oil refineries, and chemical plants. The framework originated from the PSA of nuclear power plants, which aims to assess the risk of reactor core damage. The original PSA method was modified to reflect the characteristics of a process plant, whose purpose is continuous operation without shutdown. A fault tree whose top event is shutdown was therefore constructed and transformed into a Bayesian Network (BN), a probabilistic graphical model, for efficient risk-informed decision-making. In this research, the fault-tree-based BN from the previous research is further developed to consider the multi-hazard of earthquake-induced fire and explosion (EQ-induced F&E). For this purpose, an event tree describing the occurrence of fire and explosion following a release is first constructed and transformed into a BN. This BN is then connected to the BN model previously developed for seismic PSA. A virtual plot plan of a gas plant is introduced as the basis for constructing the specific EQ-induced F&E BN to test the proposed framework. The paper demonstrates the method through two examples of risk-informed decision-making; in particular, the second example shows how the proposed method can establish a repair and retrofit strategy when a shutdown occurs in a process plant.
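
A fault tree maps onto a BN by treating basic events as independent root nodes and each AND/OR gate as a node with a deterministic (0/1) conditional probability table. The sketch below shows this on a tiny hypothetical plant and computes both the shutdown probability and a posterior P(basic event | shutdown), the kind of query that makes the BN form useful for decision-making. It uses brute-force enumeration, which is exponential in the number of basic events; the paper's seismic and fire/explosion BN is far larger and would need proper BN inference.

```python
from itertools import product


def evaluate(node, state):
    """Deterministic gate node: a basic-event name or ("AND"|"OR", child, ...)."""
    if isinstance(node, str):
        return state[node]
    gate, *children = node
    values = [evaluate(child, state) for child in children]
    if gate == "OR":
        return any(values)
    if gate == "AND":
        return all(values)
    raise ValueError(f"unknown gate: {gate}")


def enumerate_joint(basic_probs, top):
    """Yield (state, probability, top_occurs) for every basic-event state.
    Basic events are independent root nodes; gates have 0/1 CPTs."""
    names = list(basic_probs)
    for bits in product([False, True], repeat=len(names)):
        state = dict(zip(names, bits))
        prob = 1.0
        for name, failed in state.items():
            prob *= basic_probs[name] if failed else 1.0 - basic_probs[name]
        yield state, prob, evaluate(top, state)


def p_top(basic_probs, top):
    """Marginal probability of the top event."""
    return sum(p for _, p, occurs in enumerate_joint(basic_probs, top) if occurs)


def p_event_given_top(basic_probs, top, event):
    """Posterior P(event failed | top event occurred)."""
    joint = sum(p for s, p, occurs in enumerate_joint(basic_probs, top)
                if occurs and s[event])
    return joint / p_top(basic_probs, top)


# Hypothetical plant: shutdown if the pump fails OR both power supplies fail.
probs = {"pump": 0.1, "power_main": 0.2, "power_backup": 0.5}
shutdown = ("OR", "pump", ("AND", "power_main", "power_backup"))
```

Here P(shutdown) = 1 − 0.9 × (1 − 0.2 × 0.5) = 0.19, and given a shutdown the pump is the culprit with probability 0.1 / 0.19 ≈ 0.53.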

Feature Selection Effect of Classification Tree Using Feature Importance : Case of Credit Card Customer Churn Prediction (특성중요도를 활용한 분류나무의 입력특성 선택효과 : 신용카드 고객이탈 사례)

  • Yoon Hanseong
    • Journal of Korea Society of Digital Industry and Information Management / v.20 no.2 / pp.1-10 / 2024
  • To predict credit card customer churn accurately through data analysis, a model can be constructed with various machine learning algorithms, including decision trees. Feature importance has been used in several application areas to select input features that improve the performance of data analysis models. In this paper, a method that uses feature importance calculated by the MDI (Mean Decrease in Impurity) method, and its effects, are investigated for the credit card customer churn prediction problem with classification trees. Compared with several random feature selections from the case data, the set of input features with the highest feature-importance values shows higher predictive power. This can be an efficient way of ranking and choosing the input features needed to improve prediction performance. The method presented in this paper can serve as an alternative way of selecting input features by feature importance when building and using classification trees, including for credit card customer churn prediction.
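
MDI importance credits each feature with the total impurity decrease of the splits that use it, weighted by the fraction of samples reaching each split node, and then normalises the totals to sum to 1 (as scikit-learn does for `feature_importances_`). The sketch below computes this from a hypothetical two-split tree given as class counts (stay, churn) per node, then keeps the top-k features. The feature names and counts are illustrative only, not taken from the paper's data.

```python
def gini(counts):
    """Gini impurity from class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0


def mdi_importance(splits, features):
    """splits: list of (feature, parent_counts, left_counts, right_counts).
    The first split listed is assumed to be the root node."""
    n_total = sum(splits[0][1])
    raw = dict.fromkeys(features, 0.0)
    for feature, parent, left, right in splits:
        decrease = (sum(parent) * gini(parent)
                    - sum(left) * gini(left)
                    - sum(right) * gini(right))
        raw[feature] += decrease / n_total
    total = sum(raw.values())
    return {f: (v / total if total else 0.0) for f, v in raw.items()}


def top_k(importances, k):
    """Names of the k features with the highest importance."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


# Hypothetical tree: root splits on "tenure", its right child on "balance".
splits = [
    ("tenure", (50, 50), (40, 0), (10, 50)),
    ("balance", (10, 50), (10, 0), (0, 50)),
]
importance = mdi_importance(splits, ["tenure", "balance", "age"])
selected = top_k(importance, 2)
```

"tenure" receives 2/3 of the total impurity decrease and "balance" 1/3, while the unused "age" scores 0, so `selected` keeps the first two features as the reduced input set.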

Predicting Stock Liquidity by Using Ensemble Data Mining Methods

  • Bae, Eun Chan;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information / v.21 no.6 / pp.9-19 / 2016
  • In the finance literature, stock liquidity, which reflects how readily stocks can be converted to cash in the market, has received considerable attention from both academics and practitioners, for several reasons. First, stock liquidity is known to significantly affect asset pricing. Second, macroeconomic announcements influence liquidity in the stock market. Stock liquidity therefore affects the decisions of both investors and managers. Although there is a large finance literature on stock liquidity, there are no studies that treat stock liquidity as a decision-making problem. Most stock liquidity studies have taken limited views, such as how much liquidity influences stock prices or which variables significantly explain it. This paper, however, posits that stock liquidity can be a serious decision-making problem that can be handled with data mining techniques to estimate its future extent with statistical validity. We collected a financial data set for manufacturing companies listed on the KRX (Korea Exchange) from 2010 to 2013; the data start in 2010 to avoid the after-shocks of the 2008 financial crisis. We used the Fn-GuidPro system to gather a total of 5,700 financial records. Stock liquidity was computed with the procedure proposed by Amihud (2002), which is known to best capture the relationship with daily returns. We applied five data mining techniques (classifiers): Bayesian networks, support vector machines (SVM), decision trees, neural networks, and ensemble methods. The Bayesian networks include GBN (General Bayesian Network), NBN (Naive BN), and TAN (Tree Augmented NBN); the decision trees are CART and C4.5. Regression was used as the performance benchmark. Two types of ensembles were built, integrating two classifiers and three classifiers, and the classifiers were combined by voting. Among the single classifiers, CART performed best at 48.2%, compared with 37.18% for regression. Among the ensemble methods, integrating TAN, CART, and SVM performed best at 49.25%. An additional analysis of individual industries showed that relatively stable industries, such as electronic appliances, wholesale and retail, wood, and leather/bags/shoes, achieved performance above 50%.
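
Two pieces of this pipeline are easy to sketch. First, the Amihud (2002) illiquidity ratio: the average over trading days of |daily return| divided by daily trading value (a scaling constant is often applied; it is omitted here). Second, combining classifiers by simple majority voting, assumed here to be unweighted since the abstract does not describe the voting rule. The returns, trading values, and liquidity labels below are hypothetical.

```python
from collections import Counter


def amihud_illiquidity(returns, trading_values):
    """Average of |daily return| / daily trading value over days with trading.
    Higher values mean prices move more per unit traded (less liquid)."""
    ratios = [abs(r) / v for r, v in zip(returns, trading_values) if v > 0]
    return sum(ratios) / len(ratios)


def majority_vote(*classifier_predictions):
    """Combine per-classifier label lists by unweighted majority voting.
    On ties, Counter.most_common keeps the label first encountered,
    i.e. the vote of the earliest classifier among the tied labels."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*classifier_predictions)]


# Hypothetical predictions of liquidity classes from three classifiers.
tan = ["high", "low", "mid"]
cart = ["high", "mid", "mid"]
svm = ["low", "low", "high"]
ensemble = majority_vote(tan, cart, svm)
```

For these three classifiers the ensemble output is `["high", "low", "mid"]`: each position takes the label chosen by at least two of the three.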

The Study on Improving Accuracy of Land Cover Classification using Spectral Library of Hyperspectral Image (초분광영상의 분광라이브러리를 이용한 토지피복분류의 정확도 향상에 관한 연구)

  • Park, Jung-Seo;Seo, Jin-Jae;Go, Je-Woong;Cho, Gi-Sung
    • Journal of Cadastre & Land InformatiX / v.46 no.2 / pp.239-251 / 2016
  • Hyperspectral images are widely used for land cover classification because they have many narrow bands, allowing each pixel to carry much more information than a conventional multispectral image. However, the higher spectral resolution of hyperspectral images increases data volume and decreases noise efficiency. SAM (Spectral Angle Mapping), which compares spectral distributions using the vector inner product, is a valuable and popular way to analyze the continuous spectra of hyperspectral images. However, SAM is less accurate when it is used with a spectral library to classify land cover from hyperspectral images, because of atmospheric effects. We propose a decision-tree-based method to compensate for this shortcoming and show that it improves the accuracy of land cover classification.
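
SAM treats each spectrum as a vector and measures the angle between a pixel spectrum t and a library spectrum r, arccos(t·r / (|t||r|)); the pixel is assigned to the library class with the smallest angle. The sketch below implements that baseline only (not the paper's decision-tree correction), with a hypothetical four-band library and an optional hypothetical `max_angle` rejection threshold.

```python
import math


def spectral_angle(t, r):
    """Angle in radians between two spectra viewed as vectors."""
    dot = sum(a * b for a, b in zip(t, r))
    norm = math.sqrt(sum(a * a for a in t)) * math.sqrt(sum(b * b for b in r))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return math.acos(cos)


def sam_classify(pixel, library, max_angle=None):
    """Assign the library class with the smallest spectral angle; return None
    if max_angle is given and no reference spectrum is within it."""
    best_name, best_angle = None, float("inf")
    for name, ref in library.items():
        angle = spectral_angle(pixel, ref)
        if angle < best_angle:
            best_name, best_angle = name, angle
    if max_angle is not None and best_angle > max_angle:
        return None
    return best_name


# Hypothetical 4-band reflectance library.
library = {
    "water": [0.05, 0.04, 0.03, 0.01],
    "vegetation": [0.04, 0.08, 0.05, 0.45],
    "soil": [0.15, 0.20, 0.25, 0.30],
}
pixel = [2 * v for v in library["vegetation"]]  # brighter, same spectral shape
```

Because the angle ignores vector length, a uniformly brighter pixel still matches "vegetation". Distortions that change the spectral shape band by band, such as atmospheric effects, do change the angle, which is consistent with the accuracy loss the abstract reports.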

A study on removal of unnecessary input variables using multiple external association rule (다중외적연관성규칙을 이용한 불필요한 입력변수 제거에 관한 연구)

  • Cho, Kwang-Hyun;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.877-884 / 2011
  • The decision tree is a representative data mining algorithm used in many domains such as retail target marketing, fraud detection, data reduction, variable screening, and category merging. It is most useful for classification problems and for making predictions about a target group after dividing it into several smaller groups. When a decision tree model is built with a large number of input variables, the resulting tree is complex and difficult to explore and analyze. In addition, input variables often appear to be associated through external variables even when they have no intrinsic association. In this paper, we study a method for removing unnecessary input variables using multiple external association rules, and we apply the method to actual data to examine its efficiency.
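
The problem described above, where two input variables look associated only because both depend on an external variable, can be shown with the lift of an association rule A → B: lift = n·n(A∧B) / (n(A)·n(B)), where 1.0 means no association. The sketch below builds hypothetical records in which A and B are independent within each value of an external variable Z, yet appear associated when Z is ignored. It illustrates the motivation only; it is not the paper's multiple external association rule method.

```python
def lift(records, a, b):
    """Lift of the rule a -> b: observed co-occurrence over the co-occurrence
    expected if a and b were independent (1.0 means no association)."""
    n = len(records)
    n_a = sum(1 for r in records if r[a])
    n_b = sum(1 for r in records if r[b])
    n_ab = sum(1 for r in records if r[a] and r[b])
    return n * n_ab / (n_a * n_b)


def conditional_lifts(records, a, b, z):
    """Lift of a -> b computed separately within each value of variable z."""
    groups = {}
    for r in records:
        groups.setdefault(r[z], []).append(r)
    return {value: lift(group, a, b) for value, group in groups.items()}


def block(z, both, a_only, b_only, neither):
    """Hypothetical records with the given cell counts for a fixed Z value."""
    cells = [(True, True, both), (True, False, a_only),
             (False, True, b_only), (False, False, neither)]
    return [{"A": a, "B": b, "Z": z} for a, b, k in cells for _ in range(k)]


# Within each Z group, A and B are independent; Z drives both of them.
records = block(1, 64, 16, 16, 4) + block(0, 4, 16, 16, 64)
```

The marginal lift is 1.36, suggesting A and B are related, but the lift within each Z group is exactly 1.0, so the apparent association comes entirely from the external variable.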