• Title/Summary/Keyword: Decision Tree analysis

A study on the development of severity-adjusted mortality prediction model for discharged patient with acute stroke using machine learning (머신러닝을 이용한 급성 뇌졸중 퇴원 환자의 중증도 보정 사망 예측 모형 개발에 관한 연구)

  • Baek, Seol-Kyung;Park, Jong-Ho;Kang, Sung-Hong;Park, Hye-Jin
    • Journal of the Korea Academia-Industrial cooperation Society, v.19 no.11, pp.126-136, 2018
  • The purpose of this study was to develop a severity-adjustment model for predicting mortality in acute stroke patients using machine learning. Using the Korean National Hospital Discharge In-depth Injury Survey from 2006 to 2015, patients with disease codes I60-I63 (KCD-7) were extracted for analysis. Three tools were used for severity adjustment of comorbidity: the Charlson Comorbidity Index (CCI), the Elixhauser Comorbidity Index (ECI), and the Clinical Classification Software (CCS). Severity-adjusted mortality prediction models for patients with acute stroke were developed using logistic regression, decision tree, neural network, and support vector machine methods. The most common comorbidity in stroke patients was "hypertension, uncomplicated" (43.8%) in the ECI and "essential hypertension" (43.9%) in the CCS. Among the CCI, ECI, and CCS, the CCS had the highest AUC value and was therefore confirmed as the best severity-adjustment tool. In addition, for models using the CCS variables together with main diagnosis, gender, age, admission route, and whether surgery was performed, the AUC values were 0.808 for logistic regression, 0.785 for the decision tree, 0.809 for the neural network, and 0.830 for the support vector machine. The support vector machine therefore achieved the best predictive power. The results of this study can be used to inform future health policy.
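The models above are compared by AUC (area under the ROC curve). As a minimal sketch of what that number measures, the function below computes AUC as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not code or data from the study; the pairwise loop is quadratic and only meant to show the definition.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    labels: sequence of 0/1 outcomes (1 = death, an assumed encoding)
    scores: predicted risk for each case, same order as labels
    Ties between a positive and a negative count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example (not study data)
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

Three of the four positive/negative pairs are ranked correctly in the toy input, which is why the result is 0.75; a perfect ranker scores 1.0 and a random one about 0.5.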

A Comparative Study of Machine Learning Algorithms Using LID-DS DataSet (LID-DS 데이터 세트를 사용한 기계학습 알고리즘 비교 연구)

  • Park, DaeKyeong;Ryu, KyungJoon;Shin, DongIl;Shin, DongKyoo;Park, JeongChan;Kim, JinGoog
    • KIPS Transactions on Software and Data Engineering, v.10 no.3, pp.91-98, 2021
  • As information and communication technology develops rapidly, the security of IT infrastructure is becoming more important, while cyber attacks of various forms, such as advanced persistent threats (APTs), are becoming more advanced and sophisticated. Early defense against, or prediction of, these increasingly sophisticated cyber attacks is extremely important, and in many cases analysis of data from network-based intrusion detection systems (NIDS) alone cannot prevent rapidly changing attacks. Therefore, data generated by host-based intrusion detection systems (HIDS) is currently analyzed to protect against such attacks. In this paper, we conducted a comparative study of machine learning algorithms using the LID-DS (Leipzig Intrusion Detection-Data Set) host-based intrusion detection data, which includes thread information, metadata, and buffer data missing from previously used data sets. The algorithms used were Decision Tree, Naive Bayes, MLP (Multi-Layer Perceptron), Logistic Regression, LSTM (Long Short-Term Memory), and RNN (Recurrent Neural Network). Accuracy, precision, recall, F1-score, and error rate were measured for evaluation. The LSTM algorithm achieved the highest accuracy.
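The evaluation indicators listed above can all be derived from a binary confusion matrix. The sketch below assumes attacks are the positive class and takes error rate to be 1 - accuracy; the paper may define these slightly differently, and the `Confusion` class, `metrics` function, and the counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # attack traces correctly flagged as attacks
    fp: int  # normal traces wrongly flagged as attacks
    tn: int  # normal traces correctly passed as normal
    fn: int  # attack traces missed


def metrics(c: Confusion) -> dict:
    """Standard binary-classification metrics; undefined ratios fall back to 0.0."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "error_rate": 1.0 - accuracy,  # assumed definition: share of misclassified traces
    }


# Hypothetical counts for one classifier on a test split
print(metrics(Confusion(tp=80, fp=20, tn=90, fn=10)))
```

Reporting precision and recall alongside accuracy matters for intrusion data, where attacks are usually rare and a classifier can reach high accuracy while missing most of them.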

The Study on Hypertension Cure Rate Management Centering around Wellness Local Community : With GwangJu as a Central Figure (웰니스 지역사회 중심의 고혈압 치료율 관리 방안에 관한 연구 : 광주광역시 중심으로)

  • Yang, Yu-Jeong;Park, Jong-Ho
    • Journal of Korea Entertainment Industry Association, v.15 no.8, pp.351-361, 2021
  • This study was conducted to identify the factors of hypertension treatment in Gwangju and, using community health survey data, to establish a hypertension cure rate management plan centered on the wellness local community. From a total of 685,820 Community Health Survey records collected by the KDCA (Korea Disease Control and Prevention Agency) from 2017 to 2019, the 13,714 records from Gwangju were used. Among them, 2,941 subjects aged over 30 who had been diagnosed with hypertension were selected and analyzed using SAS 9.4 and SAS Enterprise Miner 15.1. The results are as follows. Differences in the hypertension cure rate in Gwangju by socioeconomic characteristics were found for gender, age, marital status, educational attainment, economic activity status, and monthly income. Significant differences in the hypertension cure rate by health behavior characteristics were found for current smoking, monthly alcohol consumption, high-risk drinking, eating breakfast, self-rated health, diabetes diagnosis and treatment, unmet medical needs in the past year, and health center use in the past year. Logistic regression analysis and interactive decision tree analysis of the factors affecting hypertension treatment identified age, marital status, diabetes diagnosis and treatment, and unmet medical needs in the past year as the key factors. Accordingly, public health education efforts in Gwangju, backed by an effective implementation plan, are needed to raise awareness of the importance of hypertension treatment among younger people and to prevent complications.
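Decision tree tools such as the interactive decision tree mentioned above grow a tree by repeatedly choosing the split that best separates the outcome classes. The sketch below uses Gini impurity, one common splitting criterion, to pick the best single binary split over categorical survey variables. The criterion actually configured in SAS Enterprise Miner for this study is not stated, and the function names, variable names, and toy records are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions in labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, features):
    """Find the binary split 'feature == value' vs. 'feature != value'
    with the lowest size-weighted Gini impurity.

    Returns (feature, value, weighted_gini), or None if no split is possible.
    """
    n = len(labels)
    best = None
    for f in features:
        for v in sorted({r[f] for r in rows}):  # sorted for deterministic ties
            left = [y for r, y in zip(rows, labels) if r[f] == v]
            right = [y for r, y in zip(rows, labels) if r[f] != v]
            if not left or not right:
                continue
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[2]:
                best = (f, v, score)
    return best


# Hypothetical respondents; label 1 = currently treated for hypertension
rows = [
    {"age_group": "30s", "diabetes": "no"},
    {"age_group": "30s", "diabetes": "no"},
    {"age_group": "60s", "diabetes": "yes"},
    {"age_group": "60s", "diabetes": "no"},
]
labels = [0, 0, 1, 1]
print(best_split(rows, labels, ["age_group", "diabetes"]))
```

In the toy data the age split separates treated from untreated respondents perfectly (weighted Gini 0.0), so it is chosen over the diabetes split; a full tree would recurse on each side until a stopping rule is met.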

A Study on Korean Local Governments' Operation of Participatory Budgeting System : Classification by Support Vector Machine Technique (한국 지방자치단체의 주민참여예산제도 운영에 관한 연구 - Support Vector Machine 기법을 이용한 유형 구분)

  • Junhyun Han;Jaemin Ryou;Jayon Bae;Chunghyeok Im
    • The Journal of the Convergence on Culture Technology, v.10 no.3, pp.461-466, 2024
  • Korean local governments operate the participatory budgeting system autonomously. This study classifies these local governments into clusters. Among the diverse machine learning methods tested (Neural Network, Rule Induction (CN2), KNN, Decision Tree, Random Forest, Gradient Boosting, SVM, Naïve Bayes), the Support Vector Machine technique proved the most effective in analyzing 2022 data on Korean municipalities. The first cluster, C1, is characterized by minimal committee activity but a substantial participatory budget allocation; another cluster, C3, comprises cities that take a passive stance. The majority of cities fall into the remaining cluster, C2, which is noted for its proactive engagement in participatory budgeting. Overall, most Korean local governments operate the participatory budgeting system well, and only a small number of cities are less active in it. We anticipate that analyzing time-series data from the past decade in follow-up studies will further enhance the reliability of classifying local government types with respect to participatory budgeting.

A Method for Business Process Analysis by using Decision Tree (의사결정나무를 활용한 비즈니스 프로세스 분석)

  • Hur, Won-Chang;Bae, Hye-Rim;Kim, Seung;Jeong, Ki-Seong
    • The Journal of Society for e-Business Studies, v.13 no.3, pp.51-66, 2008
  • The Business Process Management System (BPMS) has received increasing attention as companies realize the importance of business processes. However, traditional BPMS has focused mainly on correct modeling and exact automation of process flows, and has paid little attention to the ultimate goals of improving process efficiency and innovating processes. A BPMS usually generates enormous amounts of log data during and after process execution, in which numerous meaningful rules and patterns are hidden. In this study, we employ data mining techniques to extract useful knowledge from complex process log data. A data model and a system framework for process mining are provided to help understand the existing BPMS. Experiments with simulated data demonstrate the effectiveness of the model and the framework.

Korean Traditional Music Genre Classification Using Sample and MIDI Phrases

  • Lee, JongSeol;Lee, MyeongChun;Jang, Dalwon;Yoon, Kyoungro
    • KSII Transactions on Internet and Information Systems (TIIS), v.12 no.4, pp.1869-1886, 2018
  • This paper proposes a MIDI- and audio-based music genre classification method for Korean traditional music. Korea has many traditional instruments, and most traditional songs played on them share similar patterns and rhythms. Although music information processing tasks such as music genre classification and audio melody extraction have been studied, most studies have focused on pop, jazz, rock, and other mainstream genres. Few studies address Korean traditional music because of a lack of datasets. This paper analyzes raw audio and MIDI phrases of Korean traditional music performed on Korean traditional musical instruments. The samples and MIDI phrases classified by our system will be used to construct a database or to implement our Kontakt-based instrument library, so the classification system can serve as the basis of a management system for a Korean traditional music library. Appropriate feature sets for raw audio and MIDI phrases are proposed, and classification results based on machine learning algorithms such as support vector machine, multi-layer perceptron, decision tree, and random forest are reported.

A Study on Methods to Prevent the Spread of COVID-19 Based on Machine Learning

  • KWAK, Youngsang;KANG, Min Soo
    • Korean Journal of Artificial Intelligence, v.8 no.1, pp.7-9, 2020
  • In this paper, a study was conducted to find a machine-learning-based self-diagnosis method for preventing the spread of COVID-19. COVID-19 is an infectious disease caused by a newly discovered coronavirus. According to the WHO (World Health Organization) situation report published on May 18th, 2020, there had already been 4,600,000 cases and 310,000 deaths globally, and the numbers were still increasing. The most serious problem with the COVID-19 virus is that it spreads primarily through droplets of saliva or nasal discharge when an infected person coughs or sneezes, which happens in everyday life. In addition, at the time of writing, there were no specific vaccines or treatments for COVID-19. Because the virus spreads so readily and no vaccine is available, it is essential to self-diagnose, or to complete a self-diagnosis questionnaire, whenever possible. However, self-diagnosis questionnaires contain too many questions, and their ambiguous criteria make them time-consuming. Therefore, this study used SVM (Support Vector Machine), decision tree, and correlation analysis to identify two vital factors that predict COVID-19 infection with an accuracy of 80%. By applying the results proposed in this paper, people can self-diagnose quickly, helping to prevent COVID-19 infection and its further spread.
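The abstract does not say exactly how the correlation analysis selected the two factors. One plausible sketch, assuming binary yes/no questionnaire answers and a binary infection label, is to rank each item by the absolute Pearson correlation with the label (for two binary variables this equals the phi coefficient) and keep the top k. The function names, item names, and values below are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: correlation undefined, treat as no signal
    return cov / (sx * sy)


def top_factors(columns, target, k=2):
    """Names of the k items most strongly correlated (in absolute value) with target."""
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], target)), reverse=True)
    return ranked[:k]


# Hypothetical questionnaire answers (1 = yes) and infection labels (1 = infected)
infected = [1, 1, 0, 0, 1, 0]
columns = {
    "fever": [1, 1, 0, 0, 1, 0],
    "cough": [1, 0, 0, 0, 1, 0],
    "headache": [0, 1, 1, 0, 0, 1],
}
print(top_factors(columns, infected))
```

Correlation only screens items one at a time; the selected items would still need to be validated with a classifier such as the SVM or decision tree named in the abstract.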

A Study on the Database Marketing using Data Mining in the Traditional Medicine (데이터마이닝을 활용한 한방분야에서의 데이터베이스 마케팅에 대한 연구)

  • Lee Sang-Young;Lee Yun-Seok
    • Journal of the Korea Society of Computer and Information, v.10 no.5 s.37, pp.271-280, 2005
  • This study aims to identify the factors affecting medical examinations in traditional medicine using the decision tree technique, to characterize patients using clustering analysis, and to draw results from an association analysis of disease types in the re-hospitalized patient group. The results were then analyzed for their effect on hospital profits. Thus, by applying data mining techniques to database marketing in traditional medicine, the characteristics of patients can be identified and the factors affecting hospital profits can be derived objectively. Practical application of the database marketing approach presented in this study will bring about fundamental improvements in the efficiency and vitality of hospital management.
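The association analysis mentioned above typically scores rules such as "patients with disease type A also show disease type B" by support, confidence, and lift. The sketch below computes these for single-item rules over per-patient sets of disease types. The thresholds, function names, and disease labels are hypothetical, and the study's actual algorithm and parameters are not given in the abstract.

```python
from itertools import combinations


def rule_stats(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # patients with A
    n_c = sum(1 for t in transactions if c <= t)         # patients with B
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # patients with both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """All one-item -> one-item rules meeting both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in combinations(items, 2):
        for ant, con in ((x, y), (y, x)):
            s, conf, lift = rule_stats(transactions, {ant}, {con})
            if s >= min_support and conf >= min_confidence:
                rules.append((ant, con, s, conf, lift))
    return rules


# Hypothetical disease types recorded per returning patient
transactions = [
    frozenset({"back_pain", "knee_pain"}),
    frozenset({"back_pain", "knee_pain", "insomnia"}),
    frozenset({"back_pain"}),
    frozenset({"insomnia"}),
]
for rule in pair_rules(transactions):
    print(rule)
```

A lift above 1 means the two disease types co-occur more often than they would if independent, which is what makes a rule useful for targeting returning patients.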

Automated condition assessment of concrete bridges with digital imaging

  • Adhikari, Ram S.;Bagchi, Ashutosh;Moselhi, Osama
    • Smart Structures and Systems, v.13 no.6, pp.901-925, 2014
  • The reliability of a bridge management system depends on the quality of visual inspection and on reliable estimation of bridge condition ratings. However, current visual inspection practices have several known limitations: they are time-consuming, provide incomplete information, and rely on inspectors' experience. To overcome these limitations, this paper presents an approach that automates the prediction of bridge condition ratings based on digital image analysis. The proposed methodology encompasses image acquisition, development of a 3D visualization model, image processing, and a condition rating model. Scaling defects in concrete bridge components are considered as the candidate defect, and the guidelines in the Ontario Structure Inspection Manual (OSIM) were adopted for developing and testing the proposed method. The automated algorithms for scaling depth prediction and mapping of condition ratings are based on trained back-propagation neural networks. The developed models predicted condition ratings better than existing methods such as Naïve Bayes classifiers and bagged decision trees.

A Study on the Analysis of Fire Patterns using the Decision Tree Analysis Method (의사결정분석방법을 활용한 화재유형분석에 관한 연구)

  • Lee, Hae-Pyeong;Lee, Seung-Chul;Hwang, Me-Jung;Park, Young-Ju;Moon, Kyong-Ae;Kim, Hyo-Beom
    • Proceedings of the Korea Institute of Fire Science and Engineering Conference, 2010.10a, pp.349-353, 2010
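  • In this study, decision tree analysis, one of the statistical analysis methods, was used to analyze fire occurrence patterns in the fire data for the Gangwon region from 2007 to 2009 in the National Fire Data System (NFDS) of the National Emergency Management Agency. These results are expected to serve as baseline data for establishing systematic and efficient fire-service policies. For variable selection, only NFDS variables judged likely to affect the fire-pattern analysis were considered; the explanatory variables were grouped into three categories (ignition environment, fire cause, and fire suppression factors), and fire damage and number of fires were selected as target variables. In addition to the NFDS data, external data for 2007 and 2008 from Statistics Korea were included to improve the reliability and accuracy of the analysis. To analyze fire types at the top-level category, the influence of the variables was examined for three target variables: casualties, property damage, and number of fires.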
