• Title/Summary/Keyword: Decision Tree analysis

Analysis of the Factors and Patterns Associated with Death in Aircraft Accidents and Incidents Using Data Mining Techniques (데이터 마이닝 기법을 활용한 항공기 사고 및 준사고로 인한 사망 발생 요인 및 패턴 분석)

  • Kim, Jeong-Hun;Kim, Tae-Un;Yoo, Dong-Hee
    • Journal of Digital Convergence / v.17 no.9 / pp.79-88 / 2019
  • This study analyzes the factors and patterns associated with death in aircraft accidents and incidents using data mining techniques. To this end, we used two datasets of aircraft accidents and incidents, one from the National Transportation Safety Board (NTSB) and the other from the Federal Aviation Administration (FAA). We developed prediction models using a decision tree classifier to predict death from aircraft accidents or incidents, and from these models we derived the main causal factors and patterns that can lead to death. In the NTSB data, deaths occurred frequently when the aircraft was destroyed or when people were performing dangerous missions or maneuvers. In the FAA data, deaths were mainly caused by less skilled or less qualified pilots when their aircraft were partially destroyed. Several death-related patterns were also found for parachute jumping and for the ascending and descending phases of flight. Based on the derived patterns, we propose strategies to prevent deaths from aircraft accidents and incidents.
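The abstract does not state which split criterion the authors' decision tree used. The minimal sketch below assumes the CART-style Gini criterion and boolean features, and shows how a tree would choose its first split for a death/no-death target. The feature names `aircraft_destroyed` and `dangerous_maneuver` and the six records are hypothetical toy values, not NTSB or FAA data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(records, feature, target="death"):
    """Weighted Gini impurity after splitting records on a boolean feature."""
    left = [r[target] for r in records if r[feature]]
    right = [r[target] for r in records if not r[feature]]
    n = len(records)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_feature(records, features, target="death"):
    """Pick the feature whose split gives the lowest weighted impurity."""
    return min(features, key=lambda f: split_gini(records, f, target))

# Hypothetical toy records (not real accident data).
records = [
    {"aircraft_destroyed": True,  "dangerous_maneuver": False, "death": True},
    {"aircraft_destroyed": True,  "dangerous_maneuver": True,  "death": True},
    {"aircraft_destroyed": False, "dangerous_maneuver": True,  "death": False},
    {"aircraft_destroyed": False, "dangerous_maneuver": False, "death": False},
    {"aircraft_destroyed": True,  "dangerous_maneuver": False, "death": True},
    {"aircraft_destroyed": False, "dangerous_maneuver": True,  "death": True},
]
print(best_feature(records, ["aircraft_destroyed", "dangerous_maneuver"]))
# aircraft_destroyed
```

Here `aircraft_destroyed` gives a weighted impurity of 2/9 and `dangerous_maneuver` gives 4/9, so `aircraft_destroyed` would become the root node. A full tree repeats this search recursively on each branch.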

A recommendation system for assisting devices in long-term care insurance (의사결정나무기법을 활용한 장기요양 복지용구 권고모형 개발)

  • Han, Eun-Jeong;Park, Sanghee;Lee, JungSuk;Kim, Dong-Geon
    • The Korean Journal of Applied Statistics / v.31 no.6 / pp.693-706 / 2018
  • Supporting elderly people with disabilities in ageing in place is very important. Assisting devices can help them live independently in their community; however, the devices must be used appropriately to meet care needs. This study develops an assisting device recommendation system for long-term care insurance beneficiaries, including algorithms that decide the most appropriate type of assisting device for each beneficiary. We used long-term care (LTC) insurance grade assessment data covering 8,084 beneficiaries from July 2015 to June 2016. In addition, we collected standard care plans for assisting devices, prepared by power-assessors based on the beneficiaries' performance and ability, which could then be matched with the grade assessment data. We developed the model using a decision tree, a data mining technique. As a result, we developed 15 algorithms for recommending assisting devices. The findings may be useful for evidence-based care planning for assisting devices and can contribute to enhancing independence and safety in LTC.

Factors affecting success and failure of Internet company business model using inductive learning based on ID3 algorithm (ID3 알고리즘 기반의 귀납적 추론을 활용한 인터넷 기업 비즈니스 모델의 성공과 실패에 영향을 미치는 요인에 관한 연구)

  • Jin, Dong-su
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.2 / pp.111-116 / 2019
  • New technologies such as the IoT, Big Data, and Artificial Intelligence, building on the Web, mobile, and smart devices, enable new business models that did not exist before, and various types of Internet companies based on these business models have emerged. In this research, we examine the factors that influence the success and failure of Internet companies. To do this, we review recent studies on business models and examine the variables affecting the success of Internet companies in terms of network effects, user interface, cooperation with actors, and value creation for users. Using the five derived variables, we select 14 Internet companies that succeeded or failed in seven commercial business model categories. We derive a decision tree by applying inductive learning based on the ID3 algorithm to the analysis results, and from this tree we derive rules that affect success and failure. Based on these rules, we present strategic implications for actors seeking to succeed as Internet companies.
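ID3 builds a tree by repeatedly splitting on the attribute with the highest information gain (entropy reduction). Each root-to-leaf path can then be read as a rule. The sketch below is a minimal standard-library implementation. The attribute names `network_effect` and `user_interface`, their values, and the four companies are hypothetical illustrations, not the study's 14 companies or its coding of the five variables.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target="outcome"):
    """Entropy reduction from partitioning rows on a categorical attribute."""
    base = entropy([r[target] for r in rows])
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder

def id3(rows, attrs, target="outcome"):
    """Build an ID3 tree as nested dicts: {attr: {value: subtree_or_label}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # pure node: return its label
        return labels[0]
    if not attrs:                      # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}

# Hypothetical companies described by two illustrative attributes.
rows = [
    {"network_effect": "high", "user_interface": "good", "outcome": "success"},
    {"network_effect": "high", "user_interface": "poor", "outcome": "success"},
    {"network_effect": "low",  "user_interface": "good", "outcome": "failure"},
    {"network_effect": "low",  "user_interface": "poor", "outcome": "failure"},
]
print(id3(rows, ["network_effect", "user_interface"]))
# {'network_effect': {'high': 'success', 'low': 'failure'}}
```

The resulting tree reads as the rules "if network_effect is high then success" and "if network_effect is low then failure". This mirrors, in miniature, how the abstract derives success and failure rules from the tree.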

Comparative Analysis of the Binary Classification Model for Improving PM10 Prediction Performance (PM10 예측 성능 향상을 위한 이진 분류 모델 비교 분석)

  • Jung, Yong-Jin;Lee, Jong-Sung;Oh, Chang-Heon
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.1 / pp.56-62 / 2021
  • High forecast accuracy is required as social concern about particulate matter grows. Accordingly, many attempts are being made to increase the accuracy of particulate matter prediction using machine learning. However, because particulate matter concentrations have an imbalanced distribution and diverse characteristics, prediction models do not learn well. To solve these problems, this paper proposes a binary classification model that predicts particulate matter concentration by dividing it into two classes at a threshold of 80㎍/㎥. Four classification algorithms were used for the binary classification of PM10: logistic regression, decision tree, SVM, and MLP. In a performance evaluation using the confusion matrix, the MLP model showed the highest binary classification performance among the four models, with 89.98% accuracy.
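The evaluation described above can be sketched as follows. Concentrations are binarized at the 80 µg/m³ boundary, and predicted classes are scored with a confusion matrix. This is a minimal illustration under stated assumptions: that a value exactly at 80 counts as "high", and that the measurements and predicted labels are hypothetical. Any of the four classifiers could supply `y_pred`.

```python
THRESHOLD = 80.0  # µg/m³, the class boundary stated in the abstract

def to_class(pm10):
    """Map a PM10 concentration to a binary class (1 = high, 0 = low).

    Assumption: a value exactly at the threshold counts as high.
    """
    return 1 if pm10 >= THRESHOLD else 0

def confusion_matrix(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Share of all cases classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    """Share of truly high-concentration cases that were detected."""
    return tp / (tp + fn) if tp + fn else 0.0

observed = [35.0, 92.5, 80.0, 61.2, 120.3]   # hypothetical measurements
y_true = [to_class(v) for v in observed]     # [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 1, 1]                     # hypothetical classifier output
tp, tn, fp, fn = confusion_matrix(y_true, y_pred)
print(accuracy(tp, tn, fp, fn))  # 0.6
```

Because high-concentration cases are the minority class, accuracy alone can hide missed high cases. For that reason the sketch also reports recall alongside it.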

Machine Learning Model for Predicting the Residual Useful Lifetime of the CNC Milling Insert (공작기계의 절삭용 인서트의 잔여 유효 수명 예측 모형)

  • Won-Gun Choi;Heungseob Kim;Bong Jin Ko
    • Journal of Advanced Navigation Technology / v.27 no.1 / pp.111-118 / 2023
  • Implementing a smart factory requires collecting data by connecting various sensors and devices in the manufacturing environment, and diagnosing or predicting failures in production facilities through data analysis. In this paper, to predict the residual useful lifetime of a milling insert used for machining products on a CNC machine, we investigate a weighted k-NN algorithm, Decision Tree, SVR, XGBoost, Random Forest, 1D-CNN, and a frequency spectrum based on the vibration signal. The results show that the frequency spectrum does not provide a reliable criterion for accurately predicting the residual useful lifetime of an insert, and that the weighted k-nearest neighbor algorithm performed best, with an MAE of 0.0013, MSE of 0.004, and RMSE of 0.0192. This corresponds to an error of 0.001 seconds in the residual useful lifetime of the insert predicted by the weighted k-nearest neighbor algorithm, a level considered applicable to actual industrial sites.
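The abstract names a weighted k-NN regressor but does not say how neighbors are weighted. The minimal sketch below assumes inverse-distance weighting and also shows how MAE, MSE, and RMSE are computed. The one-dimensional feature and the remaining-life targets are hypothetical, not the authors' vibration features or data.

```python
import math

def weighted_knn_predict(train_X, train_y, x, k=3, eps=1e-12):
    """Inverse-distance-weighted k-NN regression.

    The k closest training points vote with weight 1/distance, so nearer
    samples influence the prediction more; eps avoids division by zero.
    """
    neighbors = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE); RMSE is the square root of MSE."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return mae, mse, math.sqrt(mse)

# Hypothetical 1-D feature vs. normalized remaining useful life.
train_X = [(0.0,), (1.0,), (2.0,), (3.0,)]
train_y = [1.0, 0.75, 0.5, 0.25]
print(weighted_knn_predict(train_X, train_y, (1.5,), k=2))  # ≈ 0.625
print(regression_errors([1.0, 2.0], [1.5, 2.0]))
```

Inverse-distance weighting is one common choice. Other kernels, such as Gaussian weights, would fit the same interface.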

Seoul Local Brand Alley Commercial Area Recommendation System Design Using Machine Learning (머신러닝 기반 서울시 로컬브랜드 골목상권 추천시스템 설계)

  • Jiyeon, Kim;Hyoseon, Jang;Minseo, Park
    • The Journal of the Convergence on Culture Technology / v.9 no.1 / pp.101-109 / 2023
  • According to data released by the COVID-19 Self-Employed Emergency Response Committee, sales decreased for 95.6% of small businesses due to COVID-19 over the past two years, and social distancing for quarantine further increased the damage. However, as all social distancing guidelines have been lifted and commercial districts have been revitalized, the Seoul Metropolitan Government is pushing a project to foster local brand commercial districts for prospective founders and for small business owners who closed their businesses during the prolonged COVID-19 pandemic. Therefore, this study proposes a model that recommends, from among the five alley commercial districts selected for Seoul's project to foster local brand commercial districts, those suitable for founders. The Seoul Metropolitan Government's local brand alley commercial district recommendation system recommends the major population age groups and major industries of each district by combining a population perspective model using XGBoost with a commercial district characteristic model using a Decision Tree.

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.187-204 / 2016
  • Document classification based on emotional polarity has become an emerging task owing to the explosion of data on the Web. In the big data age, there are too many information sources to refer to when making decisions. For example, when considering travel to a city, a person may search for reviews using a search engine such as Google or social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide whether or not to make the trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology has been widely adopted for text mining on the Web. Sentiment analysis has been used to classify documents through machine learning techniques such as decision trees, neural networks, and support vector machines (SVMs), and it is used to determine the attitude, position, and sensibility of people who write articles on various topics published on the Web. Regardless of the polarity of customer reviews, emotional reviews are very helpful materials for analyzing customers' opinions. Sentiment analysis helps to understand instantly what customers really want through automated text mining techniques. It applies text mining to Web text to extract subjective information for analysis, and it is used to determine the attitudes or positions of people who wrote articles presenting their opinions on a particular topic. In this study, we developed a model that selects hot topics from user posts on a Chinese online stock forum using the k-means algorithm and a self-organizing map (SOM). In addition, we developed a detection model that predicts hot topics using machine learning techniques such as logit, decision trees, and SVM.
We employed sentiment analysis to develop our model for the selection and detection of hot topics from the Chinese online stock forum. Sentiment analysis calculates a sentiment value for a document by comparing and classifying its terms against a polarity sentiment dictionary (positive or negative). The online stock forum was an attractive site because of its information about stock investment. Users post numerous texts about stock movements, analyzing the market on the basis of government policy announcements, market reports, reports from economic research institutes, and even rumors. We divided the forum's topics into 21 categories for sentiment analysis, and 144 topics were selected from these 21 categories. The posts were crawled to build a database of positive and negative texts. After preprocessing the text, we ultimately obtained 21,141 posts on 88 topics from March 2013 to February 2015. An interest index was defined to select the hot topics, and the k-means algorithm and SOM produced equivalent results on this data. We developed decision tree models to detect hot topics with three algorithms: CHAID, CART, and C4.5. CHAID performed worse than the others. We also employed SVM to detect hot topics from the negative data; the SVM models used the radial basis function (RBF) kernel, with parameters tuned by grid search. Detecting hot topics through sentiment analysis gives investors the latest trends and hot topics in the stock forum, so they no longer need to search vast amounts of information on the Web. Our proposed model also helps to rapidly determine customers' signals or attitudes towards government policy and firms' products and services.
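The abstract describes computing a sentiment value from a polarity dictionary but does not give the formula. The minimal sketch below assumes that posts are already tokenized (Chinese text would first need word segmentation) and scores them as (positive hits − negative hits) / (positive hits + negative hits). The English dictionary entries are hypothetical placeholders, not the authors' dictionary.

```python
# Hypothetical polarity dictionary entries (not the authors' dictionary).
POSITIVE = {"rise", "gain", "bullish", "profit"}
NEGATIVE = {"fall", "loss", "bearish", "risk"}

def sentiment_score(tokens, positive=POSITIVE, negative=NEGATIVE):
    """Return (pos - neg) / (pos + neg) in [-1, 1], or 0.0 if no polar terms."""
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)

def polarity(tokens):
    """Label a post positive, negative, or neutral from its score."""
    score = sentiment_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

post = "policy news may lift profit and gain but risk remains".split()
print(sentiment_score(post), polarity(post))
```

Normalizing by the number of polar terms keeps long and short posts on the same [-1, 1] scale. That scale makes the scores easier to aggregate per topic.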

Determine Optimal Timing for Out-Licensing of New Drugs in the Aspect of Biotech (신약의 기술이전 최적시기 결정 문제 - 바이오텍의 측면에서)

  • Na, Byungsoo;Kim, Jaeyoung
    • Knowledge Management Research / v.21 no.3 / pp.105-121 / 2020
  • In new drug development, the most important decision for a Korean Biotech that has not established a global sales network concerns the out-licensing of new drugs. The probability of success differs for each clinical phase, and the licensing amount and royalty vary depending on the clinical phase at which the licensing contract is made. Because of the nature of such licensing contracts and a Biotech's weak financial status, determining when to license out to a Big Pharma is a very important decision for a Biotech. This study defined a model called 'optimal timing for out-licensing of new drugs' and derived results using decision tree analysis. As a case study, we applied the model to a Korean Biotech that is conducting FDA global clinical trials for a first-in-class new drug. Assuming that the market size and expected market penetration rate of the target disease are known, out-licensing after phase 1 or phase 2 of clinical trials was shown to be the best alternative for maximizing the Biotech's profits. This study can provide a conceptual framework for applying management science methodologies in pharmaceutical fields, laying a foundation for knowledge and research on the out-licensing of new drugs.
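Decision tree analysis of this kind is typically solved by backward induction (rollback). At each decision point the firm compares the value of licensing now with the expected value of paying for the next phase and continuing if it succeeds. The sketch below is a minimal illustration of that rollback. The class name `DecisionPoint`, all deal values, costs, and success probabilities are hypothetical, and the sketch assumes that a failed phase leaves nothing to license. It is not the paper's model or data.

```python
from dataclasses import dataclass

@dataclass
class DecisionPoint:
    name: str
    license_value: float  # expected deal value if licensed at this point
    next_cost: float      # in-house cost of running the next phase
    next_success: float   # probability that the next phase succeeds

def rollback(points):
    """Backward induction over a chain of license-now-or-continue decisions.

    At the last point the drug must be licensed. At every earlier point,
    continuing is worth -next_cost + next_success * (value of the next point),
    assuming a failed phase leaves nothing to license.
    Returns (expected value at the first point, point where licensing is optimal).
    """
    value = points[-1].license_value
    stop = points[-1].name
    for point in reversed(points[:-1]):
        continue_value = -point.next_cost + point.next_success * value
        if point.license_value >= continue_value:
            value, stop = point.license_value, point.name
        else:
            value = continue_value
    return value, stop

# Hypothetical figures (arbitrary currency units), not the case-study data.
points = [
    DecisionPoint("after preclinical", 20.0, 10.0, 0.6),
    DecisionPoint("after phase 1", 60.0, 30.0, 0.4),
    DecisionPoint("after phase 2", 200.0, 100.0, 0.6),
    DecisionPoint("after phase 3", 400.0, 0.0, 1.0),
]
print(rollback(points))  # (26.0, 'after phase 1')
```

With these made-up numbers, continuing past preclinical (expected value 26) beats licensing immediately (20). Licensing after phase 1 (60) beats running phase 2 in-house (expected value 50), so the rollback recommends out-licensing after phase 1.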

Analysis of Feature Variables for Breast Cancer Diagnosis

  • Jung, Yong Gyu;Kim, Jang Il;Sihn, Sung Chul;Heo, Jun
    • International journal of advanced smart convergence / v.2 no.2 / pp.36-39 / 2013
  • Diagnosis is becoming more important as health information grows and the number of cancer patients gradually increases. Among the various types of cancer, we focus on breast cancer diagnosis. The accuracy of breast cancer diagnosis increases when the diagnosis is based on evidence and statistics. To this end, we use the Weka data mining tool and decision tree rules to analyze the variables significantly associated with the diagnosis. In addition, data pre-processing and cross-validation are used to increase the reliability of the results. As evidence-based medicine spreads among doctors, the number and causes of diseases become important. In evidence-based medicine, data obtained from past patients are used to calculate probabilities that help diagnose future patients and predict their disease and treatment plans. This can play an important role in improving the survival rate.
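The study ran its analysis in Weka. The sketch below is not Weka's API. It is a standard-library Python illustration of the same idea: k-fold cross-validation wrapped around a decision tree learner. To keep it short, a depth-1 tree (decision stump) stands in for a full tree, and folds are interleaved rather than randomly shuffled. The two numeric features and their labels are hypothetical, not clinical data.

```python
def kfold_indices(n, k):
    """Interleaved k-fold split: fold i holds indices i, i+k, i+2k, ..."""
    return [list(range(i, n, k)) for i in range(k)]

def fit_stump(X, y):
    """Fit a depth-1 decision tree (decision stump).

    Tries every feature f and every observed value t as a threshold, predicting
    1 when row[f] > t, and keeps the rule with the fewest training errors.
    """
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            errors = sum((1 if row[f] > t else 0) != label for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, f, t = best
    return lambda row: 1 if row[f] > t else 0

def cross_val_accuracy(X, y, k=5):
    """Mean held-out accuracy over k folds."""
    scores = []
    for test_idx in kfold_indices(len(X), k):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_stump([X[i] for i in train], [y[i] for i in train])
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

# Hypothetical records with two numeric features; 1 = malignant, 0 = benign.
X = [(1.0, 5.0), (2.0, 1.0), (2.5, 4.0), (3.5, 2.0), (4.0, 3.0),
     (5.0, 0.5), (1.5, 2.5), (4.5, 4.5), (0.5, 3.5), (3.8, 1.5)]
y = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]
print(cross_val_accuracy(X, y, k=5))  # 0.9
```

Every record is scored exactly once by a model that never saw it during training. That is why cross-validated accuracy is a more reliable estimate than training accuracy. In one fold the learned threshold misclassifies a borderline record, which pulls the mean below 1.0.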

A Study on Improving the predict accuracy rate of Hybrid Model Technique Using Error Pattern Modeling : Using Logistic Regression and Discriminant Analysis

  • Cho, Yong-Jun;Hur, Joon
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.269-278 / 2006
  • This paper presents a new hybrid data mining technique that uses error pattern modeling to improve classification accuracy. The proposed method improves classification accuracy by combining two different supervised learning methods. The main algorithm models the error patterns between the two supervised learning methods (e.g., neural networks, decision trees, logistic regression). The proposed method was applied to simulations of 10,000 data sets generated from normal and exponential random distributions. The simulation results show that the proposed method outperforms existing methods such as logistic regression and discriminant analysis.
