• Title/Summary/Keyword: Decision Tree analysis

An Intelligent Game Theoretic Model With Machine Learning For Online Cybersecurity Risk Management

  • Alharbi, Talal
    • International Journal of Computer Science & Network Security / v.22 no.6 / pp.390-399 / 2022
  • Cyber security and resilience are terms that describe the safeguarding of ICTs (information and communication technologies) from cyber-attacks and the mitigation of the impacts of cyber events. The purpose of risk models is the detection, analysis, and handling of risks by considering all relevant perceptions of risk. The current research effort has resulted in the development of a new paradigm for safeguarding services offered online, which can be utilized by both service providers and users (customers). Rather than relying on detailed studies, this approach emphasizes task selection and execution that lead to successful risk treatment outcomes. Modelling intelligent CSGs (Cyber Security Games) using MLTs (machine learning techniques) was the focus of this research. By limiting mission risk, CSGs maximize the ability of systems to operate unhindered in cyber environments. The suggested framework's main components are the Threat and Risk models. These models are tailored to meet the special characteristics of online services as well as the cyberspace environment. A risk management procedure is included in the framework. Risk scores are computed by combining the probabilities of successful attacks with the findings of impact models that predict the consequences of cyber catastrophes. To assess successful attacks, models emulating defense against threats can be used in topologies. CSGs consider the widespread interconnectivity of cyber systems, which forces defenders to protect all multi-step attack paths, whereas attackers need only one of the paths to succeed. CSGs are game-theoretic methods that identify defense measures, reduce risks for systems, and probe for maximum cyber risks using game formulations (MiniMax). To detect the impacts, the attacker player creates an attack tree for each state of the game using a modified Extreme Gradient Boosting Decision Tree (that sees numerous compromises ahead). Based on the findings, the proposed model has a high level of security for the web sources used in the experiment.
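
The risk-scoring and MiniMax ideas in the abstract above can be illustrated with a minimal sketch. It assumes that "combining" probability and impact means multiplying them, and that the defender picks the defense whose worst-case attack path has the lowest risk. The function names, defense names, and numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple


def risk_score(p_success: float, impact: float) -> float:
    """Risk of one attack path: probability of success times impact (assumed form)."""
    return p_success * impact


def minimax_defense(
    success_prob: Dict[str, Dict[str, float]],
    impact: Dict[str, float],
) -> Tuple[str, float]:
    """Return the defense that minimizes the attacker's best (maximum) risk.

    success_prob[defense][path] is the attack success probability on `path`
    when `defense` is deployed; impact[path] is the loss if that path succeeds.
    """
    best_defense, best_risk = "", float("inf")
    for defense, paths in success_prob.items():
        # The attacker needs only one path, so assume the worst one is chosen.
        worst = max(risk_score(p, impact[path]) for path, p in paths.items())
        if worst < best_risk:
            best_defense, best_risk = defense, worst
    return best_defense, best_risk


# Hypothetical two-defense, two-path game.
example_probs = {
    "patch_web": {"web": 0.1, "phish": 0.6},
    "train_staff": {"web": 0.5, "phish": 0.2},
}
example_impact = {"web": 10.0, "phish": 5.0}
chosen, residual_risk = minimax_defense(example_probs, example_impact)
```

The inner `max` reflects the asymmetry noted in the abstract: the defender must cover every path, while the attacker needs only one.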

Machine Learning Algorithm for Estimating Ink Usage (머신러닝을 통한 잉크 필요량 예측 알고리즘)

  • Se Wook Kwon;Young Joo Hyun;Hyun Chul Tae
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.1 / pp.23-31 / 2023
  • Research and interest in sustainable printing are increasing in the packaging printing industry. Currently, the amount of ink required for each job is predicted from the experience and intuition of field workers. If more ink is produced than necessary, the remainder cannot be reused and is discarded, adversely affecting the company's productivity and the environment. Machine learning models can now be used to address this problem. This study compares machine learning models for predicting ink usage. A simple linear model, Multiple Regression Analysis, cannot reflect the nonlinear relationships between the variables involved in packaging printing, so it is limited in how accurately it can predict the amount of ink needed. This study therefore builds several prediction models based on CART (Classification and Regression Tree): Decision Tree, Random Forest, Gradient Boosting Machine, and XGBoost. The accuracy of the models is determined by K-fold cross-validation. Error metrics such as root mean squared error, mean absolute error, and R-squared are used to evaluate the correctness of the estimation models. Among these models, XGBoost has the highest prediction accuracy and can reduce wasted ink by 2134 g per job. Thus, this study demonstrates the potential of machine learning to help advance productivity and protect the environment.
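
The evaluation procedure named in the abstract above (K-fold cross-validation scored with RMSE, MAE, and R-squared) can be sketched in plain Python. This is an illustrative sketch only: the function names are hypothetical, the folds are contiguous and unshuffled for simplicity, and no model from the paper is reproduced.

```python
import math
from typing import List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds; return (train, test) index lists."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # spread the remainder over early folds
        test = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        folds.append((train, test))
        start += size
    return folds
```

In practice each `(train, test)` pair would be used to fit a model on the training rows and score it on the held-out rows, and the per-fold metrics would then be averaged.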

An In-depth Survey Analysis Applying Data Mining Techniques (데이터마이닝을 이용한 설문조사의 심층 분석)

  • Kim, Wan-Seop;Lee, Soo-Won
    • Journal of Engineering Education Research / v.9 no.4 / pp.71-82 / 2006
  • To accomplish the educational objectives of a department, a system for CQI (Continuous Quality Improvement) is necessary. Improving the educational system through survey analysis is one of the most important factors in accomplishing these objectives. In general, survey analysis is carried out using the statistical distribution of a single attribute or correlation analysis between two attributes. However, these analysis schemes are limited in that they cannot find relations among multiple attributes. In this paper, an in-depth survey analysis method applying data mining techniques is presented. Data mining is a technique for extracting interesting knowledge from a large set of data. A survey of undergraduate students in the School of Computing of Soongsil University is analyzed in this paper using a data mining tool called Clementine. The results of the Clementine analysis show the relationship between 'grade' and other attributes hierarchically, and provide useful information that can be applied in student consulting and program improvement.

Analysis on geographic variations and variational factors in expenditures for hypertension (고혈압 의료비 지역 간 변이 및 변이 요인 분석)

  • Choi, Soon-Ho;Yong, Wang-Sik;Kim, Yoo-Mi
    • Journal of Digital Convergence / v.13 no.10 / pp.425-436 / 2015
  • This study investigates how expenditures for hypertension are affected by socioeconomic, health care resource, and health behavior factors, with a special emphasis on geographic variations, and provides data for the regional management of hypertension. For the analysis, we combined a unique data set including key indicators from Medical Service Usage Statistics 2012 by Region by the National Health Insurance Corporation, the Annual Community Health Survey 2012 by the Korea Centers for Disease Control and Prevention, and other government organizations at the 247 small administrative districts. We found that the average expenditure for hypertension in 249 small districts is 62,000 won and the coefficient of variation is 30.0. The major factors behind differences in hypertension expenditure are population density, marital status, household income, number of hospitals per 100 thousand, medical expenses outside the jurisdiction, drinking rate, moderate and over-intensity physical activity, and hypertension diagnosis rate. The decision tree results showed significant differences between regions in hypertension diagnosis rate, household income, marital status, number of hospitals per 100 thousand, obesity rate, and drinking rate. This study concluded that the determinants of geographic variations in hypertension spending are not only health resources and socioepidemic characteristics but also health behaviors.
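
The coefficient of variation reported in the abstract above is conventionally the standard deviation expressed as a percentage of the mean. The sketch below assumes the population standard deviation; the abstract does not say whether the population or sample form was used, and the function name is hypothetical.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation as a percentage of the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * statistics.pstdev(values) / mean
```

Applied to per-district expenditures, a larger value indicates wider regional variation relative to the average.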

A Convergence Study in the Severity-adjusted Mortality Ratio on inpatients with multiple chronic conditions (복합만성질환 입원환자의 중증도 보정 사망비에 대한 융복합 연구)

  • Seo, Young-Suk;Kang, Sung-Hong
    • Journal of Digital Convergence / v.13 no.12 / pp.245-257 / 2015
  • This study aimed to develop a predictive model for the severity-adjusted mortality of inpatients with multiple chronic conditions and to analyze the factors behind the variation in the hospital standardized mortality ratio (HSMR), in order to propose a plan to reduce that variation. We collected data from the "Korean National Hospital Discharge In-depth Injury Survey" from 2008 to 2010 and selected a final study population of 110,700 patients who had a chronic disease as the principal diagnosis, were over the age of 30, and had more than 2 chronic diseases including the principal diagnosis. We designed severity-adjusted mortality predictive models using data-mining methods (logistic regression analysis, decision tree, and neural network). In this study, we used the decision tree model based on the Elixhauser comorbidity index to predict the severity-adjusted mortality ratio. The HSMR of inpatients with multiple chronic conditions showed statistically significant differences by insurance type, number of hospital beds, and hospital location. Based on these results, methods should be found to manage the mortality ratio of inpatients with multiple chronic conditions efficiently at the national level. Efforts should therefore be made to increase the quality of medical treatment for inpatients with multiple chronic diseases and to reduce growing medical expenses.
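
As a rough illustration of the HSMR mentioned above: it is commonly computed as observed deaths divided by expected deaths, times 100, where expected deaths are the sum of each patient's predicted death probability from the severity-adjustment model. The sketch below assumes that common definition; the abstract does not give the exact formula used, and the function and parameter names are hypothetical.

```python
from typing import Sequence


def hsmr(died: Sequence[int], predicted_risk: Sequence[float]) -> float:
    """Hospital standardized mortality ratio: 100 * observed / expected deaths.

    died[i] is 1 if patient i died in hospital, else 0.
    predicted_risk[i] is the model's predicted probability of death for patient i.
    """
    if len(died) != len(predicted_risk):
        raise ValueError("inputs must have the same length")
    observed = sum(died)
    expected = sum(predicted_risk)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * observed / expected
```

Under this definition, a value above 100 means more deaths occurred than the severity-adjusted model predicted for that hospital's case mix.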

A Study on the Feature Extraction Using Spectral Indices from WorldView-2 Satellite Image (WorldView-2 위성영상의 분광지수를 이용한 개체 추출 연구)

  • Hyejin, Kim;Yongil, Kim;Byungkil, Lee
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.33 no.5 / pp.363-371 / 2015
  • Feature extraction is one of the main goals in many remote sensing analyses. As high-resolution imagery became more available, it became possible to extract more detailed and specific features. Consequently, many image segmentation algorithms have been developed, because traditional pixel-based analysis proved insufficient for high-resolution imagery due to its inability to handle the internal variability of complex scenes. However, a segmentation method on its own, which simply uses color layers, is limited in its ability to extract various target features with different spectral and shape characteristics. Spectral indices can support effective feature extraction by helping to identify abundant surface materials. This study aims to evaluate a feature extraction method based on a segmentation technique with spectral indices. We tested the extraction of diverse target features, such as buildings, vegetation, water, and shadows, from an eight-band WorldView-2 satellite image using decision tree classification, and used the result to derive the appropriate spectral indices for each specific feature extraction. From the results, we identified that spectral band ratios can be applied to distinguish feature classes simply and effectively.
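
To make the idea of a spectral band ratio concrete, the sketch below computes a normalized difference between two bands and applies a single decision-tree-style threshold. It uses the widely known NDVI form (near-infrared versus red) purely as an example; the abstract does not state which indices or thresholds the study derived, so the threshold value and function names are hypothetical.

```python
def normalized_difference(band_a: float, band_b: float) -> float:
    """Normalized difference index: (a - b) / (a + b), or 0.0 if both bands are 0."""
    total = band_a + band_b
    return 0.0 if total == 0 else (band_a - band_b) / total


def classify_pixel(nir: float, red: float, veg_threshold: float = 0.3) -> str:
    """One-split rule: label a pixel as vegetation when its NDVI exceeds the threshold.

    The default threshold of 0.3 is an illustrative assumption.
    """
    ndvi = normalized_difference(nir, red)
    return "vegetation" if ndvi > veg_threshold else "other"
```

A decision tree classifier effectively learns a sequence of such thresholds, one per split, over several indices.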

Analysis on Expected Profit for the Effective Operation of Social Cooperative -Focusing on the Education Model of the Meteorological Field (사회적협동조합의 효율적 운영을 위한 기대수익 분석 -기상분야 교육모델을 중심으로)

  • Kim, In-Gyum;Kim, Hyu-Min;Ahn, Suk-Hee;Lee, Seung-Wook;Kim, Jeong-Yun;Lee, Ki-Kwang
    • The Journal of the Korea Contents Association / v.15 no.12 / pp.483-492 / 2015
  • This study targeted elementary schoolchildren in Busan Metropolitan City, assumed the foundation of a social cooperative association providing education services in the meteorological field, and analyzed the expected annual profit for successful operation in the first year. Twelve variables relating to profits and expenses were derived, and a decision tree was used to analyze the optimal expected profit. The profit-related variables were the lecture fee per hour and the price of textbooks. The expense-related variables were textbook production costs, annual teacher salary, teacher education costs, textbook development costs, traveling expenses, rental fees, and operating costs. In addition, by adding education demand, the number of grades, and the number of teachers, we analyzed changes in expected profit, considering the variability of profits and expenses. As a result, despite the lower expected demand, increasing the price of textbooks and the education fee per hour was advantageous for raising expected profit. The reason is that higher demand increases textbook production costs, while the lowered textbook price and education fee do not generate enough profit to offset the increased expenses. Considering the public-interest value of social cooperative associations, pricing based solely on increasing demand should be avoided.
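
The expected-profit comparison described above can be sketched as a one-level decision tree: each pricing option branches into demand scenarios with probabilities, and the option with the highest probability-weighted profit is chosen. The option names, probabilities, and profit figures below are hypothetical and are not the paper's twelve variables.

```python
from typing import Dict, List, Tuple

Scenario = Tuple[float, float]  # (probability, profit under that scenario)


def expected_profit(scenarios: List[Scenario]) -> float:
    """Probability-weighted profit over the chance branches of one decision."""
    total_p = sum(p for p, _ in scenarios)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(p * profit for p, profit in scenarios)


def best_option(options: Dict[str, List[Scenario]]) -> str:
    """Pick the decision branch with the highest expected profit."""
    return max(options, key=lambda name: expected_profit(options[name]))


# Hypothetical pricing decision with two demand scenarios each.
pricing_options = {
    "high_price": [(0.5, 100.0), (0.5, 20.0)],
    "low_price": [(0.5, 60.0), (0.5, 40.0)],
}
```

With these made-up numbers the higher price wins on expected value even though it carries the lower worst-case profit, which mirrors the trade-off the abstract describes.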

A Study of the Advanced Strategy for ICT-based Public Compensation Business (ICT 기반 공익사업 보상업무 첨단화 방안 연구)

  • Seo, Myoung Bae
    • Smart Media Journal / v.9 no.1 / pp.75-83 / 2020
  • Compensation services, which are indispensable during large-scale public utility projects, have been gradually increasing with the recent growth in construction, but they are not carried out systematically because of complicated procedures and manual work. As a result, various problems are emerging, such as construction delays caused by complaints, corruption in compensation work, and the inability to trace the history of past compensation data. To address these problems, this paper conducted in-depth interviews and questionnaires to identify the problems at each stage of compensation. Based on this, 3 core technologies and 10 ICT-based technical needs were selected to improve compensation work, derived through STEEP analysis and an Issue Tree. The three core technologies are big data-based decision-making and prediction technology, advanced measurement technology, and open cloud-based compensation platform technology. To introduce the derived technologies to the institutions in charge of compensation, the possibility of technology diffusion by project operator was suggested based on the current status of informatization at each institution. Based on the core technologies derived in this paper, a prototype that can advance compensation work should be built, applied to each institution, and evaluated for its effect.

The statistical factors affecting the freezing of the road pavement (도로포장체의 동결에 영향을 미치는 통계적 요인)

  • Kim, Hyun-Ji;Lee, Jea-Young;Kim, Byung-Doo;Cho, Gyu-Tae
    • Journal of the Korean Data and Information Science Society / v.27 no.1 / pp.67-74 / 2016
  • Due to the characteristics of Korea's climate, road pavement is influenced by freezing in the winter season and thawing in the thawing season. In the last few years, several articles have been devoted to minimizing the damage caused by freezing and thawing. The purpose of this paper is to assess the appropriateness of factors that influence road pavement thickness. We conducted a decision tree analysis on field data of road pavement. The target variable is 'Frost penetration', whose value was calculated from the temperature data. The input variables are 'Region', 'Type of road pavement', 'Anti-frost layer', 'Month' and 'Air temperature'. The region was divided into 9 regions by freezing index: $350{\sim}450\,^{\circ}\mathrm{C}{\cdot}\mathrm{day}$, $450{\sim}550\,^{\circ}\mathrm{C}{\cdot}\mathrm{day}$, and $550{\sim}650\,^{\circ}\mathrm{C}{\cdot}\mathrm{day}$. The type of road pavement has three sections: the cutting area, the boundary area between cutting and banking, and the lower banking area. As a result, the variable that most influences 'Frost penetration' is Month, followed by anti-frost layer, air temperature, and region.
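
The freezing index used above to group regions is expressed in degree-days. A simplified way to compute such a quantity is to sum the magnitudes of all below-zero daily mean air temperatures over the season. The sketch below uses that simplified form as an assumption; the abstract does not describe how the study computed the index, and the function name is hypothetical.

```python
from typing import Sequence


def freezing_index(daily_mean_temps_c: Sequence[float]) -> float:
    """Simplified freezing index in degC*day: sum of degrees below 0 degC per day."""
    return sum(-t for t in daily_mean_temps_c if t < 0)
```

For example, daily means of -5, -10, 2, and -3 degC contribute 5 + 10 + 0 + 3 degree-days.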

Predicting Power Generation Patterns Using the Wind Power Data (풍력 데이터를 이용한 발전 패턴 예측)

  • Suh, Dong-Hyok;Kim, Kyu-Ik;Kim, Kwang-Deuk;Ryu, Keun-Ho
    • Journal of the Korea Society of Computer and Information / v.16 no.11 / pp.245-253 / 2011
  • Due to the imprudent consumption of fossil fuels, the environment has been seriously contaminated and the problem of fossil fuel exhaustion looms large. Therefore, people are taking great interest in alternative energy resources that can solve the problems of fossil fuels. Wind power is one of the most popular forms of new and renewable energy. However, wind power plants and traditional power plants must balance power generation against power consumption. Therefore, analysis and prediction are needed to generate power efficiently using wind energy. In this paper, we performed research to predict power generation patterns using wind power data. Prediction approaches from the data mining area can be used to build a prediction model. The research steps are as follows: 1) We performed preprocessing to handle missing values and anomalous data, and extracted the characteristic vector data. 2) The representative patterns were found by the MIA (Mean Index Adequacy) measure and the SOM (Self-Organizing Feature Map) clustering approach using the normalized dataset, and we assigned class labels to each data item. 3) We built a new prediction model for wind power generation with a classification approach. In this experiment, we built a forecasting model to predict wind power generation patterns using the decision tree.
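
Step 1 of the pipeline above (handling missing values, then producing a normalized dataset for clustering) can be sketched as follows. Mean imputation and min-max scaling are assumptions chosen for illustration; the abstract does not specify which preprocessing or normalization methods were used, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def fill_missing(values: Sequence[Optional[float]]) -> List[float]:
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("at least one observed value is required")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1]; constant input maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

The normalized vectors would then feed the clustering step, and the resulting cluster labels become the classes the decision tree learns to predict.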