• Title/Summary/Keyword: Ensemble Methodology

Securing SCADA Systems: A Comprehensive Machine Learning Approach for Detecting Reconnaissance Attacks

  • Ezaz Aldahasi;Talal Alkharobi
    • International Journal of Computer Science & Network Security, v.23 no.12, pp.1-12, 2023
  • Ensuring the security of Supervisory Control and Data Acquisition (SCADA) and Industrial Control Systems (ICS) is paramount to safeguarding the reliability and safety of critical infrastructure. This paper addresses the significant threat posed by reconnaissance attacks on SCADA/ICS networks and presents a methodology for enhancing their protection. The proposed approach combines techniques for handling imbalanced datasets, ensemble methods, and feature engineering to enhance the resilience of SCADA/ICS systems. Experimentation and analysis demonstrate the efficacy of the strategy, as evidenced by model performance characterized by good precision and recall and a low number of false negatives (FN). The practical utility of the approach is underscored through evaluation on real-world SCADA/ICS datasets, where it outperforms existing methods in a comparative analysis. Moreover, the integration of feature augmentation is shown to significantly enhance detection capabilities. This research contributes to advancing the security posture of SCADA/ICS environments, addressing a critical imperative in the face of evolving cyber threats.
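
The abstract reports precision, recall, and false negatives on imbalanced data but does not name the resampling technique used. The following is a minimal sketch, assuming plain random oversampling of the minority (attack) class as a stand-in; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
import random

def random_oversample(rows, labels, minority=1, seed=0):
    """Duplicate minority-class rows until both classes have equal counts.
    Assumption: random oversampling stands in for the paper's unnamed technique."""
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    n_majority = sum(1 for y in labels if y != minority)
    if not minority_rows:
        return list(rows), list(labels)
    n_extra = max(0, n_majority - len(minority_rows))
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    return list(rows) + extra, list(labels) + [minority] * n_extra

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary task; `positive` marks the attack class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN); 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy labels: 3 attack samples (1) among 10 records.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)

# Balance the toy training set (rows are just record indices here).
balanced_rows, balanced_labels = random_oversample(list(range(10)), y_true)
```

In an intrusion-detection setting a false negative is a missed attack, which is why FN is worth tracking alongside precision and recall.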

Design of A Personalized Classifier using Soft Computing Techniques and Its Application to Facial Expression Recognition

  • Kim, Dae-Jin;Zeungnam Bien
    • Proceedings of the Korean Institute of Intelligent Systems Conference, 2003.09a, pp.521-524, 2003
  • In this paper, we propose a design process for 'personalized' classification with soft computing techniques. A construction methodology for a personalized classifier, based on the human way of thinking, is presented. Two fuzzy similarity measures and an ensemble of classifiers are used. As one possible application, the facial expression recognition problem is discussed. The numerical results show that the proposed method is useful for on-line learning and for reusing previous knowledge, among other purposes.

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems, v.26 no.2, pp.105-129, 2020
  • This study uses corporate data from 2012 to 2018, when K-IFRS was applied in earnest, to predict default risk. The data used in the analysis totaled 10,545 rows with 160 columns, including 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial ratio indices. Unlike most previous studies, which used the default event as the basis for learning about default risk, this study calculated default risk from the market capitalization and stock price volatility of each company based on the Merton model. This resolved the data imbalance caused by the scarcity of default events, which had been pointed out as a limitation of existing methodologies, and made it possible to reflect the differences in default risk that exist among ordinary companies. Because the models were trained only on corporate information that is also available for unlisted companies, the default risk of unlisted companies without stock price information can be appropriately derived. The approach can therefore provide stable default risk assessment services to companies whose default risk is difficult to determine with traditional credit rating models, such as small and medium-sized companies and startups. Although corporate default risk prediction using machine learning has been studied actively in recent years, model bias remains an issue because most studies base their predictions on a single model. A stable and reliable valuation methodology is required for calculating default risk, given that an entity's default risk information is very widely used in the market and sensitivity to differences in default risk is high. Strict standards are also required for the calculation methods. 
The credit rating method stipulated by the Financial Services Commission in the Financial Investment Business Regulations calls for the preparation of evaluation methods, including verification of their adequacy, in consideration of past statistical data and experience with credit ratings and of changes in future market conditions. This study reduced the bias of individual models by using stacking ensemble techniques that synthesize various machine learning models. This makes it possible to capture complex nonlinear relationships between default risk and various corporate information while keeping the short computation time that is an advantage of machine learning-based default risk prediction models. To produce the sub-model forecasts used as input data for the Stacking Ensemble model, the training data were divided into seven pieces, and the sub-models were trained on the divided sets to produce forecasts. To compare the predictive power of the Stacking Ensemble model, Random Forest, MLP, and CNN models were trained on the full training data, and the predictive power of each model was then verified on the test set. The analysis showed that the Stacking Ensemble model exceeded the predictive power of the Random Forest model, which performed best among the single models. Next, to check for statistically significant differences between the forecasts of the Stacking Ensemble model and those of each individual model, pairs were constructed between the Stacking Ensemble model and each individual model. Because the Shapiro-Wilk normality test showed that none of the pairs followed a normal distribution, the nonparametric Wilcoxon rank sum test was used to check whether the two model forecasts in each pair differed significantly. The analysis showed that the forecasts of the Stacking Ensemble model differed significantly from those of the MLP and CNN models. 
In addition, this study offers a methodology that allows existing credit rating agencies to apply machine learning-based bankruptcy risk prediction, given that traditional credit rating models can also be included as sub-models to calculate the final default probability. The Stacking Ensemble techniques proposed in this study can also help design models that meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope that this research will be used as a resource to increase practical use by overcoming and improving the limitations of existing machine learning-based models.
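
The seven-way split described in the abstract can be sketched as out-of-fold stacking. This is only an illustration under stated assumptions: the two toy base models (`LineFit`, `KNearest`) stand in for the paper's sub-models, the least-squares meta-learner (`fit_meta_weights`) is a hypothetical choice because the abstract does not name one, and the data are synthetic.

```python
import math
import random

class LineFit:
    """Toy base model 1: ordinary least-squares line y = a + b*x."""
    def fit(self, xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.b = sxy / sxx if sxx else 0.0
        self.a = my - self.b * mx
        return self

    def predict(self, xs):
        return [self.a + self.b * x for x in xs]

class KNearest:
    """Toy base model 2: mean target of the k nearest training points."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, xs, ys):
        self.data = list(zip(xs, ys))
        return self

    def predict(self, xs):
        out = []
        for x in xs:
            near = sorted(self.data, key=lambda d: abs(d[0] - x))[: self.k]
            out.append(sum(y for _, y in near) / len(near))
        return out

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def out_of_fold_predictions(make_model, xs, ys, folds):
    """Predict each fold with a model trained on the other folds only."""
    oof = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = make_model().fit([xs[i] for i in train], [ys[i] for i in train])
        for i, p in zip(held_out, model.predict([xs[i] for i in held_out])):
            oof[i] = p
    return oof

def fit_meta_weights(p1, p2, ys, ridge=1e-9):
    """Least-squares weights for y ~ w1*p1 + w2*p2 via 2x2 normal equations."""
    a11 = sum(a * a for a in p1) + ridge
    a22 = sum(b * b for b in p2) + ridge
    a12 = sum(a * b for a, b in zip(p1, p2))
    b1 = sum(a * y for a, y in zip(p1, ys))
    b2 = sum(b * y for b, y in zip(p2, ys))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

def mse(pred, ys):
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)

# Synthetic data: a linear trend plus a nonlinear term and noise.
rng = random.Random(1)
xs = [i / 10 for i in range(70)]
ys = [2.0 * x + math.sin(3 * x) + rng.gauss(0, 0.1) for x in xs]

folds = k_fold_indices(len(xs), 7)  # seven pieces, as in the abstract
oof_line = out_of_fold_predictions(LineFit, xs, ys, folds)
oof_knn = out_of_fold_predictions(KNearest, xs, ys, folds)
w_line, w_knn = fit_meta_weights(oof_line, oof_knn, ys)
oof_stack = [w_line * a + w_knn * b for a, b in zip(oof_line, oof_knn)]

# Refit the base models on all training data for use on new inputs.
final_line = LineFit().fit(xs, ys)
final_knn = KNearest().fit(xs, ys)

def stacked_predict(new_xs):
    p1, p2 = final_line.predict(new_xs), final_knn.predict(new_xs)
    return [w_line * a + w_knn * b for a, b in zip(p1, p2)]
```

Training the meta-learner on out-of-fold forecasts rather than in-sample forecasts keeps it from rewarding whichever sub-model memorizes the training data best.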

A study on the nonlinearity in bio-logical systems using approximate entropy and correlation dimension (근사엔트로피와 상관차원을 이용한 비선형 신호의 분석)

  • Lee, Hae-Jin;Choi, Won-Young;Cha, Kyung-Joon;Park, Moon-Il;Oh, Jae-Eung
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference, 2007.11a, pp.760-763, 2007
  • We studied how linear and nonlinear heart rate dynamics differ between normal fetuses and uncomplicated small-for-gestational age (SGA) fetuses, aged 32-40 weeks' gestation. We analyzed each fetal heart rate (FHR) time series for 20 min and quantified its complexity (nonlinear dynamics) by approximate entropy (ApEn) and correlation dimension (CD). The linear dynamics were analyzed by canonical correlation analysis (CCA). The ApEn and CD of the uncomplicated SGA fetuses were significantly lower than those of the normal fetuses in all three gestational periods (32-34, 35-37, 38-40 weeks). The canonical correlation ensemble in SGA fetuses is slightly higher than in normal ones in all three gestational periods, especially at 35-37 weeks. The irregularity and complexity of the heart rate dynamics of SGA fetuses are lower than those of normal ones. Also, the canonical ensemble in SGA fetuses is higher than in normal ones, suggesting that the FHR control system has multiple complex interactions. Along with the clear difference between the two groups' nonlinear chaotic dynamics in FHR patterns, we clarified the hidden subtle differences in linearity (e.g. canonical ensemble). The decrease in nonlinear dynamics may contribute to the increase in linear dynamics. The present statistical methodology can be readily and routinely used in obstetric and gynecologic fields.
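
Approximate entropy, one of the two complexity measures named in the abstract, can be sketched as follows. This is a minimal illustration of the standard ApEn(m, r) definition; the embedding dimension `m=2`, the absolute tolerance `r=0.2`, and the toy series are assumptions, since the abstract does not state the parameters used.

```python
import math
import random

def approximate_entropy(series, m=2, r=0.2):
    """ApEn(m, r) = phi(m) - phi(m + 1), using the Chebyshev distance.
    Self-matches are counted, so the logarithm is always defined."""
    n = len(series)

    def phi(dim):
        templates = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in templates:
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)

# A perfectly regular series versus an irregular one (toy data only).
periodic = [0.0, 1.0] * 50
rng = random.Random(0)
irregular = [rng.random() for _ in range(100)]

apen_periodic = approximate_entropy(periodic)
apen_irregular = approximate_entropy(irregular)
```

Lower values indicate a more regular, more predictable series, which is the direction the abstract reports for the SGA group.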

A Study on Estimating Earthquake Magnitudes Based on the Observed S-Wave Seismograms at the Near-Source Region (근거리 지진관측자료의 S파를 이용한 지진규모 평가 연구)

  • Yun, Kwan-Hee;Choi, Shin-Kyu;Lee, Kang-Ryel
    • Journal of the Earthquake Engineering Society of Korea, v.28 no.3, pp.121-128, 2024
  • There are growing concerns that the recently implemented Earthquake Early Warning service is overestimating the rapidly provided earthquake magnitudes (M). As a result, the predicted damages unnecessarily activate earthquake protection systems for critical facilities and lifeline infrastructures that are far away. This study is conducted to improve the estimation accuracy of M by incorporating the observed S-wave seismograms in the near source region after removing the site effects of the seismograms in real time by filtering in the time domain. The ensemble of horizontal S-wave spectra from at least five seismograms without site effects is calculated and normalized to a hypocentric target distance (21.54 km) by using the distance attenuation model of Q(f) = 348f^0.52 and a cross-over distance of 50 km. The natural logarithmic mean of the S-wave ensemble spectra is then fitted to Brune's source spectrum to obtain the best estimates for M and stress drop (SD) with the fitting weight of 1/standard deviation. The proposed methodology was tested on the 18 recent inland earthquakes in South Korea, and the condition of at least five records for the near-source region is sufficiently fulfilled at an epicentral distance of 30 km. The natural logarithmic standard deviation of the observed S-wave spectra of the ensemble was calculated to be 0.53 using records near the source for 1~10 Hz, compared to 0.42 using whole records. The result shows that the root-mean-square error of M and ln(SD) is approximately 0.17 and 0.6, respectively. This accuracy can provide a confidence interval of 0.4~2.3 of Peak Ground Acceleration values in the distant range.
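
Two quantities from the abstract lend themselves to a short sketch: the quoted attenuation model Q(f) and the natural-log mean and standard deviation of an ensemble of spectra, from which the 1/standard-deviation fitting weights follow. The function names and the five toy records are hypothetical, and the fit to Brune's source spectrum is not shown.

```python
import math

def q_factor(f_hz):
    """Quality factor Q(f) = 348 * f^0.52, as quoted in the abstract."""
    return 348.0 * f_hz ** 0.52

def log_mean_and_std(spectra):
    """Per-frequency natural-log mean and sample standard deviation.
    `spectra` is a list of records; each record is a list of positive
    spectral amplitudes sampled at the same frequencies."""
    n = len(spectra)
    means, stds = [], []
    for amps in zip(*spectra):
        logs = [math.log(a) for a in amps]
        mu = sum(logs) / n
        var = sum((v - mu) ** 2 for v in logs) / (n - 1)
        means.append(mu)
        stds.append(math.sqrt(var))
    return means, stds

# Toy ensemble: five records (the abstract requires at least five),
# each with amplitudes at three frequencies. Values are made up.
freqs_hz = [1.0, 5.0, 10.0]
records = [
    [1.2, 0.80, 0.30],
    [0.9, 0.60, 0.25],
    [1.5, 1.10, 0.45],
    [1.0, 0.70, 0.20],
    [1.3, 0.90, 0.35],
]
ln_mean, ln_std = log_mean_and_std(records)

# Fitting weights of 1/standard deviation, as described in the abstract.
weights = [1.0 / s for s in ln_std]
q_values = [q_factor(f) for f in freqs_hz]
```

Averaging in the log domain matches the abstract's use of a natural logarithmic mean, so a single unusually large record has less influence than it would in an arithmetic mean.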

Energy Efficient Design of a Jet Pump by Ensemble of Surrogates and Evolutionary Approach

  • Husain, Afzal;Sonawat, Arihant;Mohan, Sarath;Samad, Abdus
    • International Journal of Fluid Machinery and Systems, v.9 no.3, pp.265-276, 2016
  • Energy systems working coherently in different conditions may not have a specific design that provides optimal performance. A system working for a longer period at lower efficiency implies higher energy consumption. In this work, a methodology is demonstrated through the design and optimization of a jet pump, using numerical modeling of the fluid dynamics and an evolutionary algorithm for the optimization, and it shows a reduction in computational costs. The jet pump inherently has a low efficiency because of improper mixing of the primary and secondary fluids and the multiple momentum and energy transfer phenomena associated with it. High-fidelity solutions were obtained through a validated numerical model and used to construct an approximate function through surrogate analysis. Pareto-optimal solutions for two objective functions, i.e., secondary fluid pressure head and primary fluid pressure drop, were generated through a multi-objective genetic algorithm. For the jet pump geometry, a design space of several design variables was discretized using the Latin hypercube sampling method for the optimization. The performance analysis of the surrogate models shows that the combined surrogates perform better than a single surrogate and that the optimized jet pump shows a higher performance. The approach can be implemented in other energy systems to find a better design.
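
Latin hypercube sampling and a combined surrogate can be sketched briefly. This is an illustration under assumptions: the design-variable bounds, the two toy surrogate functions, and the fixed weights are all hypothetical, and the abstract does not say how the surrogates were weighted.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that, in every dimension, each of the
    n_samples equal-width strata contains exactly one point."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([
            low + (s + rng.random()) / n_samples * (high - low)
            for s in strata
        ])
    return [list(point) for point in zip(*columns)]

def ensemble_predict(surrogates, weights, x):
    """Weighted average of surrogate predictions at design point x."""
    total = sum(weights)
    return sum(w * s(x) for s, w in zip(surrogates, weights)) / total

# Hypothetical two-variable design space (names and ranges are made up).
bounds = [(0.0, 1.0), (10.0, 20.0)]
samples = latin_hypercube(8, bounds)

# Two toy surrogates standing in for fitted approximation models.
def surrogate_a(x):
    return x[0] + 0.1 * x[1]

def surrogate_b(x):
    return x[0] ** 2 + 0.1 * x[1]

predictions = [
    ensemble_predict([surrogate_a, surrogate_b], [0.6, 0.4], x)
    for x in samples
]
```

A common choice, not confirmed by the abstract, is to set each weight inversely to that surrogate's cross-validation error so that the more accurate model dominates the average.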

An Ensemble Model for Machine Failure Prediction (앙상블 모델 기반의 기계 고장 예측 방법)

  • Cheon, Kang Min;Yang, Jaekyung
    • Journal of Korean Society of Industrial and Systems Engineering, v.43 no.1, pp.123-131, 2020
  • Many studies have addressed methods for predicting machine failure, and recently much research and many applications have aimed to diagnose the physical condition of machines and parts and to calculate remaining life through various methods. Survival models are also used to predict plant failures based on past anomaly cycles. In particular, special machines that reflect the fluid flow and process characteristics of chemical plants are connected to hundreds or thousands of sensors, so many factors need to be considered, such as process and material data as well as derived variables. In this paper, the data were preprocessed through time series anomaly detection based on unsupervised learning to predict the abnormalities of these special machines. Next, clustering results that reflect the characteristics of the data were used to produce additional variables, and a training data set was created based on the history of past facility abnormalities. Finally, a prediction methodology based on a supervised learning algorithm was applied, and updating the model was confirmed to improve the accuracy of facility failure prediction. By predicting machine abnormalities and extracting key factors, the approach is expected to improve the efficiency of facility operation by allowing maintenance timing and parts supply and demand to be adjusted flexibly.

Comparing Methodology of Building Energy Analysis - Comparative Analysis from steady-state simulation to data-driven Analysis - (건물에너지 분석 방법론 비교 - Steady-state simulation에서부터 Data-driven 방법론의 비교 분석 -)

  • Cho, Sooyoun;Leigh, Seung-Bok
    • KIEAE Journal, v.17 no.5, pp.77-86, 2017
  • Purpose: Because of growing concern over fossil fuel use and increasing demand for greenhouse gas emission reduction since the 1990s, the building energy analysis field has produced various types of methods, which are being applied more often and more broadly than ever. Many research results have been proposed in the area of building energy simulation around the world for over 50 years. However, in the last 20 years, there have been only a few studies in which the trend of building energy analysis is examined, estimated, or compared. This research aims to investigate the trend of building energy analysis by focusing on the methodology and characteristics of each method. Method: The research papers addressing building energy analysis are classified into two types of method: engineering analysis and algorithm estimation. In particular, the Energy Performance Gap (EPG), which is a limitation of both the existing engineering method and the single algorithm-based estimation method, results from comparing data of two different levels, that is, real-time data and simulation data. Result: When one or more ensemble algorithms are used, more accurate estimates of energy consumption and performance are produced, thereby reducing the energy performance gap.

Machine Learning Methodology for Management of Shipbuilding Master Data

  • Jeong, Ju Hyeon;Woo, Jong Hun;Park, JungGoo
    • International Journal of Naval Architecture and Ocean Engineering, v.12 no.1, pp.428-439, 2020
  • The continuous development of information and communication technologies has resulted in an exponential increase in data. Consequently, technologies related to data analysis are growing in importance. The shipbuilding industry has high production uncertainty and variability, which has created an urgent need for data analysis techniques, such as machine learning. In particular, the industry cannot effectively respond to changes in the production-related standard time information systems, such as the basic cycle time and lead time. Improvement measures are necessary to enable the industry to respond swiftly to changes in the production environment. In this study, the lead times for fabrication, assembly of ship block, spool fabrication and painting were predicted using machine learning technology to propose a new management method for the process lead time using a master data system for the time element in the production data. Data preprocessing was performed in various ways using R and Python, which are open source programming languages, and process variables were selected considering their relationships with the lead time through correlation analysis and analysis of variables. Various machine learning, deep learning, and ensemble learning algorithms were applied to create the lead time prediction models. In addition, the applicability of the proposed machine learning methodology to standard work hour prediction was verified by evaluating the prediction models using the evaluation criteria, such as the Mean Absolute Percentage Error (MAPE) and Root Mean Squared Logarithmic Error (RMSLE).
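
The two evaluation criteria named in the abstract have standard definitions, sketched below. The lead-time values in the example are made up for illustration; `mape` assumes no actual value is zero, and `rmsle` assumes all values are greater than -1.

```python
import math

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.
    Assumes every actual value is nonzero."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)

def rmsle(actual, predicted):
    """Root Mean Squared Logarithmic Error, using log(1 + x)."""
    return math.sqrt(sum(
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ) / len(actual))

# Hypothetical lead times in days (illustrative values only).
actual_days = [12.0, 30.0, 45.0, 8.0]
predicted_days = [10.0, 33.0, 40.0, 9.0]

lead_time_mape = mape(actual_days, predicted_days)
lead_time_rmsle = rmsle(actual_days, predicted_days)
```

RMSLE penalizes relative rather than absolute differences, which can be useful when lead times span very different magnitudes across process types.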

Behavior Network based Bayesian Network Ensemble Methodology for Recognizing Uncertain Environment (불확실한 환경 인식을 위한 행동 네트워크 기반 베이지안 네트워크 앙상블 기법)

  • Im Seugn-Bin;Cho Sung-Bae
    • Proceedings of the Korean Institute of Intelligent Systems Conference, 2005.11a, pp.305-308, 2005
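  • Recognizing environments and situations with visual sensors is very important for the automated behavior of robots. In real environments, people go through several stages of recognition when perceiving their surroundings. For efficient and accurate environment recognition, an intelligent robot's recognition should likewise proceed in multiple stages, like the human recognition process. In addition, because real environments are dynamic and contain much uncertainty, a recognition method that is robust to uncertain situations is needed. Recognition using Bayesian networks is robust for environments and situations that involve such uncertainty, but it is difficult to recognize a complex environment with a single Bayesian network. This paper proposes an ensemble method that combines multiple Bayesian networks for recognizing complex and uncertain environments on the basis of a behavior network composed of a multi-stage recognition process similar to human recognition. Environment experiments with uncertain situations and robot experiments using a robot simulator confirmed that the Bayesian network ensemble method is effective for environment recognition.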
