• Title/Summary/Keyword: Ensemble machine learning

Search Result 231, Processing Time 0.026 seconds

A Development of Defeat Prediction Model Using Machine Learning in Polyurethane Foaming Process for Automotive Seat (머신러닝을 활용한 자동차 시트용 폴리우레탄 발포공정의 불량 예측 모델 개발)

  • Choi, Nak-Hun;Oh, Jong-Seok;Ahn, Jong-Rok;Kim, Key-Sun
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.22 no.6
    • /
    • pp.36-42
    • /
    • 2021
  • With recent developments in the Fourth Industrial Revolution, the manufacturing industry has changed rapidly. Through key aspects of Fourth Industrial Revolution super-connections and super-intelligence, machine learning will be able to make fault predictions during the foam-making process. Polyol and isocyanate are components in polyurethane foam. There has been a lot of research that could affect the characteristics of the products, depending on the specific mixture ratio and temperature. Based on these characteristics, this study collects data from each factor during the foam-making process and applies them to machine learning in order to predict faults. The algorithms used in machine learning are the decision tree, kNN, and an ensemble algorithm, and these algorithms learn from 5,147 cases. Based on 1,000 pieces of data for validation, the learning results show up to 98.5% accuracy using the ensemble algorithm. Therefore, the results confirm the faults of currently produced parts by collecting real-time data from each factor during the foam-making process. Furthermore, control of each of the factors may improve the fault rate.

Ensemble Method for Predicting Particulate Matter and Odor Intensity (미세먼지, 악취 농도 예측을 위한 앙상블 방법)

  • Lee, Jong-Yeong;Choi, Myoung Jin;Joo, Yeongin;Yang, Jaekyung
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.42 no.4
    • /
    • pp.203-210
    • /
    • 2019
  • Recently, a number of researchers have produced research and reports in order to forecast more exactly air quality such as particulate matter and odor. However, such research mainly focuses on the atmospheric diffusion models that have been used for the air quality prediction in environmental engineering area. Even though it has various merits, it has some limitation in that it uses very limited spatial attributes such as geographical attributes. Thus, we propose the new approach to forecast an air quality using a deep learning based ensemble model combining temporal and spatial predictor. The temporal predictor employs the RNN LSTM and the spatial predictor is based on the geographically weighted regression model. The ensemble model also uses the RNN LSTM that combines two models with stacking structure. The ensemble model is capable of inferring the air quality of the areas without air quality monitoring station, and even forecasting future air quality. We installed the IoT sensors measuring PM2.5, PM10, H2S, NH3, VOC at the 8 stations in Jeonju in order to gather air quality data. The numerical results showed that our new model has very exact prediction capability with comparison to the real measured data. It implies that the spatial attributes should be considered to more exact air quality prediction.

Development of benthic macroinvertebrate species distribution models using the Bayesian optimization (베이지안 최적화를 통한 저서성 대형무척추동물 종분포모델 개발)

  • Go, ByeongGeon;Shin, Jihoon;Cha, Yoonkyung
    • Journal of Korean Society of Water and Wastewater
    • /
    • v.35 no.4
    • /
    • pp.259-275
    • /
    • 2021
  • This study explored the usefulness and implications of the Bayesian hyperparameter optimization in developing species distribution models (SDMs). A variety of machine learning (ML) algorithms, namely, support vector machine (SVM), random forest (RF), boosted regression tree (BRT), XGBoost (XGB), and Multilayer perceptron (MLP) were used for predicting the occurrence of four benthic macroinvertebrate species. The Bayesian optimization method successfully tuned model hyperparameters, with all ML models resulting an area under the curve (AUC) > 0.7. Also, hyperparameter search ranges that generally clustered around the optimal values suggest the efficiency of the Bayesian optimization in finding optimal sets of hyperparameters. Tree based ensemble algorithms (BRT, RF, and XGB) tended to show higher performances than SVM and MLP. Important hyperparameters and optimal values differed by species and ML model, indicating the necessity of hyperparameter tuning for improving individual model performances. The optimization results demonstrate that for all macroinvertebrate species SVM and RF required fewer numbers of trials until obtaining optimal hyperparameter sets, leading to reduced computational cost compared to other ML algorithms. The results of this study suggest that the Bayesian optimization is an efficient method for hyperparameter optimization of machine learning algorithms.

Predicting Administrative Issue Designation in KOSDAQ Market Using Machine Learning Techniques (머신러닝을 활용한 코스닥 관리종목지정 예측)

  • Chae, Seung-Il;Lee, Dong-Joo
    • Asia-Pacific Journal of Business
    • /
    • v.13 no.2
    • /
    • pp.107-122
    • /
    • 2022
  • Purpose - This study aims to develop machine learning models to predict administrative issue designation in KOSDAQ Market using financial data. Design/methodology/approach - Employing four classification techniques including logistic regression, support vector machine, random forest, and gradient boosting to a matched sample of five hundred and thirty-six firms over an eight-year period, the authors develop prediction models and explore the practicality of the models. Findings - The resulting four binary selection models reveal overall satisfactory classification performance in terms of various measures including AUC (area under the receiver operating characteristic curve), accuracy, F1-score, and top quartile lift, while the ensemble models (random forest and gradienct boosting) outperform the others in terms of most measures. Research implications or Originality - Although the assessment of administrative issue potential of firms is critical information to investors and financial institutions, detailed empirical investigation has lagged behind. The current research fills this gap in the literature by proposing parsimonious prediction models based on a few financial variables and validating the applicability of the models.

Prediction of ultimate shear strength and failure modes of R/C ledge beams using machine learning framework

  • Ahmed M. Yousef;Karim Abd El-Hady;Mohamed E. El-Madawy
    • Structural Monitoring and Maintenance
    • /
    • v.9 no.4
    • /
    • pp.337-357
    • /
    • 2022
  • The objective of this study is to present a data-driven machine learning (ML) framework for predicting ultimate shear strength and failure modes of reinforced concrete ledge beams. Experimental tests were collected on these beams with different loading, geometric and material properties. The database was analyzed using different ML algorithms including decision trees, discriminant analysis, support vector machine, logistic regression, nearest neighbors, naïve bayes, ensemble and artificial neural networks to identify the governing and critical parameters of reinforced concrete ledge beams. The results showed that ML framework can effectively identify the failure mode of these beams either web shear failure, flexural failure or ledge failure. ML framework can also derive equations for predicting the ultimate shear strength for each failure mode. A comparison of the ultimate shear strength of ledge failure was conducted between the experimental results and the results from the proposed equations and the design equations used by international codes. These comparisons indicated that the proposed ML equations predict the ultimate shear strength of reinforced concrete ledge beams better than the design equations of AASHTO LRFD-2020 or PCI-2020.

Risk Prediction and Analysis of Building Fires -Based on Property Damage and Occurrence of Fires- (건물별 화재 위험도 예측 및 분석: 재산 피해액과 화재 발생 여부를 바탕으로)

  • Lee, Ina;Oh, Hyung-Rok;Lee, Zoonky
    • The Journal of Bigdata
    • /
    • v.6 no.1
    • /
    • pp.133-144
    • /
    • 2021
  • This paper derives the fire risk of buildings in Seoul through the prediction of property damage and the occurrence of fires. This study differs from prior research in that it utilizes variables that include not only a building's characteristics but also its affiliated administrative area as well as the accessibility of nearby fire-fighting facilities. We use Ensemble Voting techniques to merge different machine learning algorithms to predict property damage and fire occurrence, and to extract feature importance to produce fire risk. Fire risk prediction was made on 300 buildings in Seoul utilizing the established model, and it has been derived that with buildings at Level 1 for fire risks, there were a high number of households occupying the building, and the buildings had many factors that could contribute to increasing the size of the fire, including the lack of nearby fire-fighting facilities as well as the far location of the 119 Safety Center. On the other hand, in the case of Level 5 buildings, the number of buildings and businesses is large, but the 119 Safety Center in charge are located closest to the building, which can properly respond to fire.

A Comparative Analysis of the Pre-Processing in the Kaggle Titanic Competition

  • Tai-Sung, Hur;Suyoung, Bang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.3
    • /
    • pp.17-24
    • /
    • 2023
  • Based on the problem of 'Tatanic - Machine Learning from Disaster', a representative competition of Kaggle that presents challenges related to data science and solves them, we want to see how data preprocessing and model construction affect prediction accuracy and score. We compare and analyze the features by selecting seven top-ranked solutions with high scores, except when using redundant models or ensemble techniques. It was confirmed that most of the pretreatment has unique and differentiated characteristics, and although the pretreatment process was almost the same, there were differences in scores depending on the type of model. The comparative analysis study in this paper is expected to help participants in the kaggle competition and data science beginners by understanding the characteristics and analysis flow of the preprocessing methods of the top score participants.

Malwares Attack Detection Using Ensemble Deep Restricted Boltzmann Machine

  • K. Janani;R. Gunasundari
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.5
    • /
    • pp.64-72
    • /
    • 2024
  • In recent times cyber attackers can use Artificial Intelligence (AI) to boost the sophistication and scope of attacks. On the defense side, AI is used to enhance defense plans, to boost the robustness, flexibility, and efficiency of defense systems, which means adapting to environmental changes to reduce impacts. With increased developments in the field of information and communication technologies, various exploits occur as a danger sign to cyber security and these exploitations are changing rapidly. Cyber criminals use new, sophisticated tactics to boost their attack speed and size. Consequently, there is a need for more flexible, adaptable and strong cyber defense systems that can identify a wide range of threats in real-time. In recent years, the adoption of AI approaches has increased and maintained a vital role in the detection and prevention of cyber threats. In this paper, an Ensemble Deep Restricted Boltzmann Machine (EDRBM) is developed for the classification of cybersecurity threats in case of a large-scale network environment. The EDRBM acts as a classification model that enables the classification of malicious flowsets from the largescale network. The simulation is conducted to test the efficacy of the proposed EDRBM under various malware attacks. The simulation results show that the proposed method achieves higher classification rate in classifying the malware in the flowsets i.e., malicious flowsets than other methods.

A Comparative Analysis of Ensemble Learning-Based Classification Models for Explainable Term Deposit Subscription Forecasting (설명 가능한 정기예금 가입 여부 예측을 위한 앙상블 학습 기반 분류 모델들의 비교 분석)

  • Shin, Zian;Moon, Jihoon;Rho, Seungmin
    • The Journal of Society for e-Business Studies
    • /
    • v.26 no.3
    • /
    • pp.97-117
    • /
    • 2021
  • Predicting term deposit subscriptions is one of representative financial marketing in banks, and banks can build a prediction model using various customer information. In order to improve the classification accuracy for term deposit subscriptions, many studies have been conducted based on machine learning techniques. However, even if these models can achieve satisfactory performance, utilizing them is not an easy task in the industry when their decision-making process is not adequately explained. To address this issue, this paper proposes an explainable scheme for term deposit subscription forecasting. For this, we first construct several classification models using decision tree-based ensemble learning methods, which yield excellent performance in tabular data, such as random forest, gradient boosting machine (GBM), extreme gradient boosting (XGB), and light gradient boosting machine (LightGBM). We then analyze their classification performance in depth through 10-fold cross-validation. After that, we provide the rationale for interpreting the influence of customer information and the decision-making process by applying Shapley additive explanation (SHAP), an explainable artificial intelligence technique, to the best classification model. To verify the practicality and validity of our scheme, experiments were conducted with the bank marketing dataset provided by Kaggle; we applied the SHAP to the GBM and LightGBM models, respectively, according to different dataset configurations and then performed their analysis and visualization for explainable term deposit subscriptions.

Analysis of massive data in astronomy (천문학에서의 대용량 자료 분석)

  • Shin, Min-Su
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.6
    • /
    • pp.1107-1116
    • /
    • 2016
  • Recent astronomical survey observations have produced substantial amounts of data as well as completely changed conventional methods of analyzing astronomical data. Both classical statistical inference and modern machine learning methods have been used in every step of data analysis that range from data calibration to inferences of physical models. We are seeing the growing popularity of using machine learning methods in classical problems of astronomical data analysis due to low-cost data acquisition using cheap large-scale detectors and fast computer networks that enable us to share large volumes of data. It is common to consider the effects of inhomogeneous spatial and temporal coverage in the analysis of big astronomical data. The growing size of the data requires us to use parallel distributed computing environments as well as machine learning algorithms. Distributed data analysis systems have not been adopted widely for the general analysis of massive astronomical data. Gathering adequate training data is expensive in observation and learning data are generally collected from multiple data sources in astronomy; therefore, semi-supervised and ensemble machine learning methods will become important for the analysis of big astronomical data.