• Title/Summary/Keyword: Ensemble Machine Learning Models

SUNSPOT AREA PREDICTION BASED ON COMPLEMENTARY ENSEMBLE EMPIRICAL MODE DECOMPOSITION AND EXTREME LEARNING MACHINE

  • Peng, Lingling
    • Journal of The Korean Astronomical Society
    • /
    • v.53 no.6
    • /
    • pp.139-147
    • /
    • 2020
  • The sunspot area is a critical physical quantity for assessing the level of solar activity, and forecasts of the sunspot area are of great importance for studies of solar activity and space weather. We developed an innovative hybrid prediction method by integrating complementary ensemble empirical mode decomposition (CEEMD) and the extreme learning machine (ELM). The time series is first decomposed by CEEMD into intrinsic mode functions (IMFs) with different frequencies; these IMFs can be divided into three groups: a high-frequency group, a low-frequency group, and a trend group. ELM forecasting models are established to forecast the three groups separately, and the final forecast is obtained by summing the forecast values of each group. The proposed hybrid model is applied to the smoothed monthly mean sunspot area archived at NASA's Marshall Space Flight Center (MSFC). We find a mean absolute percentage error (MAPE) of 1.80% and a root mean square error (RMSE) of 9.75, which indicates that (1) the predicted sunspot area from the CEEMD-ELM model is in good agreement with the observed one, and (2) the proposed model outperforms previous approaches in terms of prediction accuracy and operational efficiency.
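The recombination step and the two reported error metrics can be sketched in a few lines of Python (a minimal illustration, not the authors' implementation; the CEEMD decomposition and ELM forecasting themselves are assumed to be handled by dedicated libraries):

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def combine_group_forecasts(group_forecasts):
    """Final forecast = element-wise sum of the per-group (high-frequency,
    low-frequency, trend) forecasts, as in the CEEMD-ELM recombination step."""
    return [sum(step) for step in zip(*group_forecasts)]
```

For example, `combine_group_forecasts([[1, 2], [3, 4], [5, 6]])` sums the three group forecasts at each time step and returns `[9, 12]`.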

Development of a Type 2 Diabetes Prediction Algorithm Based on Big Data (빅데이터 기반 2형 당뇨 예측 알고리즘 개발)

  • Hyun Sim;HyunWook Kim
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.18 no.5
    • /
    • pp.999-1008
    • /
    • 2023
  • Early prediction of chronic diseases such as diabetes is an important issue, and improving the accuracy of diabetes prediction is especially important. Various machine learning and deep learning-based methodologies have been introduced for diabetes prediction, but these technologies require large amounts of data to outperform other methodologies, and the learning cost is high due to complex data models. In this study, we aim to verify the claim that a DNN trained on the Pima dataset with k-fold cross-validation reduces the efficiency of diabetes diagnosis models. Machine learning classification methods such as decision trees, SVM, random forests, logistic regression, and KNN, as well as various ensemble techniques, were used to determine which algorithm produces the best prediction results. After training and testing all classification models, the best results were obtained with the XGBoost classifier combined with the ADASYN oversampling method: an accuracy of 81%, an F1 score of 0.81, and an AUC of 0.84. Additionally, a domain adaptation method was implemented to demonstrate the versatility of the proposed system, and an explainable AI approach using the LIME and SHAP frameworks was implemented to show how the model arrives at its final prediction.
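The evaluation loop described above rests on two standard building blocks, k-fold splitting and the F1 score, which can be sketched as follows (a toy illustration; the XGBoost and ADASYN components of the study are not reproduced here):

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds

def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In cross-validation, each fold in turn serves as the test set while the remaining folds are used for training, and the scores are averaged across folds.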

Enhancing Autonomous Vehicle RADAR Performance Prediction Model Using Stacking Ensemble (머신러닝 스태킹 앙상블을 이용한 자율주행 자동차 RADAR 성능 향상)

  • Si-yeon Jang;Hye-lim Choi;Yun-ju Oh
    • Journal of Internet Computing and Services
    • /
    • v.25 no.2
    • /
    • pp.21-28
    • /
    • 2024
  • Radar is an essential sensor component in autonomous vehicles, and the market for radar applications in this context is steadily expanding with a growing variety of products. In this study, we aimed to enhance the stability and performance of radar systems by developing and evaluating a radar performance prediction model that can predict radar defects. We selected seven machine learning and deep learning algorithms and trained the model with a total of 49 input data types. Ultimately, when we employed an ensemble of 17 models, it exhibited the highest performance. We anticipate that these research findings will assist in predicting product defects at the production stage, thereby maximizing production yield and minimizing the costs associated with defective products.
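The two-level structure of a stacking ensemble, base learners whose predictions feed a meta-learner, can be sketched generically (a schematic illustration; the seven algorithms and 49 input features of the study are not reproduced):

```python
def stacking_predict(base_models, meta_model, x):
    """Level 0: every base model predicts on the same input.
    Level 1: the meta-model combines those predictions into the final output."""
    level0_predictions = [model(x) for model in base_models]
    return meta_model(level0_predictions)

# Toy usage: three stand-in base "models" and a simple averaging meta-model.
base = [lambda x: x + 1, lambda x: x - 1, lambda x: 2 * x]
meta = lambda preds: sum(preds) / len(preds)
```

In a real stacking setup the meta-model is itself trained, typically on out-of-fold predictions of the base models, rather than being a fixed average as in this toy.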

Long-term runoff simulation using rainfall LSTM-MLP artificial neural network ensemble (LSTM - MLP 인공신경망 앙상블을 이용한 장기 강우유출모의)

  • An, Sungwook;Kang, Dongho;Sung, Janghyun;Kim, Byungsik
    • Journal of Korea Water Resources Association
    • /
    • v.57 no.2
    • /
    • pp.127-137
    • /
    • 2024
  • Physical models, often used in water resources management, require extensive input data to build and operate and may reflect the subjective judgment of the user. In recent years, research using data-driven models such as machine learning has been actively conducted to compensate for these problems in the field of water resources. In this study, an artificial neural network was used to simulate long-term rainfall runoff in the Osipcheon watershed in Samcheok-si, Gangwon-do. For this purpose, three input data groups (meteorological observations; daily precipitation and potential evapotranspiration; and daily precipitation minus potential evapotranspiration) were constructed from meteorological data, and the results of training the LSTM (Long Short-Term Memory) artificial neural network model were compared and analyzed. The performance of LSTM-Model 1, which used only meteorological observations, was the highest, and six LSTM-MLP ensemble models combining MLP artificial neural networks were built to simulate long-term runoff in the Osipcheon watershed. A comparison between the LSTM and LSTM-MLP models showed generally similar results, but the MAE, MSE, and RMSE of the LSTM-MLP model were reduced relative to the LSTM, especially in the low-flow range. Since the LSTM-MLP results show an improvement in low flows, it is expected that in the future, various ensemble models such as those incorporating CNNs can be used, alongside the LSTM-MLP model, to substitute for physical models and construct flow duration curves in large basins, where physical models take a long time to run, and in ungauged basins, which lack input data.
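The low-flow comparison reported above amounts to computing the error metrics on the subset of time steps below a discharge threshold, for example (a minimal sketch; the threshold and data are hypothetical):

```python
def mae(observed, simulated):
    """Mean absolute error over all time steps."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)

def low_flow_mae(observed, simulated, threshold):
    """MAE restricted to time steps whose observed discharge is at or below a threshold."""
    pairs = [(o, s) for o, s in zip(observed, simulated) if o <= threshold]
    return sum(abs(o - s) for o, s in pairs) / len(pairs)
```

Comparing the full-series metric with the thresholded one makes visible whether a model's improvement is concentrated in the low-flow range, as reported for the LSTM-MLP ensemble.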

A Comparative Analysis of the Pre-Processing in the Kaggle Titanic Competition

  • Tai-Sung, Hur;Suyoung, Bang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.3
    • /
    • pp.17-24
    • /
    • 2023
  • Based on 'Titanic - Machine Learning from Disaster', a representative Kaggle competition that poses data science challenges, we examine how data preprocessing and model construction affect prediction accuracy and score. We compare and analyze the features of seven top-ranked, high-scoring solutions, excluding those that use redundant models or ensemble techniques. We confirmed that most of the preprocessing pipelines have unique, differentiated characteristics, and that even where the preprocessing was almost the same, scores differed depending on the type of model. The comparative analysis in this paper is expected to help Kaggle competition participants and data science beginners understand the characteristics and analysis flow of the top scorers' preprocessing methods.
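Typical preprocessing steps that Titanic solutions share, such as imputing missing Age values and encoding the Sex column, look roughly like this (an illustrative sketch with hypothetical rows, not any particular competitor's pipeline):

```python
def median(values):
    """Median of a non-empty list."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def preprocess(rows):
    """Impute missing 'Age' with the column median and encode 'Sex' as 0/1."""
    ages = [r["Age"] for r in rows if r["Age"] is not None]
    fill = median(ages)
    return [{
        "Age": r["Age"] if r["Age"] is not None else fill,
        "Sex": 1 if r["Sex"] == "female" else 0,
    } for r in rows]
```

Differences between solutions typically lie in choices like these: which statistic to impute with, whether to bin continuous features, and which engineered features (e.g. family size, title extracted from the name) to add.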

Assessment of compressive strength of high-performance concrete using soft computing approaches

  • Chukwuemeka Daniel;Jitendra Khatti;Kamaldeep Singh Grover
    • Computers and Concrete
    • /
    • v.33 no.1
    • /
    • pp.55-75
    • /
    • 2024
  • The present study introduces an optimum-performance soft computing model for predicting the compressive strength of high-performance concrete (HPC) by comparing, for the first time on a common database, models based on conventional (kernel-based, covariance-function-based, and tree-based), advanced machine (least squares support vector machine - LSSVM and minimax probability machine regressor - MPMR), and deep (artificial neural network - ANN) learning approaches. A compressive strength database containing the results of 1030 concrete samples was compiled from the literature and preprocessed. For training, testing, and validation of the soft computing models, 803, 101, and 101 data points were selected arbitrarily from the 1005 preprocessed data points. Thirteen performance metrics, including three new metrics (a20-index, index of agreement, and index of scatter), were implemented for each model. The performance comparison reveals that the SVM (kernel-based), ET (tree-based), MPMR (advanced), and ANN (deep) models achieved higher performance in predicting the compressive strength of HPC. From the overall analysis of performance (accuracy, Taylor plot, accuracy metric, regression error characteristic curve, Anderson-Darling and Wilcoxon tests, uncertainty, and reliability), model CS4, based on the ensemble tree, was recognized as the optimum-performance model, with a correlation coefficient of 0.9352, a root mean square error of 5.76 MPa, and a mean absolute error of 4.1069 MPa. The present study also reveals that multicollinearity affects the prediction accuracy of the Gaussian process regression, decision tree, multilinear regression, and adaptive boosting regressor models, a novel observation in compressive strength prediction of HPC. The cosine sensitivity analysis reveals that the predicted compressive strength of HPC is most strongly affected by cement content, fine aggregate, coarse aggregate, and water content.
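Of the thirteen metrics, the a20-index is the least standard; it is commonly defined as the fraction of samples whose predicted-to-observed ratio falls within ±20% (a sketch based on that common definition, which may differ in detail from the study's):

```python
def a20_index(observed, predicted):
    """Fraction of samples whose predicted/observed ratio lies in [0.8, 1.2].
    A value of 1.0 means every prediction is within 20% of the measurement."""
    within = sum(1 for o, p in zip(observed, predicted) if 0.8 <= p / o <= 1.2)
    return within / len(observed)
```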

Outlier detection of main engine data of a ship using ensemble method (앙상블 기법을 이용한 선박 메인엔진 빅데이터의 이상치 탐지)

  • KIM, Dong-Hyun;LEE, Ji-Hwan;LEE, Sang-Bong;JUNG, Bong-Kyu
    • Journal of the Korean Society of Fisheries and Ocean Technology
    • /
    • v.56 no.4
    • /
    • pp.384-394
    • /
    • 2020
  • This paper proposes a machine learning-based outlier detection model that can diagnose the condition of major engine parts through unsupervised learning analysis of a ship's main engine big data. Engine big data from the ship was collected for more than seven months, and expert knowledge and correlation analysis were applied to select features closely related to the operation of the main engine. For the unsupervised learning analysis, an ensemble model, in which several predictive models are strategically combined to increase performance, is used for anomaly detection. As a result, the proposed model successfully distinguished anomalous engine states from normal ones. To validate our approach, clustering analysis was conducted to identify the distinct patterns among the detected anomalous points; by examining the distribution of each cluster, we could successfully characterize these patterns of anomalies.
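One common way to combine several detectors into an ensemble anomaly score is to normalize each detector's scores and average them (a generic sketch of the idea, not the paper's specific model):

```python
import math

def zscores(values):
    """Standardize a list of scores to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def ensemble_outliers(score_lists, threshold=2.0):
    """Average the per-detector z-scores and flag indices above the threshold."""
    normalized = [zscores(scores) for scores in score_lists]
    combined = [sum(col) / len(col) for col in zip(*normalized)]
    return [i for i, s in enumerate(combined) if s > threshold]
```

Averaging normalized scores makes the detectors comparable even when their raw scales differ, so a point is flagged only when several detectors agree it is unusual.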

Analyzing Machine Learning Techniques for Fault Prediction Using Web Applications

  • Malhotra, Ruchika;Sharma, Anjali
    • Journal of Information Processing Systems
    • /
    • v.14 no.3
    • /
    • pp.751-770
    • /
    • 2018
  • Web applications are indispensable in the software industry and continuously evolve, either to meet new criteria and/or to include new functionalities. However, despite quality assurance via testing, the presence of defects hinders straightforward development. Several factors contribute to defects, and they are often minimized at high expense in terms of man-hours. Thus, detecting fault proneness in the early phases of software development is important, and a fault prediction model for identifying fault-prone classes in a web application is highly desirable. In this work, we compare 14 machine learning techniques to analyze the relationship between object-oriented metrics and fault prediction in web applications. The study is carried out using various releases of the Apache Click and Apache Rave datasets. En route to the predictive analysis, the input basis set for each release is first optimized using the filter-based correlation feature selection (CFS) method. The LCOM3, WMC, NPM, and DAM metrics are found to be the most significant predictors. The statistical analysis of these metrics also conforms well with the CFS evaluation and affirms the role of these metrics in the defect prediction of web applications. The overall predictive ability of the different fault prediction models is first ranked using the Friedman technique and then statistically compared using Nemenyi post-hoc analysis. The results not only uphold the predictive capability of machine learning models for faulty classes in web applications, but also show that ensemble algorithms are the most appropriate for defect prediction in the Apache datasets. Further, we derive a consensus between the metrics selected by the CFS technique and the statistical analysis of the datasets.
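The Friedman ranking step works by ranking each model on every dataset (rank 1 = best) and averaging the ranks per model; a simplified sketch that ignores ties:

```python
def average_ranks(performance):
    """performance: one row per dataset, one column per model; higher is better.
    Returns each model's rank averaged across datasets (1 = best)."""
    n_models = len(performance[0])
    totals = [0.0] * n_models
    for row in performance:
        # Sort model indices by descending score, then assign ranks 1, 2, ...
        order = sorted(range(n_models), key=lambda j: -row[j])
        for rank, j in enumerate(order, start=1):
            totals[j] += rank
    return [t / len(performance) for t in totals]
```

The Friedman test then checks whether these average ranks differ more than chance would allow, and the Nemenyi post-hoc analysis determines which specific model pairs differ significantly.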

Boosting neural networks with an application to bankruptcy prediction (부스팅 인공신경망을 활용한 부실예측모형의 성과개선)

  • Kim, Myoung-Jong;Kang, Dae-Ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2009.05a
    • /
    • pp.872-875
    • /
    • 2009
  • In a bankruptcy prediction model, accuracy is one of the crucial performance measures due to its significant economic impact. Ensembling is one of the widely used methods for improving the performance of classification and prediction models. Two popular ensemble methods, bagging and boosting, have been applied with great success to various machine learning problems, mostly using decision trees as base classifiers. In this paper, we analyze the performance of boosted neural networks for improving on traditional neural networks in bankruptcy prediction tasks. Experimental results on Korean firms indicate that the boosted neural networks outperform traditional neural networks.

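The core of boosting is the round-by-round reweighting of training samples; AdaBoost's update rule, for example, can be sketched as follows (a generic illustration of the boosting mechanism, not the paper's exact neural-network variant):

```python
import math

def adaboost_round(weights, correct):
    """One AdaBoost reweighting round.
    weights: current normalized sample weights; correct: per-sample booleans
    saying whether the current weak learner classified that sample correctly.
    Returns the learner's vote weight alpha and the renormalized sample weights:
    misclassified samples gain weight, correctly classified ones lose it."""
    error = sum(w for w, c in zip(weights, correct) if not c)
    alpha = 0.5 * math.log((1 - error) / error)
    updated = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]
```

When the base classifier is a neural network rather than a decision tree, each round trains a new network on the reweighted (or resampled) data, and the final prediction is the alpha-weighted vote of all networks.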

Transfer Learning based DNN-SVM Hybrid Model for Breast Cancer Classification

  • Gui Rae Jo;Beomsu Baek;Young Soon Kim;Dong Hoon Lim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.11
    • /
    • pp.1-11
    • /
    • 2023
  • Breast cancer is the disease that affects women the most worldwide. With the development of computer technology, the efficiency of machine learning has increased, and it therefore plays an important role in cancer detection and diagnosis. Deep learning is a field of machine learning based on artificial neural networks; its performance has improved rapidly in recent years, and its range of applications is expanding. In this paper, we propose a DNN-SVM hybrid model for breast cancer classification that combines a deep neural network (DNN) structure based on transfer learning with a support vector machine (SVM). The proposed transfer learning-based model is effective on small training sets, learns quickly, and can improve performance by combining the advantages of the individual models, i.e., the DNN and the SVM. Performance tests on the WOBC and WDBC breast cancer datasets from the UCI machine learning repository show that the proposed model is superior, on various performance measures, to single models such as logistic regression, DNN, and SVM, and to ensemble models such as random forest.
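Structurally, such a hybrid is a pipeline: the (transfer-learned) DNN acts as a feature extractor and the SVM classifies the extracted features. A schematic sketch with toy stand-in functions (the real DNN and SVM components are assumed to come from deep learning / ML libraries):

```python
def hybrid_predict(feature_extractor, classifier, x):
    """DNN-SVM style pipeline: map raw input to learned features, then classify."""
    features = feature_extractor(x)
    return classifier(features)

# Toy stand-ins: a hand-written "feature extractor" and a linear "classifier".
extract = lambda x: [x[0] + x[1], x[0] - x[1]]
classify = lambda f: 1 if f[0] - f[1] > 0 else 0
```

In the transfer-learning setting, the feature extractor is a network pretrained on a larger dataset, which is what makes the approach effective when the target training set is small.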