• Title/Summary/Keyword: Ensemble learning technique

Search Result 72, Processing Time 0.024 seconds

A New Ensemble Machine Learning Technique with Multiple Stacking (다중 스태킹을 가진 새로운 앙상블 학습 기법)

  • Lee, Su-eun;Kim, Han-joon
    • The Journal of Society for e-Business Studies
    • /
    • v.25 no.3
    • /
    • pp.1-13
    • /
    • 2020
  • Machine learning refers to a model generation technique that can solve specific problems from the generalization process for given data. In order to generate a high performance model, high quality training data and learning algorithms for generalization process should be prepared. As one way of improving the performance of model to be learned, the Ensemble technique generates multiple models rather than a single model, which includes bagging, boosting, and stacking learning techniques. This paper proposes a new Ensemble technique with multiple stacking that outperforms the conventional stacking technique. The learning structure of multiple stacking ensemble technique is similar to the structure of deep learning, in which each layer is composed of a combination of stacking models, and the number of layers get increased so as to minimize the misclassification rate of each layer. Through experiments using four types of datasets, we have showed that the proposed method outperforms the exiting ones.

The ensemble approach in comparison with the diverse feature selection techniques for estimating NPPs parameters using the different learning algorithms of the feed-forward neural network

  • Moshkbar-Bakhshayesh, Khalil
    • Nuclear Engineering and Technology
    • /
    • v.53 no.12
    • /
    • pp.3944-3951
    • /
    • 2021
  • Several reasons such as no free lunch theorem indicate that there is not a universal Feature selection (FS) technique that outperforms other ones. Moreover, some approaches such as using synthetic dataset, in presence of large number of FS techniques, are very tedious and time consuming task. In this study to tackle the issue of dependency of estimation accuracy on the selected FS technique, a methodology based on the heterogeneous ensemble is proposed. The performance of the major learning algorithms of neural network (i.e. the FFNN-BR, the FFNN-LM) in combination with the diverse FS techniques (i.e. the NCA, the F-test, the Kendall's tau, the Pearson, the Spearman, and the Relief) and different combination techniques of the heterogeneous ensemble (i.e. the Min, the Median, the Arithmetic mean, and the Geometric mean) are considered. The target parameters/transients of Bushehr nuclear power plant (BNPP) are examined as the case study. The results show that the Min combination technique gives the more accurate estimation. Therefore, if the number of FS techniques is m and the number of learning algorithms is n, by the heterogeneous ensemble, the search space for acceptable estimation of the target parameters may be reduced from n × m to n × 1. The proposed methodology gives a simple and practical approach for more reliable and more accurate estimation of the target parameters compared to the methods such as the use of synthetic dataset or trial and error methods.

Estimating Farmland Prices Using Distance Metrics and an Ensemble Technique (거리척도와 앙상블 기법을 활용한 지가 추정)

  • Lee, Chang-Ro;Park, Key-Ho
    • Journal of Cadastre & Land InformatiX
    • /
    • v.46 no.2
    • /
    • pp.43-55
    • /
    • 2016
  • This study estimated land prices using instance-based learning. A k-nearest neighbor method was utilized among various instance-based learning methods, and the 10 distance metrics including Euclidean distance were calculated in k-nearest neighbor estimation. One distance metric prediction which shows the best predictive performance would be normally chosen as final estimate out of 10 distance metric predictions. In contrast to this practice, an ensemble technique which combines multiple predictions to obtain better performance was applied in this study. We applied the gradient boosting algorithm, a sort of residual-fitting model to our data in ensemble combining. Sales price data of farm lands in Haenam-gun, Jeolla Province were used to demonstrate advantages of instance-based learning as well as an ensemble technique. The result showed that the ensemble prediction was more accurate than previous 10 distance metric predictions.

Ensemble Design of Machine Learning Technigues: Experimental Verification by Prediction of Drifter Trajectory (앙상블을 이용한 기계학습 기법의 설계: 뜰개 이동경로 예측을 통한 실험적 검증)

  • Lee, Chan-Jae;Kim, Yong-Hyuk
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.8 no.3
    • /
    • pp.57-67
    • /
    • 2018
  • The ensemble is a unified approach used for getting better performance by using multiple algorithms in machine learning. In this paper, we introduce boosting and bagging, which have been widely used in ensemble techniques, and design a method using support vector regression, radial basis function network, Gaussian process, and multilayer perceptron. In addition, our experiment was performed by adding a recurrent neural network and MOHID numerical model. The drifter data used for our experimental verification consist of 683 observations in seven regions. The performance of our ensemble technique is verified by comparison with four algorithms each. As verification, mean absolute error was adapted. The presented methods are based on ensemble models using bagging, boosting, and machine learning. The error rate was calculated by assigning the equal weight value and different weight value to each unit model in ensemble. The ensemble model using machine learning showed 61.7% improvement compared to the average of four machine learning technique.

Transfer Learning-Based Feature Fusion Model for Classification of Maneuver Weapon Systems

  • Jinyong Hwang;You-Rak Choi;Tae-Jin Park;Ji-Hoon Bae
    • Journal of Information Processing Systems
    • /
    • v.19 no.5
    • /
    • pp.673-687
    • /
    • 2023
  • Convolutional neural network-based deep learning technology is the most commonly used in image identification, but it requires large-scale data for training. Therefore, application in specific fields in which data acquisition is limited, such as in the military, may be challenging. In particular, the identification of ground weapon systems is a very important mission, and high identification accuracy is required. Accordingly, various studies have been conducted to achieve high performance using small-scale data. Among them, the ensemble method, which achieves excellent performance through the prediction average of the pre-trained models, is the most representative method; however, it requires considerable time and effort to find the optimal combination of ensemble models. In addition, there is a performance limitation in the prediction results obtained by using an ensemble method. Furthermore, it is difficult to obtain the ensemble effect using models with imbalanced classification accuracies. In this paper, we propose a transfer learning-based feature fusion technique for heterogeneous models that extracts and fuses features of pre-trained heterogeneous models and finally, fine-tunes hyperparameters of the fully connected layer to improve the classification accuracy. The experimental results of this study indicate that it is possible to overcome the limitations of the existing ensemble methods by improving the classification accuracy through feature fusion between heterogeneous models based on transfer learning.

Optimal Selection of Classifier Ensemble Using Genetic Algorithms (유전자 알고리즘을 이용한 분류자 앙상블의 최적 선택)

  • Kim, Myung-Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.4
    • /
    • pp.99-112
    • /
    • 2010
  • Ensemble learning is a method for improving the performance of classification and prediction algorithms. It is a method for finding a highly accurateclassifier on the training set by constructing and combining an ensemble of weak classifiers, each of which needs only to be moderately accurate on the training set. Ensemble learning has received considerable attention from machine learning and artificial intelligence fields because of its remarkable performance improvement and flexible integration with the traditional learning algorithms such as decision tree (DT), neural networks (NN), and SVM, etc. In those researches, all of DT ensemble studies have demonstrated impressive improvements in the generalization behavior of DT, while NN and SVM ensemble studies have not shown remarkable performance as shown in DT ensembles. Recently, several works have reported that the performance of ensemble can be degraded where multiple classifiers of an ensemble are highly correlated with, and thereby result in multicollinearity problem, which leads to performance degradation of the ensemble. They have also proposed the differentiated learning strategies to cope with performance degradation problem. Hansen and Salamon (1990) insisted that it is necessary and sufficient for the performance enhancement of an ensemble that the ensemble should contain diverse classifiers. Breiman (1996) explored that ensemble learning can increase the performance of unstable learning algorithms, but does not show remarkable performance improvement on stable learning algorithms. Unstable learning algorithms such as decision tree learners are sensitive to the change of the training data, and thus small changes in the training data can yield large changes in the generated classifiers. Therefore, ensemble with unstable learning algorithms can guarantee some diversity among the classifiers. To the contrary, stable learning algorithms such as NN and SVM generate similar classifiers in spite of small changes of the training data, and thus the correlation among the resulting classifiers is very high. This high correlation results in multicollinearity problem, which leads to performance degradation of the ensemble. Kim,s work (2009) showedthe performance comparison in bankruptcy prediction on Korea firms using tradition prediction algorithms such as NN, DT, and SVM. It reports that stable learning algorithms such as NN and SVM have higher predictability than the unstable DT. Meanwhile, with respect to their ensemble learning, DT ensemble shows the more improved performance than NN and SVM ensemble. Further analysis with variance inflation factor (VIF) analysis empirically proves that performance degradation of ensemble is due to multicollinearity problem. It also proposes that optimization of ensemble is needed to cope with such a problem. This paper proposes a hybrid system for coverage optimization of NN ensemble (CO-NN) in order to improve the performance of NN ensemble. Coverage optimization is a technique of choosing a sub-ensemble from an original ensemble to guarantee the diversity of classifiers in coverage optimization process. CO-NN uses GA which has been widely used for various optimization problems to deal with the coverage optimization problem. The GA chromosomes for the coverage optimization are encoded into binary strings, each bit of which indicates individual classifier. The fitness function is defined as maximization of error reduction and a constraint of variance inflation factor (VIF), which is one of the generally used methods to measure multicollinearity, is added to insure the diversity of classifiers by removing high correlation among the classifiers. We use Microsoft Excel and the GAs software package called Evolver. Experiments on company failure prediction have shown that CO-NN is effectively applied in the stable performance enhancement of NNensembles through the choice of classifiers by considering the correlations of the ensemble. The classifiers which have the potential multicollinearity problem are removed by the coverage optimization process of CO-NN and thereby CO-NN has shown higher performance than a single NN classifier and NN ensemble at 1% significance level, and DT ensemble at 5% significance level. However, there remain further research issues. First, decision optimization process to find optimal combination function should be considered in further research. Secondly, various learning strategies to deal with data noise should be introduced in more advanced further researches in the future.

Development of a High-Performance Concrete Compressive-Strength Prediction Model Using an Ensemble Machine-Learning Method Based on Bagging and Stacking (배깅 및 스태킹 기반 앙상블 기계학습법을 이용한 고성능 콘크리트 압축강도 예측모델 개발)

  • Yun-Ji Kwak;Chaeyeon Go;Shinyoung Kwag;Seunghyun Eem
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.36 no.1
    • /
    • pp.9-18
    • /
    • 2023
  • Predicting the compressive strength of high-performance concrete (HPC) is challenging because of the use of additional cementitious materials; thus, the development of improved predictive models is essential. The purpose of this study was to develop an HPC compressive-strength prediction model using an ensemble machine-learning method of combined bagging and stacking techniques. The result is a new ensemble technique that integrates the existing ensemble methods of bagging and stacking to solve the problems of a single machine-learning model and improve the prediction performance of the model. The nonlinear regression, support vector machine, artificial neural network, and Gaussian process regression approaches were used as single machine-learning methods and bagging and stacking techniques as ensemble machine-learning methods. As a result, the model of the proposed method showed improved accuracy results compared with single machine-learning models, an individual bagging technique model, and a stacking technique model. This was confirmed through a comparison of four representative performance indicators, verifying the effectiveness of the method.

A Study on the Work-time Estimation for Block Erections Using Stacking Ensemble Learning (Stacking Ensemble Learning을 활용한 블록 탑재 시수 예측)

  • Kwon, Hyukcheon;Ruy, Wonsun
    • Journal of the Society of Naval Architects of Korea
    • /
    • v.56 no.6
    • /
    • pp.488-496
    • /
    • 2019
  • The estimation of block erection work time at a dock is one of the important factors when establishing or managing the total shipbuilding schedule. In order to predict the work time, it is a natural approach that the existing block erection data would be used to solve the problem. Generally the work time per unit is the product of coefficient value, quantity, and product value. Previously, the work time per unit is determined statistically by unit load data. However, we estimate the work time per unit through work time coefficient value from series ships using machine learning. In machine learning, the outcome depends mainly on how the training data is organized. Therefore, in this study, we use 'Feature Engineering' to determine which one should be used as features, and to check their influence on the result. In order to get the coefficient value of each block, we try to solve this problem through the Ensemble learning methods which is actively used nowadays. Among the many techniques of Ensemble learning, the final model is constructed by Stacking Ensemble techniques, consisting of the existing Ensemble models (Decision Tree, Random Forest, Gradient Boost, Square Loss Gradient Boost, XG Boost), and the accuracy is maximized by selecting three candidates among all models. Finally, the results of this study are verified by the predicted total work time for one ship among the same series.

CNN-based Weighted Ensemble Technique for ImageNet Classification (대용량 이미지넷 인식을 위한 CNN 기반 Weighted 앙상블 기법)

  • Jung, Heechul;Choi, Min-Kook;Kim, Junkwang;Kwon, Soon;Jung, Wooyoung
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.15 no.4
    • /
    • pp.197-204
    • /
    • 2020
  • The ImageNet dataset is a large scale dataset and contains various natural scene images. In this paper, we propose a convolutional neural network (CNN)-based weighted ensemble technique for the ImageNet classification task. First, in order to fuse several models, our technique uses weights for each model, unlike the existing average-based ensemble technique. Then we propose an algorithm that automatically finds the coefficients used in later ensemble process. Our algorithm sequentially selects the model with the best performance of the validation set, and then obtains a weight that improves performance when combined with existing selected models. We applied the proposed algorithm to a total of 13 heterogeneous models, and as a result, 5 models were selected. These selected models were combined with weights, and we achieved 3.297% Top-5 error rate on the ImageNet test dataset.

Prediction of English Premier League Game Using an Ensemble Technique (앙상블 기법을 통한 잉글리시 프리미어리그 경기결과 예측)

  • Yi, Jae Hyun;Lee, Soo Won
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.5
    • /
    • pp.161-168
    • /
    • 2020
  • Predicting outcome of the sports enables teams to establish their strategy by analyzing variables that affect overall game flow and wins and losses. Many studies have been conducted on the prediction of the outcome of sports events through statistical techniques and machine learning techniques. Predictive performance is the most important in a game prediction model. However, statistical and machine learning models show different optimal performance depending on the characteristics of the data used for learning. In this paper, we propose a new ensemble model to predict English Premier League soccer games using statistical models and the machine learning models which showed good performance in predicting the results of the soccer games and this model is possible to select a model that performs best when predicting the data even if the data are different. The proposed ensemble model predicts game results by learning the final prediction model with the game prediction results of each single model and the actual game results. Experimental results for the proposed model show higher performance than the single models.