• Title/Summary/Keyword: Ensemble model

An Ensemble Cascading Extremely Randomized Trees Framework for Short-Term Traffic Flow Prediction

  • Zhang, Fan;Bai, Jing;Li, Xiaoyu;Pei, Changxing;Havyarimana, Vincent
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.4 / pp.1975-1988 / 2019
  • Short-term traffic flow prediction plays an important role in intelligent transportation systems (ITS) in areas such as transportation management, traffic control and guidance. For short-term traffic flow regression, the main challenge stems from the non-stationary nature of traffic flow data. In this paper, we design an ensemble cascading prediction framework, called EET, that applies a boosting technique to extremely randomized trees (extra-trees) to predict short-term traffic flow in non-stationary environments. Extra-trees is a tree-based ensemble method that strongly randomizes both the attribute and the cut-point choices when splitting a tree node. This mechanism reduces the variance of the model and is therefore better suited to traffic flow regression in non-stationary environments. Moreover, the extra-trees algorithm uses averaging within the boosting ensemble to improve predictive accuracy and control overfitting. To the best of our knowledge, this is the first time that extra-trees have been used as fundamental building blocks in boosting committee machines. The proposed approach predicts traffic flow 5 min in advance from real-time traffic flow data while inherently taking temporal and spatial correlations into account. Experiments demonstrate that the proposed method achieves higher accuracy, lower variance, and lower computational complexity than existing methods.
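
The node-splitting idea described in this abstract (random attributes, random cut-points) can be sketched as follows. This is a minimal illustration of a single extra-trees split for regression, not the EET framework itself; the function name `random_split`, the variance-based score, and the toy data are assumptions made for the example.

```python
import random
from statistics import pvariance

def random_split(rows, targets, n_candidates, rng):
    """Choose one node split from randomly drawn attributes and cut-points."""
    n_features = len(rows[0])
    best = None
    # Draw the candidate attributes at random, without replacement.
    for feature in rng.sample(range(n_features), n_candidates):
        values = [row[feature] for row in rows]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # constant attribute, nothing to split on
        # The cut-point is drawn uniformly instead of being optimized.
        cut = rng.uniform(lo, hi)
        left = [t for v, t in zip(values, targets) if v < cut]
        right = [t for v, t in zip(values, targets) if v >= cut]
        if not left or not right:
            continue
        # Score the split by the weighted variance of the children (lower is better).
        score = (len(left) * pvariance(left) + len(right) * pvariance(right)) / len(targets)
        if best is None or score < best[0]:
            best = (score, feature, cut)
    return best  # (score, feature index, cut-point), or None if no valid split

# Toy data (hypothetical): two attributes, four samples.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
targets = [1.0, 1.0, 5.0, 5.0]
split = random_split(rows, targets, n_candidates=2, rng=random.Random(0))
```

Only the choice among the few random candidates is data-driven; the randomness of the cut-points is what lowers the variance of the averaged trees.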

Comparison of AT1- and Kalman Filter-Based Ensemble Time Scale Algorithms

  • Lee, Ho Seong;Kwon, Taeg Yong;Lee, Young Kyu;Yang, Sung-hoon;Yu, Dai-Hyuk;Park, Sang Eon;Heo, Myoung-Sun
    • Journal of Positioning, Navigation, and Timing / v.10 no.3 / pp.197-206 / 2021
  • We compared two typical ensemble time scale algorithms: AT1 and the Kalman filter. Four commercial atomic clocks, two hydrogen masers and two cesium atomic clocks, provided measurement data to the algorithms. The allocation of relative weights to the clocks is important for generating a stable ensemble time. A 30-day average-weight model, obtained from the average Allan variance of each clock, was applied to the AT1 algorithm. For the reduced Kalman filter (Kred) algorithm, we gave the same weights to the two hydrogen masers. We also compared the frequency stabilities of the algorithms' outputs with and without correction of the estimated frequency offsets and/or frequency drift offsets by the KRISS-made primary frequency standard, KRISS-F1. We found that the Kred algorithm is more effective at generating a stable ensemble time scale in the long term, and that it also yields much better short-term stability when the frequency offset, instead of the phase offset, is used to calculate the Allan deviation.
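
The role of relative clock weights can be illustrated with a small sketch. It assumes weights inversely proportional to each clock's Allan variance, which the abstract does not state explicitly, and it shows only the weighting step, not the AT1 or Kred algorithms. The variance values are hypothetical.

```python
def inverse_variance_weights(allan_variances):
    """Relative weights proportional to 1 / Allan variance, normalized to sum to 1."""
    inverse = [1.0 / v for v in allan_variances]
    total = sum(inverse)
    return [x / total for x in inverse]

def ensemble_offset(clock_offsets, weights):
    """Weighted mean of the individual clock offsets."""
    return sum(w * x for w, x in zip(weights, clock_offsets))

# Hypothetical Allan variances: two hydrogen masers, then two cesium clocks.
variances = [1e-30, 1e-30, 4e-28, 4e-28]
weights = inverse_variance_weights(variances)
```

Under this assumption the more stable clocks (smaller variance) dominate the ensemble, and two clocks with equal variance receive equal weight.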

A Study on the Insider Behavior Analysis Framework for Detecting Information Leakage Using Network Traffic Collection and Restoration (네트워크 트래픽 수집 및 복원을 통한 내부자 행위 분석 프레임워크 연구)

  • Kauh, Janghyuk;Lee, Dongho
    • Journal of Korea Society of Digital Industry and Information Management / v.13 no.4 / pp.125-139 / 2017
  • In this paper, we developed a framework to detect and predict insider information leakage by collecting and restoring network traffic. For automated behavior analysis, the meta information and behavior information obtained through network traffic collection are used as machine learning features. Using these features, we built and trained a behavior model, a network model and protocol-specific models. In addition, an ensemble model was developed by digitizing and summing the results of the various models. We developed a function that presents information leakage candidates and lets users view meta information and behavior information from various perspectives through visual analysis. This supports both rule-based and machine-learning-based threat detection. In the future, we plan to build an ensemble model that applies a regression model to the results of the models, and to develop a model based on deep learning.

Stochastic Simulation Model for non-stationary time series using Wavelet AutoRegressive Model

  • Moon, Young-Il;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference / 2007.05a / pp.1437-1440 / 2007
  • Many hydroclimatic time series are marked by interannual and longer quasi-periodic features that are associated with narrow-band oscillatory climate modes. A time series modeling approach that directly considers such structures is developed and presented. The essence of the approach is to first develop a wavelet decomposition of the time series that retains only the statistically significant wavelet components, and then to model each such component and the residual time series as univariate autoregressive processes. The efficacy of this approach is demonstrated through the simulation of observed and paleo reconstructions of climate indices related to ENSO and AMO, and of tree ring and rainfall time series. Long ensemble simulations that preserve the spectral attributes of the time series in each ensemble member can be generated. The usual low-order statistics are preserved by the proposed model, and its long-memory performance is superior to that of a direct application of an autoregressive model.

Impact of Ensemble Member Size on Confidence-based Selection in Bankruptcy Prediction (부도예측을 위한 확신 기반의 선택 접근법에서 앙상블 멤버 사이즈의 영향에 관한 연구)

  • Kim, Na-Ra;Shin, Kyung-Shik;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.55-71 / 2013
  • The prediction model is the main factor affecting the performance of a knowledge-based system for bankruptcy prediction. Earlier studies on prediction modeling have focused on the building of a single best model using statistical and artificial intelligence techniques. However, since the mid-1980s, integration of multiple techniques (hybrid techniques) and, by extension, combinations of the outputs of several models (ensemble techniques) have, according to the experimental results, generally outperformed individual models. An ensemble is a technique that constructs a set of multiple models, combines their outputs, and produces one final prediction. The way in which the outputs of ensemble members are combined is one of the important issues affecting prediction accuracy. A variety of combination schemes have been proposed in order to improve prediction performance in ensembles. Each combination scheme has advantages and limitations, and can be influenced by domain and circumstance. Accordingly, decisions on the most appropriate combination scheme in a given domain and contingency are very difficult. This paper proposes a confidence-based selection approach as part of an ensemble bankruptcy-prediction scheme that can measure unified confidence, even if ensemble members produce different types of continuous-valued outputs. The present experimental results show that when varying the number of models to combine, according to the creation type of ensemble members, the proposed combination method offers the best performance in the ensemble having the largest number of models, even when compared with the methods most often employed in bankruptcy prediction.

Estimating Farmland Prices Using Distance Metrics and an Ensemble Technique (거리척도와 앙상블 기법을 활용한 지가 추정)

  • Lee, Chang-Ro;Park, Key-Ho
    • Journal of Cadastre & Land InformatiX / v.46 no.2 / pp.43-55 / 2016
  • This study estimated land prices using instance-based learning. Among the various instance-based learning methods, a k-nearest neighbor method was used, and 10 distance metrics, including Euclidean distance, were calculated in the k-nearest neighbor estimation. Normally, the single distance-metric prediction with the best predictive performance would be chosen as the final estimate out of the 10 distance-metric predictions. In contrast to this practice, this study applied an ensemble technique, which combines multiple predictions to obtain better performance. To combine the predictions, we applied the gradient boosting algorithm, a kind of residual-fitting model, to our data. Sales price data of farmland in Haenam-gun, Jeolla Province were used to demonstrate the advantages of instance-based learning as well as of an ensemble technique. The results showed that the ensemble prediction was more accurate than the 10 individual distance-metric predictions.
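
How one k-nearest neighbor estimator yields a different prediction per distance metric can be sketched as below. The study combines the per-metric predictions with gradient boosting; this sketch swaps in a plain average to stay short, and the three metrics, the function names, and the toy prices are assumptions for illustration.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_x, train_y, query, k, metric):
    """Mean target of the k training points closest to the query under the metric."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: metric(pair[0], query))
    nearest = [y for _, y in ranked[:k]]
    return sum(nearest) / len(nearest)

def combined_predict(train_x, train_y, query, k, metrics):
    """Plain average of the per-metric predictions (a stand-in for gradient boosting)."""
    preds = [knn_predict(train_x, train_y, query, k, m) for m in metrics]
    return sum(preds) / len(preds)

# Toy data (hypothetical): two parcels with coordinates and prices.
train_x = [[3.0, 3.0], [0.0, 4.0]]
train_y = [100.0, 200.0]
query = [0.0, 0.0]
metrics = [euclidean, manhattan, chebyshev]
estimate = combined_predict(train_x, train_y, query, k=1, metrics=metrics)
```

With this data the Chebyshev metric picks a different nearest neighbor than the other two, which is the kind of disagreement an ensemble combiner can exploit.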

A Feature Selection-based Ensemble Method for Arrhythmia Classification

  • Namsrai, Erdenetuya;Munkhdalai, Tsendsuren;Li, Meijing;Shin, Jung-Hoon;Namsrai, Oyun-Erdene;Ryu, Keun Ho
    • Journal of Information Processing Systems / v.9 no.1 / pp.31-40 / 2013
  • In this paper, a novel method is proposed to build an ensemble of classifiers by using a feature selection schema. The feature selection schema identifies the best feature sets that affect the arrhythmia classification. First, a number of feature subsets are extracted by applying the feature selection schema to the original dataset. Then classification models are built using each feature subset. Finally, we combine the classification models by adopting a voting approach to form a classification ensemble. The voting approach in our method uses both the classification error rate and the feature selection rate to calculate the score of each classifier in the ensemble. In our method, the feature selection rate depends on the order in which the feature subsets are extracted. In the experiment, we applied our method to an arrhythmia dataset and generated the top three disjoint feature sets. We then built three classifiers based on these top three feature subsets and formed the classifier ensemble using the voting approach. Our method can improve classification accuracy on high-dimensional datasets. The performance of each classifier and the performance of their ensemble were higher than the performance of the classifier based on the whole feature space of the dataset. The classification performance was improved, and a more stable classification model could be constructed with the proposed approach.
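
The score-weighted voting step can be sketched as follows. The abstract says each classifier's score depends on its classification error rate and feature selection rate but does not give the formula, so the rule in `classifier_score` is purely hypothetical, as are the rates and labels in the example.

```python
from collections import defaultdict

def classifier_score(error_rate, selection_rate):
    # Hypothetical scoring rule; the abstract does not specify the formula.
    return (1.0 - error_rate) * selection_rate

def weighted_vote(predictions, scores):
    """Return the label with the largest total score across the classifiers."""
    totals = defaultdict(float)
    for label, score in zip(predictions, scores):
        totals[label] += score
    return max(totals, key=totals.get)

# Three classifiers built on the top three feature subsets (hypothetical rates).
error_rates = [0.10, 0.30, 0.35]
selection_rates = [1.0, 0.9, 0.8]  # assumed to decrease with extraction order
scores = [classifier_score(e, s) for e, s in zip(error_rates, selection_rates)]
label = weighted_vote(["arrhythmia", "normal", "normal"], scores)
```

Here the two weaker classifiers together outweigh the strongest one, so the ensemble follows their shared vote.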

A Comparison of Ensemble Methods Combining Resampling Techniques for Class Imbalanced Data (데이터 전처리와 앙상블 기법을 통한 불균형 데이터의 분류모형 비교 연구)

  • Lee, Hee-Jae;Lee, Sungim
    • The Korean Journal of Applied Statistics / v.27 no.3 / pp.357-371 / 2014
  • There are many studies related to imbalanced data, in which the class distribution is highly skewed. To address the problem of imbalanced data, previous studies have used resampling techniques that correct the skewness of the class distribution in each sampled subset through under-sampling, over-sampling or hybrid sampling such as SMOTE. Ensemble methods have also alleviated the problem of class-imbalanced data. In this paper, we compare around a dozen algorithms that combine ensemble methods and resampling techniques, based on simulated data sets generated by the Backbone model, which can handle the imbalance rate. Results on various real imbalanced data sets are also presented to compare the effectiveness of the algorithms. As a result, we strongly recommend combining resampling techniques with ensemble methods for imbalanced data in which the proportion of the minority class is less than 10%. We also find that each ensemble method has a well-matched sampling technique. The algorithms that combine bagging or random forest ensembles with random undersampling tend to perform well, whereas the boosting ensemble appears to perform better with over-sampling. All ensemble methods combined with SMOTE perform well in most situations.
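
Random undersampling, one of the resampling techniques compared in this abstract, can be sketched in a few lines. This shows only the resampling step that would be applied to each subset before training an ensemble member; the function name and the toy class sizes are assumptions for the example.

```python
import random

def random_undersample(samples, labels, rng):
    """Randomly drop samples until every class is as small as the smallest class."""
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # Keep a random subset of each class, without replacement.
        for sample in rng.sample(group, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced

# Toy data (hypothetical): 5% minority class.
samples = list(range(100))
labels = [1] * 5 + [0] * 95
balanced = random_undersample(samples, labels, random.Random(42))
```

In a bagging-style ensemble, each member would receive its own undersampled subset, so the majority-class samples discarded for one member can still be seen by another.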

A Study on the Work-time Estimation for Block Erections Using Stacking Ensemble Learning (Stacking Ensemble Learning을 활용한 블록 탑재 시수 예측)

  • Kwon, Hyukcheon;Ruy, Wonsun
    • Journal of the Society of Naval Architects of Korea / v.56 no.6 / pp.488-496 / 2019
  • Estimating block erection work time at a dock is an important factor in establishing and managing the overall shipbuilding schedule. A natural approach to predicting the work time is to use existing block erection data. Generally, the work time per unit is the product of a coefficient value, a quantity, and a product value. Previously, the work time per unit was determined statistically from unit load data. In this study, however, we estimate the work time per unit from the work time coefficient values of series ships using machine learning. In machine learning, the outcome depends mainly on how the training data is organized. Therefore, we use feature engineering to determine which variables should be used as features and to check their influence on the result. To obtain the coefficient value of each block, we apply ensemble learning methods, which are widely used nowadays. Among the many ensemble learning techniques, the final model is constructed with a stacking ensemble consisting of existing models (Decision Tree, Random Forest, Gradient Boost, Square Loss Gradient Boost, XG Boost), and the accuracy is maximized by selecting three candidates among all models. Finally, the results of this study are verified using the predicted total work time for one ship of the same series.

Remaining Useful Life Estimation based on Noise Injection and a Kalman Filter Ensemble of modified Bagging Predictors

  • Hung-Cuong Trinh;Van-Huy Pham;Anh H. Vo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.12 / pp.3242-3265 / 2023
  • Ensuring the reliability of a machinery system involves predicting its remaining useful life (RUL). Most RUL prediction approaches treat noise as something to be removed. Nevertheless, noise can be properly utilized to enhance prediction capability. In this paper, we propose a novel RUL prediction approach based on noise injection and a Kalman filter ensemble of modified bagging predictors. First, we propose a new method, named GN-DAFC, that inserts Gaussian noise into both the observation and feature spaces of an original training dataset. Second, we develop a modified bagging method based on Kalman filter averaging, named KBAG. Then, we develop a new ensemble method, named DKBAG, which is a Kalman filter ensemble of KBAGs. Finally, we propose a novel RUL prediction approach, GN-DAFC-DKBAG, in which the optimal noise-injected training dataset is determined by a GN-DAFC-based search strategy and then fed into a DKBAG model. Our approach is validated on the NASA C-MAPSS dataset of aero-engines. Experimental results show that our approach achieves significantly better performance than a traditional Kalman filter ensemble of single learning models (KESLM) and the original DKBAG approach. We also find that the optimal noise-injected data can improve the prediction performance of both KESLM and DKBAG. We further compare our approach with two advanced ensemble approaches, and the results indicate that it outperforms both. Thus, our approach of combining optimal noise injection and DKBAG provides an effective solution for RUL estimation of machinery systems.