• Title/Summary/Keyword: Model Ensemble

A Study on the Insider Behavior Analysis Framework for Detecting Information Leakage Using Network Traffic Collection and Restoration (네트워크 트래픽 수집 및 복원을 통한 내부자 행위 분석 프레임워크 연구)

  • Kauh, Janghyuk;Lee, Dongho
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.13 no.4
    • /
    • pp.125-139
    • /
    • 2017
  • In this paper, we developed a framework to detect and predict insider information leakage by collecting and restoring network traffic. For automated behavior analysis, the meta information and behavior information obtained through network traffic collection are used as machine learning features. Using these features, we built and trained a behavior model, a network model, and protocol-specific models. In addition, an ensemble model was developed by digitizing and summing the results of the various models. We also developed a visual analysis function that presents information leakage candidates and shows meta information and behavior information from various perspectives. The framework supports both rule-based and machine learning-based threat detection. In the future, we plan to build an ensemble model that applies a regression model to the results of the individual models, and to develop a model based on deep learning.
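
The "digitizing and summing" step in the abstract above can be sketched roughly as follows. The abstract does not give the thresholds, the model names, or the score scale, so the names `behavior`, `network`, and `protocol_http` and all numeric values here are hypothetical.

```python
def digitize(score, threshold):
    """Map a raw model score to 1 (suspicious) or 0 (normal)."""
    return 1 if score >= threshold else 0


def ensemble_score(model_scores, thresholds):
    """Sum the digitized outputs of all models for one observation."""
    return sum(digitize(model_scores[name], thresholds[name])
               for name in model_scores)


# Hypothetical scores for one session (not taken from the paper).
scores = {"behavior": 0.82, "network": 0.40, "protocol_http": 0.91}
thresholds = {"behavior": 0.5, "network": 0.5, "protocol_http": 0.7}
total = ensemble_score(scores, thresholds)
```

A higher total means more models flagged the session, which is one plausible way to rank leakage candidates.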

Stochastic Simulation Model for non-stationary time series using Wavelet AutoRegressive Model

  • Moon, Young-Il;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2007.05a
    • /
    • pp.1437-1440
    • /
    • 2007
  • Many hydroclimatic time series are marked by interannual and longer quasi-periodic features that are associated with narrow band oscillatory climate modes. A time series modeling approach that directly considers such structures is developed and presented. The essence of the approach is to first develop a wavelet decomposition of the time series that retains only the statistically significant wavelet components, and to then model each such component and the residual time series as univariate autoregressive processes. The efficacy of this approach is demonstrated through the simulation of observed and paleo reconstructions of climate indices related to ENSO and AMO, tree ring and rainfall time series. Long ensemble simulations that preserve the spectral attributes of the time series in each ensemble member can be generated. The usual low order statistics are preserved by the proposed model, and its long memory performance is superior to the direct application of an autoregressive model.

Impact of Ensemble Member Size on Confidence-based Selection in Bankruptcy Prediction (부도예측을 위한 확신 기반의 선택 접근법에서 앙상블 멤버 사이즈의 영향에 관한 연구)

  • Kim, Na-Ra;Shin, Kyung-Shik;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.2
    • /
    • pp.55-71
    • /
    • 2013
  • The prediction model is the main factor affecting the performance of a knowledge-based system for bankruptcy prediction. Earlier studies on prediction modeling have focused on the building of a single best model using statistical and artificial intelligence techniques. However, since the mid-1980s, integration of multiple techniques (hybrid techniques) and, by extension, combinations of the outputs of several models (ensemble techniques) have, according to the experimental results, generally outperformed individual models. An ensemble is a technique that constructs a set of multiple models, combines their outputs, and produces one final prediction. The way in which the outputs of ensemble members are combined is one of the important issues affecting prediction accuracy. A variety of combination schemes have been proposed in order to improve prediction performance in ensembles. Each combination scheme has advantages and limitations, and can be influenced by domain and circumstance. Accordingly, decisions on the most appropriate combination scheme in a given domain and contingency are very difficult. This paper proposes a confidence-based selection approach as part of an ensemble bankruptcy-prediction scheme that can measure unified confidence, even if ensemble members produce different types of continuous-valued outputs. The present experimental results show that when varying the number of models to combine, according to the creation type of ensemble members, the proposed combination method offers the best performance in the ensemble having the largest number of models, even when compared with the methods most often employed in bankruptcy prediction.
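
One way to picture confidence-based selection across members with different output scales is sketched below. The abstract does not define the paper's unified confidence measure, so the rescaling to [0, 1], the distance-from-0.5 confidence, and the member names and output ranges are all assumptions made for illustration.

```python
def to_unit_interval(value, low, high):
    """Rescale a member's raw output onto [0, 1], clipping out-of-range values."""
    if high == low:
        raise ValueError("low and high must differ")
    scaled = (value - low) / (high - low)
    return min(1.0, max(0.0, scaled))


def confidence(p):
    """Distance from the 0.5 decision boundary, scaled to [0, 1] (assumed measure)."""
    return abs(p - 0.5) * 2.0


def select_by_confidence(outputs, ranges):
    """Return (member, label, confidence) for the most confident member."""
    best = None
    for name, raw in outputs.items():
        low, high = ranges[name]
        p = to_unit_interval(raw, low, high)
        c = confidence(p)
        if best is None or c > best[2]:
            best = (name, int(p >= 0.5), c)
    return best


# Hypothetical members with different output ranges.
outputs = {"logit": 0.62, "svm": -1.5, "nn": 0.45}
ranges = {"logit": (0.0, 1.0), "svm": (-2.0, 2.0), "nn": (0.0, 1.0)}
choice = select_by_confidence(outputs, ranges)
```

Here the final prediction is taken from whichever member is furthest from its own decision boundary, rather than from a vote over all members.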

Estimating Farmland Prices Using Distance Metrics and an Ensemble Technique (거리척도와 앙상블 기법을 활용한 지가 추정)

  • Lee, Chang-Ro;Park, Key-Ho
    • Journal of Cadastre & Land InformatiX
    • /
    • v.46 no.2
    • /
    • pp.43-55
    • /
    • 2016
  • This study estimated land prices using instance-based learning. Among the various instance-based learning methods, a k-nearest neighbor method was used, and predictions were computed with 10 distance metrics, including Euclidean distance. Normally, the single distance metric prediction with the best predictive performance would be chosen as the final estimate out of the 10. In contrast to this practice, this study applied an ensemble technique that combines multiple predictions to obtain better performance. To combine the predictions, we applied the gradient boosting algorithm, a kind of residual-fitting model, to our data. Sales price data of farmland in Haenam-gun, Jeolla Province were used to demonstrate the advantages of instance-based learning as well as of the ensemble technique. The results showed that the ensemble prediction was more accurate than the 10 individual distance metric predictions.

A Feature Selection-based Ensemble Method for Arrhythmia Classification

  • Namsrai, Erdenetuya;Munkhdalai, Tsendsuren;Li, Meijing;Shin, Jung-Hoon;Namsrai, Oyun-Erdene;Ryu, Keun Ho
    • Journal of Information Processing Systems
    • /
    • v.9 no.1
    • /
    • pp.31-40
    • /
    • 2013
  • In this paper, a novel method is proposed to build an ensemble of classifiers by using a feature selection schema. The feature selection schema identifies the best feature sets that affect the arrhythmia classification. First, a number of feature subsets are extracted by applying the feature selection schema to the original dataset. Then classification models are built using each feature subset. Finally, we combine the classification models by adopting a voting approach to form a classification ensemble. The voting approach in our method uses both the classification error rate and the feature selection rate to calculate the score of each classifier in the ensemble. In our method, the feature selection rate depends on the order in which the feature subsets are extracted. In the experiment, we applied our method to an arrhythmia dataset and generated the top three disjoint feature sets. We then built three classifiers based on these top three feature subsets and formed the classifier ensemble by using the voting approach. Our method can improve the classification accuracy on high-dimensional datasets. The performance of each classifier and the performance of their ensemble were higher than the performance of the classifier that was based on the whole feature space of the dataset. The classification performance was improved and a more stable classification model could be constructed with the proposed approach.
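
The voting step described above can be sketched as a weighted vote. The abstract says the score uses both the classification error rate and the feature selection rate but does not give the formula, so the product `(1 - error_rate) * selection_rate` and the numeric values below are assumptions for illustration only.

```python
from collections import defaultdict


def classifier_weight(error_rate, selection_rate):
    """Assumed scoring rule: accurate, early-extracted subsets get more weight."""
    return (1.0 - error_rate) * selection_rate


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Three hypothetical classifiers, one per feature subset, in extraction order.
weights = [
    classifier_weight(0.20, 1.0),
    classifier_weight(0.25, 0.8),
    classifier_weight(0.30, 0.6),
]
label = weighted_vote(["normal", "arrhythmia", "arrhythmia"], weights)
```

In this toy case the two lower-weighted classifiers together outweigh the single strongest one, so their shared label wins.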

A Comparison of Ensemble Methods Combining Resampling Techniques for Class Imbalanced Data (데이터 전처리와 앙상블 기법을 통한 불균형 데이터의 분류모형 비교 연구)

  • Lee, Hee-Jae;Lee, Sungim
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.3
    • /
    • pp.357-371
    • /
    • 2014
  • There are many studies related to imbalanced data in which the class distribution is highly skewed. To address the problem of imbalanced data, previous studies have dealt with resampling techniques that correct the skewness of the class distribution in each sampled subset by using under-sampling, over-sampling, or hybrid sampling such as SMOTE. Ensemble methods have also alleviated the problem of class imbalanced data. In this paper, we compare around a dozen algorithms that combine ensemble methods and resampling techniques, based on simulated data sets generated by the Backbone model, which can handle the imbalance rate. Results on various real imbalanced data sets are also presented to compare the effectiveness of the algorithms. As a result, we highly recommend combining resampling techniques with ensemble methods for imbalanced data in which the proportion of the minority class is less than 10%. We also find that each ensemble method has a well-matched sampling technique. The algorithms that combine bagging or random forest ensembles with random undersampling tend to perform well; however, the boosting ensemble appears to perform better with over-sampling. Ensemble methods combined with SMOTE perform well in most situations.
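
The pairing of bagging with random undersampling mentioned above can be sketched as drawing one balanced subset per ensemble member. This is a minimal illustration, not one of the algorithms compared in the paper; the function names and the toy data with an 8% minority class are hypothetical.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Downsample every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def balanced_bags(rows, labels, n_bags):
    """One balanced subset per ensemble member, each drawn with its own seed."""
    return [random_undersample(rows, labels, seed=i) for i in range(n_bags)]


# Toy data: 8 minority rows (label 1) out of 100.
rows = list(range(100))
labels = [1 if i < 8 else 0 for i in rows]
bags = balanced_bags(rows, labels, n_bags=5)
```

Each bag would then be used to train one base classifier, and the members' predictions combined by voting.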

A Study on the Work-time Estimation for Block Erections Using Stacking Ensemble Learning (Stacking Ensemble Learning을 활용한 블록 탑재 시수 예측)

  • Kwon, Hyukcheon;Ruy, Wonsun
    • Journal of the Society of Naval Architects of Korea
    • /
    • v.56 no.6
    • /
    • pp.488-496
    • /
    • 2019
  • The estimation of block erection work time at a dock is one of the important factors when establishing or managing the total shipbuilding schedule. To predict the work time, it is natural to use existing block erection data. Generally, the work time per unit is the product of a coefficient value, a quantity, and a product value. Previously, the work time per unit was determined statistically from unit load data. In this study, however, we estimate the work time per unit through the work time coefficient value from series ships using machine learning. In machine learning, the outcome depends mainly on how the training data are organized. Therefore, we use feature engineering to determine which variables should be used as features and to check their influence on the result. To obtain the coefficient value of each block, we apply ensemble learning methods, which are widely used nowadays. Among the many ensemble learning techniques, the final model is constructed by a stacking ensemble consisting of existing models (Decision Tree, Random Forest, Gradient Boost, Square Loss Gradient Boost, XG Boost), and the accuracy is maximized by selecting three candidates among all models. Finally, the results of this study are verified against the predicted total work time for one ship in the same series.

Remaining Useful Life Estimation based on Noise Injection and a Kalman Filter Ensemble of modified Bagging Predictors

  • Hung-Cuong Trinh;Van-Huy Pham;Anh H. Vo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.12
    • /
    • pp.3242-3265
    • /
    • 2023
  • Ensuring the reliability of a machinery system involves the prediction of remaining useful life (RUL). In most RUL prediction approaches, noise is always considered for removal. Nevertheless, noise could be properly utilized to enhance the prediction capabilities. In this paper, we propose a novel RUL prediction approach based on noise injection and a Kalman filter ensemble of modified bagging predictors. First, we propose a new method, named GN-DAFC, to insert Gaussian noise into both the observation and feature spaces of an original training dataset. Second, we develop a modified bagging method based on Kalman filter averaging, named KBAG. Then, we develop a new ensemble method, named DKBAG, which is a Kalman filter ensemble of KBAGs. Finally, we propose a novel RUL prediction approach, GN-DAFC-DKBAG, in which the optimal noise-injected training dataset is determined by a GN-DAFC-based search strategy and then fed to a DKBAG model. Our approach is validated on the NASA C-MAPSS dataset of aero-engines. Experimental results show that our approach achieves significantly better performance than a traditional Kalman filter ensemble of single learning models (KESLM) and the original DKBAG approach. We also found that the optimal noise-injected data could improve the prediction performance of both KESLM and DKBAG. We further compare our approach with two advanced ensemble approaches, and the results indicate that our approach also performs better than both. Thus, our approach of combining optimal noise injection and DKBAG provides an effective solution for RUL estimation of machinery systems.
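
As a rough illustration of Kalman filter averaging, the sketch below fuses several predictors' outputs with a scalar Kalman update over a static state, treating each prediction as a noisy measurement. This is a generic textbook formulation, not the paper's KBAG or DKBAG; the function name, the prior, and the variance values are assumptions.

```python
def kalman_average(predictions, variances, prior_mean=0.0, prior_var=1e6):
    """Sequentially fuse predictions, weighting each by its assumed noise variance."""
    x, p = prior_mean, prior_var
    for z, r in zip(predictions, variances):
        k = p / (p + r)        # Kalman gain for this measurement
        x = x + k * (z - x)    # updated estimate
        p = (1.0 - k) * p      # updated estimate variance
    return x, p


# Three hypothetical RUL predictions (in cycles) with equal assumed variance.
rul, var = kalman_average([100.0, 110.0, 90.0], [4.0, 4.0, 4.0])
```

With a very wide prior and equal variances, the fused value approaches the plain mean; unequal variances shift the weight toward the more reliable predictors.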

Performance Assessment of Weekly Ensemble Prediction Data at Seasonal Forecast System with High Resolution (고해상도 장기예측시스템의 주별 앙상블 예측자료 성능 평가)

  • Ham, Hyunjun;Won, Dukjin;Lee, Yei-sook
    • Atmosphere
    • /
    • v.27 no.3
    • /
    • pp.261-276
    • /
    • 2017
  • The main objectives of this study are to introduce the Global Seasonal Forecasting System version 5 (GloSea5) of KMA and to evaluate the performance of its ensemble prediction. KMA has operated a seasonal forecast system, run jointly with the UK Met Office, since 2014. GloSea5 is a fully coupled global climate model consisting of atmosphere (UM), ocean (NEMO), land surface (JULES), and sea ice (CICE) components linked through the coupler OASIS. The model resolution used in GloSea5 is N216L85 (~60 km in mid-latitudes) in the atmosphere and ORCA0.25L75 ($0.25^{\circ}$ on a tri-polar grid) in the ocean. In this research, we evaluate the performance of this system using RMSE, correlation, and MSSS for ensemble mean values. The forecast (FCST) and hindcast (HCST) are verified separately, using the operational data of GloSea5 from 2014 to 2015. The skill is similar to that of the past study. For example, the RMSE of h500 increases from 22.30 gpm for the 1-week forecast to 53.82 gpm for the 7-week forecast, but the error stays at a similar level of about 50~53 gpm after the 3-week forecast. The Nino Index of SST shows a high correlation (higher than 0.9) up to the 7-week forecast in the Nino 3.4 area. It can be concluded that GloSea5 performs well for seasonal prediction.
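
The three verification scores named above can be computed for an ensemble mean roughly as follows. The abstract does not state the reference forecast used for MSSS, so a constant climatology is assumed here, and the toy member and observation values are hypothetical.

```python
import math


def ensemble_mean(members):
    """Average the members at each verification time."""
    return [sum(vals) / len(vals) for vals in zip(*members)]


def rmse(forecast, observed):
    """Root mean square error of the forecast against observations."""
    n = len(observed)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n)


def correlation(forecast, observed):
    """Pearson correlation between forecast and observations."""
    n = len(observed)
    mf, mo = sum(forecast) / n, sum(observed) / n
    cov = sum((f - mf) * (o - mo) for f, o in zip(forecast, observed))
    sf = math.sqrt(sum((f - mf) ** 2 for f in forecast))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    return cov / (sf * so)


def msss(forecast, observed, climatology):
    """Mean square skill score: 1 - MSE(forecast) / MSE(constant climatology)."""
    n = len(observed)
    mse_f = sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n
    mse_c = sum((climatology - o) ** 2 for o in observed) / n
    return 1.0 - mse_f / mse_c


# Two hypothetical members over four verification times.
members = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
observed = [2.0, 3.0, 5.0, 5.0]
mean = ensemble_mean(members)
scores = (rmse(mean, observed),
          correlation(mean, observed),
          msss(mean, observed, climatology=sum(observed) / len(observed)))
```

An MSSS above 0 means the ensemble mean beats the climatological reference, and 1 would be a perfect forecast.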

Application of Carbon Tracking System based on Ensemble Kalman Filter on the Diagnosis of Carbon Cycle in Asia (앙상블 칼만 필터 기반 탄소추적시스템의 아시아 지역 탄소 순환 진단에의 적용)

  • Kim, JinWoong;Kim, Hyun Mee;Cho, Chun-Ho
    • Atmosphere
    • /
    • v.22 no.4
    • /
    • pp.415-427
    • /
    • 2012
  • $CO_2$ is the most important trace gas related to climate change. Therefore, understanding surface carbon sources and sinks is important when seeking to estimate the impact of $CO_2$ on the environment and climate. CarbonTracker, developed by NOAA, is an inverse modeling system that estimates surface carbon fluxes using an ensemble Kalman filter with atmospheric $CO_2$ measurements as a constraint. In this study, to investigate the capability of CarbonTracker as an analysis tool for estimating surface carbon fluxes in Asia, an experiment with a nesting domain centered on Asia is performed. In general, the results show that setting a nesting domain centered on Asia enables detailed estimations of surface carbon fluxes in Asia. The prior ensemble spread verified at observational sites located in Asia is well represented, as shown by a relatively flat rank histogram. The posterior flux in the Eurasian Boreal and Eurasian Temperate regions is well analyzed with proper seasonal cycles and amplitudes. On the other hand, in tropical regions of Asia, the posterior flux does not differ greatly from the prior flux due to fewer $CO_2$ observations. The root mean square error of the model $CO_2$ calculated from the posterior flux is less than that of the model $CO_2$ calculated from the prior flux, implying that CarbonTracker based on the ensemble Kalman filter works appropriately for the Asia region.
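
A rank histogram of the kind mentioned above can be built by counting where each observation falls among the ensemble members. The sketch below is a generic illustration with hypothetical values, not CarbonTracker code, and it ignores ties between members and observations.

```python
def rank_histogram(ensembles, observations):
    """Count, over all cases, how many members fall below each observation."""
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(ensembles, observations):
        rank = sum(1 for m in members if m < obs)
        counts[rank] += 1
    return counts


# Four hypothetical cases with three members each.
ensembles = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 1.0, 5.0], [3.0, 3.5, 4.0]]
observations = [0.5, 5.0, 2.0, 9.0]
hist = rank_histogram(ensembles, observations)
```

A roughly flat histogram suggests the ensemble spread is consistent with the observations, while a U shape points to too little spread and a dome shape to too much.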