• Title/Summary/Keyword: Ensemble-based algorithm

Simultaneous Optimization of KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.139-157 / 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained by a randomly chosen feature subspace from the original feature set, and predictions from each ensemble member are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving an individual KNN model. 
The k parameter of the KNN base classifiers and the feature subsets selected for them play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposed a new ensemble method that improves the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of the base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve its prediction accuracy. The proposed model was applied to a bankruptcy prediction problem using a real dataset from Korean companies. The research data included 1,800 externally non-audited firms, of which 900 went bankrupt and 900 did not. Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent sample t-test, with each financial ratio as an input variable and bankruptcy or non-bankruptcy as the output variable. Of these, 24 financial ratios were selected using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other for guarding against overfitting. The prediction accuracy on the latter portion was used as the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model. Using 10-fold cross-validation, the classification accuracy of the proposed model was compared with that of other models. The Q-statistic values and average classification accuracies of the base classifiers were also investigated. 
The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
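The random subspace KNN ensemble described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the toy data, the function names, and the choice to draw each member's feature subset and k at random are assumptions made for illustration, whereas the paper optimizes both with a genetic algorithm.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Majority vote among the k nearest training points, with squared
    Euclidean distance measured on the given feature indices only."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def build_random_subspace(n_features, subspace_size, n_members, k_choices, rng):
    """Draw one (feature subset, k) pair per ensemble member.
    Assumption: drawn at random here; the paper searches them with a GA."""
    return [
        (sorted(rng.sample(range(n_features), subspace_size)), rng.choice(k_choices))
        for _ in range(n_members)
    ]


def ensemble_predict(train_X, train_y, x, members):
    """Combine the members' predictions by simple majority voting."""
    votes = Counter(
        knn_predict(train_X, train_y, x, k, feats) for feats, k in members
    )
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated classes in four features.
train_X = [
    [0.0, 0.1, 0.2, 0.0], [0.2, 0.0, 0.1, 0.3], [0.1, 0.3, 0.0, 0.1],
    [5.0, 5.1, 4.9, 5.2], [5.2, 4.8, 5.0, 5.1], [4.9, 5.0, 5.3, 4.8],
]
train_y = [0, 0, 0, 1, 1, 1]

members = build_random_subspace(
    n_features=4, subspace_size=2, n_members=5, k_choices=[1, 3],
    rng=random.Random(0),
)
print(ensemble_predict(train_X, train_y, [0.1, 0.1, 0.1, 0.1], members))
print(ensemble_predict(train_X, train_y, [5.0, 5.0, 5.0, 5.0], members))
```

An odd number of members and odd k values are used so that votes between two classes cannot tie. Replacing the random draw in `build_random_subspace` with a search over (subset, k) pairs scored on held-out accuracy would move the sketch closer to the optimization the abstract describes.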

Tor Network Website Fingerprinting Using Statistical-Based Feature and Ensemble Learning of Traffic Data (트래픽 데이터의 통계적 기반 특징과 앙상블 학습을 이용한 토르 네트워크 웹사이트 핑거프린팅)

  • Kim, Junho;Kim, Wongyum;Hwang, Doosung
    • KIPS Transactions on Software and Data Engineering / v.9 no.6 / pp.187-194 / 2020
  • This paper proposes a website fingerprinting method using ensemble learning on the Tor network, which protects client anonymity and personal information. We construct a training problem for website fingerprinting from the traffic packets collected in the Tor network, and compare the performance of the website fingerprinting system using tree-based ensemble models. A training feature vector is prepared from the general information, burst, cell sequence length, and cell order that are extracted from the traffic sequence, and the features of each website are represented with a fixed length. For experimental evaluation, we define four learning problems (Wang14, BW, CWT, CWH) according to the use of website fingerprinting, and compare the performance with a support vector machine model using CUMUL feature vectors. In the experimental evaluation, the proposed statistics-based training feature representation is superior to the CUMUL feature representation except for the BW case.

A Comparative Study of Phishing Websites Classification Based on Classifier Ensemble

  • Tama, Bayu Adhi;Rhee, Kyung-Hyune
    • Journal of Korea Multimedia Society / v.21 no.5 / pp.617-625 / 2018
  • Phishing websites have become a crucial concern in cyber security applications. Phishing is performed by fraudulently deceiving users with the aim of obtaining sensitive information such as bank account details, credit card numbers, usernames, and passwords. The threat has led to huge losses for online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare different classifier ensemble approaches, i.e. random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, i.e. decision tree, classification and regression tree, and credal decision tree, in the case of website phishing. Area under the ROC curve (AUC) is employed as the performance metric, whilst statistical tests are used as a baseline indicator of the significance of differences among classifiers. The paper contributes to the existing literature by providing a benchmark of classifier ensembles for web phishing detection.
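The AUC metric used in this comparison can be computed directly from classifier scores. The sketch below uses the pairwise (Mann-Whitney) formulation; the scores and labels are made-up values for illustration and do not come from the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive example receives the higher score.
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical scores from two classifiers on the same six websites
# (label 1 = phishing, 0 = legitimate).
labels = [1, 1, 0, 0, 1, 0]
single_scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.6]
ensemble_scores = [0.9, 0.8, 0.3, 0.4, 0.7, 0.2]

print(auc(single_scores, labels))
print(auc(ensemble_scores, labels))
```

The pairwise form is quadratic in the number of examples, which is fine for a small illustration; a rank-based computation would be the usual choice for large test sets.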

A Jittering-based Neural Network Ensemble Approach for Regionalized Low-flow Frequency Analysis

  • Ahn, Kuk-Hyun
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.382-382 / 2020
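  • Many previous studies have argued that ensemble methods, which combine the results of multiple models, improve the predictive ability of artificial neural network models. This study proposes a jittering-based neural network model for predicting low flow in ungauged basins. The basic approach is to inject white noise into the explanatory variables in order to increase the amount of training data. To verify the effect of the jittering-based approach, the model was built on a multi-output neural network model. Next, the jittering-based ensemble model was combined with a variable importance measuring algorithm to infer the relationship between basin characteristics and the predicted low-flow characteristics. To evaluate the usefulness of these methods, a total of 207 basins located in the northeastern United States were used. The results confirmed that the proposed jittering-based neural network ensemble model is more accurate than a single modeling approach. Its accuracy also exceeded that of the single model even with a small number of ensemble members. Finally, the study found that the effects of the basin characteristics either remain consistent or change in importance depending on the low-flow characteristic being examined.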
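The jittering step described in this abstract, adding white noise to the explanatory variables to enlarge the training set, can be sketched as follows. The function name, the noise level, and the toy basin data are assumptions for illustration; the abstract does not state the noise distribution parameters or how many noisy copies were generated.

```python
import random


def jitter(X, y, n_copies, noise_std, rng):
    """Return the original samples followed by n_copies noisy replicas.
    Gaussian white noise is added to the explanatory variables only;
    the targets are copied unchanged."""
    aug_X = [list(row) for row in X]
    aug_y = list(y)
    for _ in range(n_copies):
        for row, target in zip(X, y):
            aug_X.append([v + rng.gauss(0.0, noise_std) for v in row])
            aug_y.append(target)
    return aug_X, aug_y


# Hypothetical standardized basin characteristics and low-flow targets.
X = [[0.2, 1.5, -0.3], [1.1, -0.4, 0.8]]
y = [3.4, 1.2]

aug_X, aug_y = jitter(X, y, n_copies=3, noise_std=0.05, rng=random.Random(42))
print(len(aug_X), len(aug_y))
```

One plausible way to form an ensemble from this, stated here as an assumption, is to train each member on its own independently jittered copy and average the members' predictions.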

A hybrid algorithm based on EEMD and EMD for multi-mode signal processing

  • Lin, Jeng-Wen
    • Structural Engineering and Mechanics / v.39 no.6 / pp.813-831 / 2011
  • This paper presents an efficient version of the Hilbert-Huang transform for the analysis of nonlinear, non-stationary systems. Ensemble empirical mode decomposition (EEMD) is introduced to alleviate the problem of mode mixing between intrinsic mode functions (IMFs) decomposed by EMD. Yet the problem is not fully resolved when a signal of a similar scale resides in different IMF components. Instead of using a trial and error method to select the "best" outcome generated by EEMD, a hybrid algorithm based on EEMD and EMD is proposed for multi-mode signal processing. The developed approach comprises the steps from a bandpass filter design for regrouping modes of the IMFs obtained from EEMD, to the mode extraction using EMD, and to the assessment of each mode in the marginal spectrum. A simulated two-mode signal is tested to demonstrate the efficiency and robustness of the approach, showing average relative errors all equal to 1.46% for various noise levels added to the signal. The developed approach is also applied to a real bridge structure, showing more reliable results than the pure EMD. Discussions on the mode determination are offered to explain the connection between mode-grouping form on the one hand, and mode-grouping performance on the other.

Robust Digital Watermarking for High-definition Video using Steerable Pyramid Transform, Two Dimensional Fast Fourier Transform and Ensemble Position-based Error Correcting

  • Jin, Xun;Kim, JongWeon
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.7 / pp.3438-3454 / 2018
  • In this paper, we propose a robust blind watermarking scheme for high-definition video. In the embedding process, the luminance component of each frame is transformed by a 2-dimensional fast Fourier transform (2D FFT). A secret key is used to generate a matrix of random numbers for the security of the watermark information. The matrix is transformed by an inverse steerable pyramid transform (SPT). We embed the watermark into the low- and mid-frequency 2D FFT coefficients using the transformed matrix. In the extraction process, the 2D FFT coefficients of each frame and the transformed matrix are each transformed by SPT to produce two oriented sub-bands. We extract the watermark from each frame by cross-correlating the two oriented sub-bands. If a video is degraded by attacks, the watermarks extracted from its frames contain errors. Thus, we use an ensemble position-based error correcting algorithm to estimate the errors and correct them. The experimental results show that the proposed watermarking algorithm is imperceptible and robust against various attacks. After embedding 64 bits of watermark into each frame, the average peak signal-to-noise ratio between the original frames and the embedded frames is 45.7 dB.

Classification Algorithm for Liver Lesions of Ultrasound Images using Ensemble Deep Learning (앙상블 딥러닝을 이용한 초음파 영상의 간병변증 분류 알고리즘)

  • Cho, Young-Bok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.4 / pp.101-106 / 2020
  • In current medical practice, ultrasound plays much the same role that the stethoscope did in the past. However, ultrasound has the disadvantage that the reliability of its results depends on the skill level of the examiner. To address this problem, this paper aims to improve the accuracy of liver lesion detection during ultrasound examination using deep learning. We compared the accuracy of lesion classification using a CNN model and an ensemble model. In the experiments, the classification accuracy averaged 82.33% for the CNN model and 89.9% for the ensemble model, about 7 percentage points higher. The ensemble model also reached 0.97 on the average ROC curve measure, which is reported as about 0.4 higher than the CNN model.

Decentralized Structural Diagnosis and Monitoring System for Ensemble Learning on Dynamic Characteristics (동특성 앙상블 학습 기반 구조물 진단 모니터링 분산처리 시스템)

  • Shin, Yoon-Soo;Min, Kyung-Won
    • Journal of the Computational Structural Engineering Institute of Korea / v.34 no.4 / pp.183-189 / 2021
  • In recent years, active research has been devoted toward developing a monitoring system using ambient vibration data in order to quantitatively determine the deterioration occurring in a structure over a long period of time. This study developed a low-cost edge computing system that detects the abnormalities in structures by utilizing the dynamic characteristics acquired from the structure over the long term for ensemble learning. The system hardware consists of the Raspberry Pi, an accelerometer, an inclinometer, a GPS RTK module, and a LoRa communication module. The structural abnormality detection afforded by the ensemble learning using dynamic characteristics is verified using a laboratory-scale structure model vibration experiment. A real-time distributed processing algorithm with dynamic feature extraction based on the experiment is installed on the Raspberry Pi. Based on the stable operation of installed systems at the Community Service Center, Pohang-si, Korea, the validity of the developed system was verified on-site.

Development of an Ensemble Prediction Model for Lateral Deformation of Retaining Wall Under Construction (시공 중 흙막이 벽체 수평변위 예측을 위한 앙상블 모델 개발)

  • Seo, Seunghwan;Chung, Moonkyung
    • Journal of the Korean Geotechnical Society / v.39 no.4 / pp.5-17 / 2023
  • The growth of large-scale underground excavation in urban areas necessitates monitoring and prediction technologies that can pre-emptively mitigate risk factors at construction sites. Traditionally, two methods have been used to predict the deformation of retaining walls induced by excavation: empirical and numerical analysis. Recent progress in artificial intelligence technology has led to the development of predictive models using machine learning techniques. This study developed a model for predicting the deformation of a retaining wall under construction using a boosting-based algorithm and an ensemble model with outstanding predictive power and efficiency. A database was established from multiple perspectives using data from the design, construction, and maintenance stages of an underground retaining wall project. Based on these data, a learning model was created and its performance was evaluated. The boosting and ensemble models demonstrated that wall deformation could be accurately predicted. In addition, it was confirmed that prediction results reflecting the characteristics of the actual construction process can be presented using data collected from ground measurements. The predictive model developed in this study is expected to be used to evaluate and monitor the stability of retaining walls under construction.

Comparing Classification Accuracy of Ensemble and Clustering Algorithms Based on Taguchi Design (다구찌 디자인을 이용한 앙상블 및 군집분석 분류 성능 비교)

  • Shin, Hyung-Won;Sohn, So-Young
    • Journal of Korean Institute of Industrial Engineers / v.27 no.1 / pp.47-53 / 2001
  • In this paper, we compare the classification performance of ensemble and clustering algorithms (Data Bagging, Variable Selection Bagging, Parameter Combining, Clustering) with that of logistic regression under various characteristics of the input data. The four factors used to simulate the logistic model are (1) correlation among input variables, (2) variance of observations, (3) training data size, and (4) the input-output function. Because the relationship between input and output is unknown in practice, we use a Taguchi design and treat the input-output function as a noise factor to improve the practicality of our results. The experimental results indicate the following. When the level of the variance is medium, Bagging and Parameter Combining perform worse than Logistic Regression, Variable Selection Bagging, and Clustering. However, the classification performances of Logistic Regression, Variable Selection Bagging, Bagging, and Clustering are not significantly different when the variance of the input data is either small or large. When there is strong correlation among input variables, Variable Selection Bagging outperforms both Logistic Regression and Parameter Combining. In general, and to our disappointment, the Parameter Combining algorithm appears to perform the worst.