• Title/Summary/Keyword: ensemble methods

Incorporating BERT-based NLP and Transformer for An Ensemble Model and its Application to Personal Credit Prediction

  • Sophot Ky;Ju-Hong Lee;Kwangtek Na
    • Smart Media Journal / v.13 no.4 / pp.9-15 / 2024
  • Tree-based algorithms have been the dominant methods used to build prediction models for tabular data, including personal credit data. However, they are compatible only with categorical and numerical data, and they do not capture relationships among features. In this work, we propose an ensemble model based on the Transformer architecture that incorporates text features and harnesses the self-attention mechanism to address the feature-relationship limitation. We describe a text formatter module that converts the original tabular data into sentence data, which is fed into FinBERT along with other text features. Furthermore, we employ an FT-Transformer trained on the original tabular data. We evaluate this multi-modal approach against two popular tree-based algorithms, Random Forest and Extreme Gradient Boosting (XGBoost), and against TabTransformer. Our proposed method shows superior Default Recall, F1 score and AUC results across two public data sets. Our results are significant for financial institutions seeking to reduce the risk of financial loss from defaulters.
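
The text formatter idea in this abstract (turning a tabular row into a sentence that a language model can read) can be illustrated with a minimal sketch. The abstract does not give the actual template, so the sentence pattern, the function name `row_to_sentence`, and the column names below are all assumptions made for illustration.

```python
def row_to_sentence(row: dict) -> str:
    """Render one tabular record as a plain-English sentence.

    The "<column> is <value>" template is an assumption, not the
    template used by the authors.
    """
    parts = [f"{name.replace('_', ' ')} is {value}" for name, value in row.items()]
    return "The " + ", the ".join(parts) + "."


# Hypothetical credit record; the column names are illustrative only.
record = {"age": 35, "loan_amount": 12000, "home_ownership": "RENT"}
sentence = row_to_sentence(record)
print(sentence)
```

A sentence produced this way could then be tokenized and passed to a text encoder; that step is omitted here because it depends on the model and library chosen.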

Light-weight Gender Classification and Age Estimation based on Ensemble Multi-tasking Deep Learning (앙상블 멀티태스킹 딥러닝 기반 경량 성별 분류 및 나이별 추정)

  • Huy Tran, Quoc Bao;Park, JongHyeon;Chung, SunTae
    • Journal of Korea Multimedia Society / v.25 no.1 / pp.39-51 / 2022
  • Image-based gender classification and age estimation of humans are classic problems in computer vision. Most research in this field focuses on only one task, either gender classification or age estimation, and most reported methods for each task prioritize accuracy and are not computationally light. Thus, running both tasks simultaneously on low-cost mobile or embedded systems with limited CPU processing speed and memory capacity is practically prohibitive. In this paper, we propose a novel light-weight gender classification and age estimation method based on ensemble multitasking deep learning with a light-weight neural network architecture, which processes both gender classification and age estimation simultaneously and in real time, even on embedded systems. Experiments on various well-known datasets show that the proposed method performs comparably to state-of-the-art gender classification and/or age estimation methods in accuracy and runs fast enough (14 fps on average) on a Jetson Nano embedded board.

Securing SCADA Systems: A Comprehensive Machine Learning Approach for Detecting Reconnaissance Attacks

  • Ezaz Aldahasi;Talal Alkharobi
    • International Journal of Computer Science & Network Security / v.23 no.12 / pp.1-12 / 2023
  • Ensuring the security of Supervisory Control and Data Acquisition (SCADA) and Industrial Control Systems (ICS) is paramount to safeguarding the reliability and safety of critical infrastructure. This paper addresses the significant threat posed by reconnaissance attacks on SCADA/ICS networks and presents a methodology for enhancing their protection. The proposed approach employs imbalanced-dataset handling techniques, ensemble methods, and feature engineering to enhance the resilience of SCADA/ICS systems. Experimentation and analysis demonstrate the efficacy of our strategy, as evidenced by good precision and recall and a low number of false negatives (FN). The practical utility of our approach is underscored through evaluation on real-world SCADA/ICS datasets, which shows superior performance compared to existing methods. Moreover, the integration of feature augmentation is shown to significantly enhance detection capabilities. This research contributes to advancing the security posture of SCADA/ICS environments in the face of evolving cyber threats.

In vivo Evaluation of Flow Estimation Methods for 3D Color Doppler Imaging

  • Yoo, Yang-Mo
    • Journal of Biomedical Engineering Research / v.31 no.3 / pp.177-186 / 2010
  • In 3D ultrasound color Doppler imaging (CDI), 8-16 pulse transmissions (ensembles) per scanline are used for effective clutter rejection and flow estimation, but this yields a low volume acquisition rate. In this paper, we evaluate three flow estimation methods for a small ensemble size (E=4): autoregression (AR), eigendecomposition (ED), and autocorrelation combined with adaptive clutter rejection (AC-ACR). The performance of the AR, ED and AC-ACR methods was compared using 2D and 3D in vivo data acquired under different clutter conditions (common carotid artery, kidney and liver). To evaluate the effectiveness of the three methods, receiver operating characteristic (ROC) curves were generated. For 2D kidney in vivo data, the AC-ACR method outperforms the AR and ED methods in terms of the area under the ROC curve (AUC) (0.852 vs. 0.793 and 0.813, respectively). Similarly, the AC-ACR method shows higher AUC values for 2D liver in vivo data than the AR and ED methods (0.855 vs. 0.807 and 0.823, respectively). For the common carotid artery data, the AR method provides higher AUC values, but it suffers from biased estimates. For 3D in vivo data acquired from a kidney transplant patient, the AC-ACR method with E=4 provides an AUC value of 0.799. These in vivo results indicate that the AC-ACR method can provide more robust flow estimates than the AR and ED methods with a small ensemble size.

Impact of Ensemble Member Size on Confidence-based Selection in Bankruptcy Prediction (부도예측을 위한 확신 기반의 선택 접근법에서 앙상블 멤버 사이즈의 영향에 관한 연구)

  • Kim, Na-Ra;Shin, Kyung-Shik;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.55-71 / 2013
  • The prediction model is the main factor affecting the performance of a knowledge-based system for bankruptcy prediction. Earlier studies on prediction modeling have focused on the building of a single best model using statistical and artificial intelligence techniques. However, since the mid-1980s, integration of multiple techniques (hybrid techniques) and, by extension, combinations of the outputs of several models (ensemble techniques) have, according to the experimental results, generally outperformed individual models. An ensemble is a technique that constructs a set of multiple models, combines their outputs, and produces one final prediction. The way in which the outputs of ensemble members are combined is one of the important issues affecting prediction accuracy. A variety of combination schemes have been proposed in order to improve prediction performance in ensembles. Each combination scheme has advantages and limitations, and can be influenced by domain and circumstance. Accordingly, decisions on the most appropriate combination scheme in a given domain and contingency are very difficult. This paper proposes a confidence-based selection approach as part of an ensemble bankruptcy-prediction scheme that can measure unified confidence, even if ensemble members produce different types of continuous-valued outputs. The present experimental results show that when varying the number of models to combine, according to the creation type of ensemble members, the proposed combination method offers the best performance in the ensemble having the largest number of models, even when compared with the methods most often employed in bankruptcy prediction.
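
To make the idea of selecting by a unified confidence more concrete, here is a minimal sketch. The abstract does not define how confidence is computed, so the measure used below (distance from each member's decision threshold, scaled by the room available on that side of the threshold) is an assumption, as are the names `MemberOutput`, `unified_confidence` and `select_prediction`.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MemberOutput:
    score: float      # raw continuous output of one ensemble member
    low: float        # smallest value that member can emit
    high: float       # largest value that member can emit
    threshold: float  # decision boundary on that member's own scale


def unified_confidence(out: MemberOutput) -> float:
    """Distance from the threshold, scaled to [0, 1] (assumed measure)."""
    if out.score >= out.threshold:
        return (out.score - out.threshold) / (out.high - out.threshold)
    return (out.threshold - out.score) / (out.threshold - out.low)


def select_prediction(outputs: Sequence[MemberOutput]) -> int:
    """Return the class (1 = bankrupt, 0 = healthy) of the most confident member."""
    best = max(outputs, key=unified_confidence)
    return int(best.score >= best.threshold)


# Hypothetical outputs from three members with different output scales.
outputs = [
    MemberOutput(score=0.55, low=0.0, high=1.0, threshold=0.5),   # probability-like
    MemberOutput(score=-0.9, low=-1.0, high=1.0, threshold=0.0),  # tanh-like
    MemberOutput(score=0.2, low=-1.0, high=1.0, threshold=0.0),
]
prediction = select_prediction(outputs)
print(prediction)
```

Scaling each output against its own range is one way to compare members whose outputs live on different scales; a real system would more likely calibrate confidence on validation data.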

Effective Korean sentiment classification method using word2vec and ensemble classifier (Word2vec과 앙상블 분류기를 사용한 효율적 한국어 감성 분류 방안)

  • Park, Sung Soo;Lee, Kun Chang
    • Journal of Digital Contents Society / v.19 no.1 / pp.133-140 / 2018
  • Accurate sentiment classification is an important research topic in sentiment analysis. This study proposes an efficient method for classifying Korean sentiment using word2vec and ensemble methods, both of which have recently been studied widely. For 200,000 Korean movie review texts, we generate a POS-based BOW feature, a word2vec feature, and an integrated feature that combines the two representations. For sentiment classification, we use single classifiers (Logistic Regression, Decision Tree, Naive Bayes, and Support Vector Machine) and ensemble classifiers (Adaptive Boosting, Bagging, Gradient Boosting, and Random Forest). The integrated feature representation, composed of a BOW feature including adjectives and adverbs and a word2vec feature, showed the highest sentiment classification accuracy. Empirical results show that SVM, a single classifier, has the highest performance, while the ensemble classifiers perform similarly to or slightly worse than the single classifier.

Multi-scale Attention and Deep Ensemble-Based Animal Skin Lesions Classification (다중 스케일 어텐션과 심층 앙상블 기반 동물 피부 병변 분류 기법)

  • Kwak, Min Ho;Kim, Kyeong Tae;Choi, Jae Young
    • Journal of Korea Multimedia Society / v.25 no.8 / pp.1212-1223 / 2022
  • Skin lesions are common diseases that range from skin rashes to skin cancer, which can lead to death. Early diagnosis of skin diseases is important because it can considerably shorten the course of treatment and reduce the harmful effects of the disease. Recently, computer-aided diagnosis (CAD) systems based on artificial intelligence have been actively developed for the early diagnosis of skin diseases. In a typical CAD system, accurate classification of skin lesion types is of great importance for improving diagnostic performance. Motivated by this, we propose a novel deep ensemble classification method with multi-scale attention networks. The proposed deep ensemble networks are jointly trained using a single loss function in an end-to-end manner. In addition, the proposed deep ensemble network is equipped with a multi-scale attention mechanism and segmentation information from the original skin input image, which improves classification performance. To evaluate our method, the publicly available human skin disease dataset (HAM10000) and a private animal skin lesion dataset were used. Experimental results show that the proposed method achieves 97.8% and 81% accuracy on the HAM10000 and animal skin lesion datasets, respectively. This work would be useful for developing a more reliable CAD system that helps doctors diagnose skin diseases early.

Boosting neural networks with an application to bankruptcy prediction (부스팅 인공신경망을 활용한 부실예측모형의 성과개선)

  • Kim, Myoung-Jong;Kang, Dae-Ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2009.05a / pp.872-875 / 2009
  • In a bankruptcy prediction model, accuracy is one of the crucial performance measures because of its significant economic impact. Ensemble learning is one of the most widely used methods for improving the performance of classification and prediction models. Two popular ensemble methods, Bagging and Boosting, have been applied with great success to various machine learning problems, mostly using decision trees as base classifiers. In this paper, we analyze whether boosted neural networks improve on traditional neural networks in bankruptcy prediction tasks. Experimental results on Korean firms indicate that the boosted neural networks outperform traditional neural networks.

Comparison between Uncertainties of Cultivar Parameter Estimates Obtained Using Error Calculation Methods for Forage Rice Cultivars (오차 계산 방식에 따른 사료용 벼 품종의 품종모수 추정치 불확도 비교)

  • Young Sang Joh;Shinwoo Hyun;Kwang Soo Kim
    • Korean Journal of Agricultural and Forest Meteorology / v.25 no.3 / pp.129-141 / 2023
  • Crop models have been used to predict yield under diverse environmental and cultivation conditions, which can support decisions on the management of forage crops. Cultivar parameters are one of the required inputs to crop models, representing the genetic properties of a given forage cultivar. The objective of this study was to compare calibration and ensemble approaches in order to minimize the uncertainty of crop yield estimates from the SIMPLE crop model. Cultivar parameters were calibrated using Log-likelihood (LL) and the Generic Composite Similarity Measure (GCSM) as the objective function for the Metropolis-Hastings (MH) algorithm. In total, 20 sets of cultivar parameters were generated for each method. Two types of ensemble approach were examined. The first was the average of the model outputs (Eem) obtained using the individual parameter sets. The second was the model output (Epm) obtained using cultivar parameters averaged over the 20 sets. Comparisons were made for each cultivar and each error calculation method. 'Jowoo' and 'Yeongwoo', forage rice cultivars used in Korea, were subject to the parameter calibration. Yield data were obtained from experiment fields at Suwon, Jeonju, Naju and Iksan. Data for 2013, 2014 and 2016 were used for parameter calibration. For validation, yield data reported from 2016 to 2018 at Suwon were used. Initial calibration indicated that genetic coefficients obtained by LL were distributed in a narrower range than coefficients obtained by GCSM. A two-sample t-test comparing the ensemble approaches found no significant difference between them. The uncertainty of GCSM can be neutralized by adjusting the acceptance probability. The Epm results indicate that an ensemble approach can reduce uncertainty with less computation.
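
The difference between the two ensemble approaches, Eem (average the model outputs) and Epm (average the parameters, then run the model once), can be shown with a small sketch. The function `toy_yield` below is a made-up nonlinear stand-in and is not the SIMPLE crop model; the parameter names and values are likewise hypothetical.

```python
from statistics import mean


def toy_yield(params):
    """Stand-in crop model (NOT the SIMPLE model): nonlinear in its parameters."""
    rue, harvest_index = params
    return rue ** 2 * harvest_index


# Hypothetical parameter sets; the study generated 20 per method.
parameter_sets = [(1.0, 0.4), (2.0, 0.5), (3.0, 0.6)]

# Eem: run the model once per parameter set, then average the outputs.
eem = mean(toy_yield(p) for p in parameter_sets)

# Epm: average the parameters column by column, then run the model once.
mean_params = tuple(mean(column) for column in zip(*parameter_sets))
epm = toy_yield(mean_params)

print(eem, epm)
```

With this toy model the two estimates differ (about 2.6 for Eem and about 2.0 for Epm) because the model is nonlinear, while Epm needs one model run instead of one run per parameter set. Whether the two agree for a real crop model is an empirical question.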