• Title/Summary/Keyword: Ensemble learning technique

Search Result 74, Processing Time 0.023 seconds

A Study on the Development of University Students Dropout Prediction Model Using Ensemble Technique (앙상블 기법을 활용한 대학생 중도탈락 예측 모형 개발)

  • Park, Sangsung
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.17 no.1
    • /
    • pp.109-115
    • /
    • 2021
  • The number of freshmen at universities is decreasing due to the recent decline in the school-age population, and the survival of many universities is threatened. To overcome this situation, universities are seeking ways to use big data within the school to improve the quality of education. A study on the prediction of dropout students is a representative case of using big data in universities. The dropout prediction can prepare a systematic management plan by identifying students who will drop out of school due to reasons such as dropout or expulsion. In the case of actual on-campus data, a large number of missing values are included because it is collected and managed by various departments. For this reason, it is necessary to construct a model by effectively reflecting the missing values. In this study, we propose a university student dropout prediction model based on eXtreme Gradient Boost that can be applied to data with many missing values and shows high performance. In order to examine the practical applicability of the proposed model, an experiment was performed using data from C University in Chungbuk. As a result of the experiment, the prediction performance of the proposed model was found to be excellent. The management strategy of dropout students can be established through the prediction results of the model proposed in this paper.

Proposal of a Learning Model for Mobile App Malicious Code Analysis (모바일 앱 악성코드 분석을 위한 학습모델 제안)

  • Bae, Se-jin;Choi, Young-ryul;Rhee, Jung-soo;Baik, Nam-kyun
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2021.10a
    • /
    • pp.455-457
    • /
    • 2021
  • App is used on mobile devices such as smartphones and also has malicious code, which can be divided into normal and malicious depending on the presence or absence of hacking codes. Because there are many kind of malware, it is difficult to detect directly, we propose a method to detect malicious app using AI. Most of the existing methods are to detect malicious app by extracting features from malicious app. However, the number of types have increased exponentially, making it impossible to detect malicious code. Therefore, we would like to propose two more methods besides detecting malicious app by extracting features from most existing malicious app. The first method is to learn normal app to extract normal's features, as opposed to the existing method of learning malicious app and find abnormalities (malicious app). The second one is an 'ensemble technique' that combines the existing method with the first proposal. These two methods need to be studied so that they can be used in future mobile environment.

  • PDF

Prediction of compressive strength of sustainable concrete using machine learning tools

  • Lokesh Choudhary;Vaishali Sahu;Archanaa Dongre;Aman Garg
    • Computers and Concrete
    • /
    • v.33 no.2
    • /
    • pp.137-145
    • /
    • 2024
  • The technique of experimentally determining concrete's compressive strength for a given mix design is time-consuming and difficult. The goal of the current work is to propose a best working predictive model based on different machine learning algorithms such as Gradient Boosting Machine (GBM), Stacked Ensemble (SE), Distributed Random Forest (DRF), Extremely Randomized Trees (XRT), Generalized Linear Model (GLM), and Deep Learning (DL) that can forecast the compressive strength of ternary geopolymer concrete mix without carrying out any experimental procedure. A geopolymer mix uses supplementary cementitious materials obtained as industrial by-products instead of cement. The input variables used for assessing the best machine learning algorithm not only include individual ingredient quantities, but molarity of the alkali activator and age of testing as well. Myriad statistical parameters used to measure the effectiveness of the models in forecasting the compressive strength of ternary geopolymer concrete mix, it has been found that GBM performs better than all other algorithms. A sensitivity analysis carried out towards the end of the study suggests that GBM model predicts results close to the experimental conditions with an accuracy between 95.6 % to 98.2 % for testing and training datasets.

Machine learning application to seismic site classification prediction model using Horizontal-to-Vertical Spectral Ratio (HVSR) of strong-ground motions

  • Francis G. Phi;Bumsu Cho;Jungeun Kim;Hyungik Cho;Yun Wook Choo;Dookie Kim;Inhi Kim
    • Geomechanics and Engineering
    • /
    • v.37 no.6
    • /
    • pp.539-554
    • /
    • 2024
  • This study explores development of prediction model for seismic site classification through the integration of machine learning techniques with horizontal-to-vertical spectral ratio (HVSR) methodologies. To improve model accuracy, the research employs outlier detection methods and, synthetic minority over-sampling technique (SMOTE) for data balance, and evaluates using seven machine learning models using seismic data from KiK-net. Notably, light gradient boosting method (LGBM), gradient boosting, and decision tree models exhibit improved performance when coupled with SMOTE, while Multiple linear regression (MLR) and Support vector machine (SVM) models show reduced efficacy. Outlier detection techniques significantly enhance accuracy, particularly for LGBM, gradient boosting, and voting boosting. The ensemble of LGBM with the isolation forest and SMOTE achieves the highest accuracy of 0.91, with LGBM and local outlier factor yielding the highest F1-score of 0.79. Consistently outperforming other models, LGBM proves most efficient for seismic site classification when supported by appropriate preprocessing procedures. These findings show the significance of outlier detection and data balancing for precise seismic soil classification prediction, offering insights and highlighting the potential of machine learning in optimizing site classification accuracy.

Development of Product Recommender System using Collaborative Filtering and Stacking Model (협업필터링과 스태킹 모형을 이용한 상품추천시스템 개발)

  • Park, Sung-Jong;Kim, Young-Min;Ahn, Jae-Joon
    • Journal of Convergence for Information Technology
    • /
    • v.9 no.6
    • /
    • pp.83-90
    • /
    • 2019
  • People constantly strive for better choices. For this reason, recommender system has been developed since the early 1990s. In particular, collaborative filtering technique has shown excellent performance in the field of recommender systems, and research of recommender system using machine learning has been actively conducted. This study constructs recommender system using collaborative filtering and machine learning based on stacking model which is one of ensemble methods. The results of this study confirm that the recommender system with the stacking model is useful in aspects of recommender performance. In the future, the model proposed in this study is expected to help individuals or firms to make better choices.

A Personalized Recommendation System Using Machine Learning for Performing Arts Genre (머신러닝을 이용한 공연문화예술 개인화 장르 추천 시스템)

  • Hyung Su Kim;Yerin Bak;Jeongmin Lee
    • Information Systems Review
    • /
    • v.21 no.4
    • /
    • pp.31-45
    • /
    • 2019
  • Despite the expansion of the market of performing arts and culture, small and medium size theaters are still experiencing difficulties due to poor accessibility of information by consumers. This study proposes a machine learning based genre recommendation system as an alternative to enhance the marketing capability of small and medium sized theaters. We developed five recommendation systems that recommend three genres per customer using customer master DB and transaction history DB of domestic venues. We propose an optimal recommendation system by comparing performances of recommendation system. As a result, the recommendation system based on the ensemble model showed better performance than the single predictive model. This study applied the personalized recommendation technique which was scarce in the field of performing arts and culture, and suggests that it is worthy enough to use it in the field of performing arts and culture.

A Comparative Analysis of Ensemble Learning-Based Classification Models for Explainable Term Deposit Subscription Forecasting (설명 가능한 정기예금 가입 여부 예측을 위한 앙상블 학습 기반 분류 모델들의 비교 분석)

  • Shin, Zian;Moon, Jihoon;Rho, Seungmin
    • The Journal of Society for e-Business Studies
    • /
    • v.26 no.3
    • /
    • pp.97-117
    • /
    • 2021
  • Predicting term deposit subscriptions is one of representative financial marketing in banks, and banks can build a prediction model using various customer information. In order to improve the classification accuracy for term deposit subscriptions, many studies have been conducted based on machine learning techniques. However, even if these models can achieve satisfactory performance, utilizing them is not an easy task in the industry when their decision-making process is not adequately explained. To address this issue, this paper proposes an explainable scheme for term deposit subscription forecasting. For this, we first construct several classification models using decision tree-based ensemble learning methods, which yield excellent performance in tabular data, such as random forest, gradient boosting machine (GBM), extreme gradient boosting (XGB), and light gradient boosting machine (LightGBM). We then analyze their classification performance in depth through 10-fold cross-validation. After that, we provide the rationale for interpreting the influence of customer information and the decision-making process by applying Shapley additive explanation (SHAP), an explainable artificial intelligence technique, to the best classification model. To verify the practicality and validity of our scheme, experiments were conducted with the bank marketing dataset provided by Kaggle; we applied the SHAP to the GBM and LightGBM models, respectively, according to different dataset configurations and then performed their analysis and visualization for explainable term deposit subscriptions.

Performance Characteristics of an Ensemble Machine Learning Model for Turbidity Prediction With Improved Data Imbalance (데이터 불균형 개선에 따른 탁도 예측 앙상블 머신러닝 모형의 성능 특성)

  • HyunSeok Yang;Jungsu Park
    • Ecology and Resilient Infrastructure
    • /
    • v.10 no.4
    • /
    • pp.107-115
    • /
    • 2023
  • High turbidity in source water can have adverse effects on water treatment plant operations and aquatic ecosystems, necessitating turbidity management. Consequently, research aimed at predicting river turbidity continues. This study developed a multi-class classification model for prediction of turbidity using LightGBM (Light Gradient Boosting Machine), a representative ensemble machine learning algorithm. The model utilized data that was classified into four classes ranging from 1 to 4 based on turbidity, from low to high. The number of input data points used for analysis varied among classes, with 945, 763, 95, and 25 data points for classes 1 to 4, respectively. The developed model exhibited precisions of 0.85, 0.71, 0.26, and 0.30, as well as recalls of 0.82, 0.76, 0.19, and 0.60 for classes 1 to 4, respectively. The model tended to perform less effectively in the minority classes due to the limited data available for these classes. To address data imbalance, the SMOTE (Synthetic Minority Over-sampling Technique) algorithm was applied, resulting in improved model performance. For classes 1 to 4, the Precision and Recall of the improved model were 0.88, 0.71, 0.26, 0.25 and 0.79, 0.76, 0.38, 0.60, respectively. This demonstrated that alleviating data imbalance led to a significant enhancement in Recall of the model. Furthermore, to analyze the impact of differences in input data composition addressing the input data imbalance, input data was constructed with various ratios for each class, and the model performances were compared. The results indicate that an appropriate composition ratio for model input data improves the performance of the machine learning model.

Improved Estimation of Hourly Surface Ozone Concentrations using Stacking Ensemble-based Spatial Interpolation (스태킹 앙상블 모델을 이용한 시간별 지상 오존 공간내삽 정확도 향상)

  • KIM, Ye-Jin;KANG, Eun-Jin;CHO, Dong-Jin;LEE, Si-Woo;IM, Jung-Ho
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.25 no.3
    • /
    • pp.74-99
    • /
    • 2022
  • Surface ozone is produced by photochemical reactions of nitrogen oxides(NOx) and volatile organic compounds(VOCs) emitted from vehicles and industrial sites, adversely affecting vegetation and the human body. In South Korea, ozone is monitored in real-time at stations(i.e., point measurements), but it is difficult to monitor and analyze its continuous spatial distribution. In this study, surface ozone concentrations were interpolated to have a spatial resolution of 1.5km every hour using the stacking ensemble technique, followed by a 5-fold cross-validation. Base models for the stacking ensemble were cokriging, multi-linear regression(MLR), random forest(RF), and support vector regression(SVR), while MLR was used as the meta model, having all base model results as additional input variables. The results showed that the stacking ensemble model yielded the better performance than the individual base models, resulting in an averaged R of 0.76 and RMSE of 0.0065ppm during the study period of 2020. The surface ozone concentration distribution generated by the stacking ensemble model had a wider range with a spatial pattern similar with terrain and urbanization variables, compared to those by the base models. Not only should the proposed model be capable of producing the hourly spatial distribution of ozone, but it should also be highly applicable for calculating the daily maximum 8-hour ozone concentrations.

Ensemble Model using Multiple Profiles for Analytical Classification of Threat Intelligence (보안 인텔리전트 유형 분류를 위한 다중 프로파일링 앙상블 모델)

  • Kim, Young Soo
    • The Journal of the Korea Contents Association
    • /
    • v.17 no.3
    • /
    • pp.231-237
    • /
    • 2017
  • Threat intelligences collected from cyber incident sharing system and security events collected from Security Information & Event Management system are analyzed and coped with expanding malicious code rapidly with the advent of big data. Analytical classification of the threat intelligence in cyber incidents requires various features of cyber observable. Therefore it is necessary to improve classification accuracy of the similarity by using multi-profile which is classified as the same features of cyber observables. We propose a multi-profile ensemble model performed similarity analysis on cyber incident of threat intelligence based on both attack types and cyber observables that can enhance the accuracy of the classification. We see a potential improvement of the cyber incident analysis system, which enhance the accuracy of the classification. Implementation of our suggested technique in a computer network offers the ability to classify and detect similar cyber incident of those not detected by other mechanisms.