• Title/Summary/Keyword: Ensemble Machine learning


A Study on the Development of University Students Dropout Prediction Model Using Ensemble Technique (앙상블 기법을 활용한 대학생 중도탈락 예측 모형 개발)

  • Park, Sangsung
    • Journal of Korea Society of Digital Industry and Information Management / v.17 no.1 / pp.109-115 / 2021
  • The number of university freshmen is falling with the recent decline in the school-age population, threatening the survival of many universities. In response, universities are looking for ways to use their internal big data to improve the quality of education, and dropout prediction is a representative example. Predicting dropouts makes it possible to prepare a systematic management plan by identifying in advance students who are likely to leave school, whether by withdrawal or expulsion. Actual on-campus data contain many missing values because they are collected and managed by various departments, so a model must handle missing values effectively. In this study, we propose a university student dropout prediction model based on eXtreme Gradient Boosting (XGBoost) that can be applied to data with many missing values and shows high performance. To examine the practical applicability of the proposed model, an experiment was performed using data from C University in Chungbuk. The experiment showed that the proposed model achieved excellent prediction performance. The prediction results of the proposed model can be used to establish a management strategy for students at risk of dropping out.
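The abstract's point about building a model that uses records with missing values, rather than discarding them, can be illustrated with a small sketch. The code below is a simplified, pure-Python illustration of one idea used by gradient-boosted trees: at each candidate split, rows with a missing feature are tried on both sides and sent to whichever side gives the lower squared error. It is not the paper's model and not the XGBoost library; the function names and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_with_missing(
    x: List[Optional[float]], y: List[float]
) -> Tuple[float, str, float]:
    """Return (threshold, default_direction, loss) for a single feature.

    Rows where x is None are treated as missing. For every candidate
    threshold they are tried on the left and on the right, and the
    cheaper side becomes the default direction for missing values.
    """
    present = sorted({v for v in x if v is not None})
    missing_y = [yi for xi, yi in zip(x, y) if xi is None]
    best = (float("nan"), "left", float("inf"))
    for lo, hi in zip(present, present[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi is not None and xi < threshold]
        right = [yi for xi, yi in zip(x, y) if xi is not None and xi >= threshold]
        for direction in ("left", "right"):
            if direction == "left":
                loss = sse(left + missing_y) + sse(right)
            else:
                loss = sse(left) + sse(right + missing_y)
            if loss < best[2]:
                best = (threshold, direction, loss)
    return best


# Hypothetical toy data: feature = credits earned, target = 1.0 if dropped out.
credits = [3.0, 6.0, None, 15.0, 18.0, None, 21.0]
dropout = [1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
threshold, default_direction, loss = best_split_with_missing(credits, dropout)
print(threshold, default_direction, loss)
```

In this toy data both students with a missing credit count dropped out, so the split sends missing values to the low-credit side. A real model would apply the same idea across many features and boosting rounds.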

Predicting rock brittleness indices from simple laboratory test results using some machine learning methods

  • Davood Fereidooni;Zohre Karimi
    • Geomechanics and Engineering / v.34 no.6 / pp.697-726 / 2023
  • Brittleness is an important property of rock that plays a crucial role both in the failure process of intact rock and in the response of rock mass to excavation in engineering geological and geotechnical projects. Generally, rock brittleness indices are calculated from mechanical properties of rocks such as uniaxial compressive strength, tensile strength, and modulus of elasticity. These properties are generally determined from complicated, expensive, and time-consuming laboratory tests. For this reason, the present research attempts to predict rock brittleness indices from simple, inexpensive, and quick laboratory test results, namely dry unit weight, porosity, slake-durability index, P-wave velocity, Schmidt rebound hardness, and point load strength index, using multiple linear regression, exponential regression, support vector machine (SVM) with various kernels, generating fuzzy inference system, and regression tree ensemble (RTE) with boosting framework. This approach can be considered the innovation of the present research. For this purpose, a total of 39 rock samples, including five igneous, twenty-six sedimentary, and eight metamorphic, were collected from different regions of Iran. Mineralogical, physical, and mechanical properties as well as five well-known rock brittleness indices (i.e., B1, B2, B3, B4, and B5) were measured for the selected rock samples before the above-mentioned machine learning techniques were applied. The performance of the developed models was evaluated using several statistical metrics, such as mean square error, relative absolute error, root relative absolute error, determination coefficient, variance accounted for, mean absolute percentage error, and standard deviation of the error. Comparison of the results revealed that, among the studied methods, SVM is the most suitable for predicting B1, B2, and B5, while RTE predicts B3 and B4 better than the other methods.
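Several of the evaluation metrics named in the abstract can be computed as in the following sketch. The formulas are the commonly used definitions and are an assumption here, since the abstract does not state the paper's exact forms; the function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev, pvariance
from typing import Dict, Sequence


def regression_metrics(
    actual: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """Compute common regression metrics from measured and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e ** 2 for e in errors)          # residual sum of squares
    mean_actual = mean(actual)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        # Mean square error
        "mse": ss_res / len(errors),
        # Coefficient of determination
        "r2": 1 - ss_res / ss_tot,
        # Variance accounted for, in percent
        "vaf_percent": (1 - pvariance(errors) / pvariance(actual)) * 100,
        # Mean absolute percentage error, in percent
        "mape_percent": mean([abs(e / a) for e, a in zip(errors, actual)]) * 100,
        # Standard deviation of the error
        "error_std": pstdev(errors),
    }


# Hypothetical measured and predicted brittleness index values.
measured = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(measured, estimated)
print(metrics)
```

Lower MSE, MAPE, and error standard deviation, and higher R2 and VAF, indicate a better model, which is how metrics of this kind support a comparison such as SVM versus RTE.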

Assessment of compressive strength of high-performance concrete using soft computing approaches

  • Chukwuemeka Daniel;Jitendra Khatti;Kamaldeep Singh Grover
    • Computers and Concrete / v.33 no.1 / pp.55-75 / 2024
  • The present study introduces an optimum-performance soft computing model for predicting the compressive strength of high-performance concrete (HPC) by comparing models based on conventional (kernel-based, covariance function-based, and tree-based), advanced machine (least square support vector machine, LSSVM, and minimax probability machine regressor, MPMR), and deep (artificial neural network, ANN) learning approaches using a common database for the first time. A compressive strength database containing results for 1030 concrete samples was compiled from the literature and preprocessed. For training, testing, and validation of the soft computing models, 803, 101, and 101 data points were selected arbitrarily from the 1005 preprocessed data points. Thirteen performance metrics, including three new metrics (a20-index, index of agreement, and index of scatter), were implemented for each model. The performance comparison reveals that the SVM (kernel-based), ET (tree-based), MPMR (advanced), and ANN (deep) models achieved higher performance in predicting the compressive strength of HPC. From the overall analysis of performance, accuracy, Taylor plot, accuracy metric, regression error characteristics curve, Anderson-Darling, Wilcoxon, uncertainty, and reliability, model CS4, based on the ensemble tree, was recognized as the optimum-performance model, with a correlation coefficient of 0.9352, root mean square error of 5.76 MPa, and mean absolute error of 4.1069 MPa. The present study also reveals that multicollinearity affects the prediction accuracy of the Gaussian process regression, decision tree, multilinear regression, and adaptive boosting regressor models, a novel finding in compressive strength prediction of HPC. The cosine sensitivity analysis reveals that the prediction of compressive strength of HPC is highly affected by cement content, fine aggregate, coarse aggregate, and water content.
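The three metrics the abstract highlights (a20-index, index of agreement, and index of scatter) can be sketched as follows. The definitions used are the forms commonly found in the literature and are assumed here, because the abstract does not give the paper's formulas; function names and strength values are hypothetical.

```python
import math
from typing import Sequence


def a20_index(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Fraction of samples whose predicted/actual ratio lies in [0.80, 1.20]."""
    hits = sum(1 for a, p in zip(actual, predicted) if 0.80 <= p / a <= 1.20)
    return hits / len(actual)


def index_of_agreement(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Willmott's index of agreement; 1.0 means perfect agreement."""
    mean_a = sum(actual) / len(actual)
    numerator = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    denominator = sum(
        (abs(p - mean_a) + abs(a - mean_a)) ** 2 for a, p in zip(actual, predicted)
    )
    return 1 - numerator / denominator


def index_of_scatter(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE normalised by the mean of the measured values."""
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return rmse / (sum(actual) / n)


# Hypothetical compressive strengths in MPa.
measured = [40.0, 50.0, 60.0, 80.0]
estimated = [42.0, 38.0, 61.0, 79.0]
a20 = a20_index(measured, estimated)
ioa = index_of_agreement(measured, estimated)
scatter = index_of_scatter(measured, estimated)
print(a20, ioa, scatter)
```

Here three of the four predictions fall within 20% of the measured value, so the a20-index is 0.75; the second sample, predicted at 38 MPa against a measured 50 MPa, is the one that falls outside the band.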

A Method of Machine Learning-based Defective Health Functional Food Detection System for Efficient Inspection of Imported Food (효율적 수입식품 검사를 위한 머신러닝 기반 부적합 건강기능식품 탐지 방법)

  • Lee, Kyoungsu;Bak, Yerin;Shin, Yoonjong;Sohn, Kwonsang;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.139-159 / 2022
  • As interest in health functional foods has increased since COVID-19, safety inspections of imported food have become more important. However, while imports of health functional foods increase every year, the budget and manpower available for import and export inspections are reaching their limits. The purpose of this study is therefore to propose a machine learning model that efficiently detects nonconforming food and suits the characteristics of the imported-food data held by government offices. First, the components of food import/export inspection data that affect the judgment of nonconformity were examined, and derived variables were newly created. Second, to select features for machine learning, class imbalance and nonlinearity were considered during exploratory analysis of the imported-food data. Third, various machine learning techniques were applied and the performance and interpretability of the resulting models were compared. The ensemble model performed best, and the derived variables and models proposed in this study were confirmed to be helpful for the system used in import/export inspections.

Improving Efficiency of Food Hygiene Surveillance System by Using Machine Learning-Based Approaches (기계학습을 이용한 식품위생점검 체계의 효율성 개선 연구)

  • Cho, Sanggoo;Cho, Seung Yong
    • The Journal of Bigdata / v.5 no.2 / pp.53-67 / 2020
  • This study employs a supervised learning prediction model to detect nonconformity in advance at processed food manufacturing and processing businesses. The study followed the standard machine learning procedure: definition of the objective function, data preprocessing and feature engineering, and model selection and evaluation. The dependent variable was set as the number of supervised inspection detections over the five years from 2014 to 2018, and the objective function was to maximize the probability of detecting nonconforming companies. The data were preprocessed to reflect not only basic attributes such as revenue, operating duration, and number of employees, but also inspection track records and external climate data. After feature extraction, the machine learning algorithms were applied with company risk, item risk, environmental risk, and past violation history derived as the feature variables that affect the determination of nonconformity. The f1-score of the decision tree, one of the ensemble models, was much higher than those of the other models. Based on the results of this study, official food control for food safety management is expected to be strengthened and to move toward data- and evidence-based management and a scientific administrative system.
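The f1-score used to compare models in this abstract combines precision and recall for the nonconforming class. A minimal sketch of the calculation, with hypothetical labels (1 = nonconforming, 0 = conforming) and a hypothetical function name:

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    actual: Sequence[int], predicted: Sequence[int]
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical inspection outcomes and model predictions.
inspected = [1, 0, 1, 1, 0, 0, 1, 0]
flagged = [1, 0, 0, 1, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(inspected, flagged)
print(precision, recall, f1)
```

Because nonconforming businesses are typically a small minority, F1 is more informative than plain accuracy here: a model that flags nothing would score high accuracy but an F1 of zero.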

Improved Estimation of Hourly Surface Ozone Concentrations using Stacking Ensemble-based Spatial Interpolation (스태킹 앙상블 모델을 이용한 시간별 지상 오존 공간내삽 정확도 향상)

  • KIM, Ye-Jin;KANG, Eun-Jin;CHO, Dong-Jin;LEE, Si-Woo;IM, Jung-Ho
    • Journal of the Korean Association of Geographic Information Studies / v.25 no.3 / pp.74-99 / 2022
  • Surface ozone is produced by photochemical reactions of nitrogen oxides (NOx) and volatile organic compounds (VOCs) emitted from vehicles and industrial sites, and it adversely affects vegetation and the human body. In South Korea, ozone is monitored in real time at stations (i.e., point measurements), but it is difficult to monitor and analyze its continuous spatial distribution. In this study, hourly surface ozone concentrations were interpolated to a spatial resolution of 1.5 km using the stacking ensemble technique and evaluated with 5-fold cross-validation. The base models for the stacking ensemble were cokriging, multi-linear regression (MLR), random forest (RF), and support vector regression (SVR), while MLR was used as the meta model, taking all base model results as additional input variables. The results showed that the stacking ensemble model performed better than the individual base models, with an average R of 0.76 and RMSE of 0.0065 ppm over the 2020 study period. Compared with the distributions generated by the base models, the surface ozone concentration distribution generated by the stacking ensemble model had a wider range, with a spatial pattern similar to terrain and urbanization variables. The proposed model is expected not only to produce the hourly spatial distribution of ozone but also to be highly applicable to calculating daily maximum 8-hour ozone concentrations.
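The stacking procedure described in the abstract (base models, out-of-fold predictions from 5-fold cross-validation, and an MLR meta model) can be sketched in pure Python. This is an illustration under simplifying assumptions, not the authors' implementation: the two base learners (a simple linear fit and a k-nearest-neighbour mean) stand in for cokriging, MLR, RF, and SVR; the meta model sees only the base predictions; and the single covariate and ozone values are hypothetical.

```python
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Base learner 1: ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs: Sequence[float], ys: Sequence[float], k: int = 3) -> Model:
    """Base learner 2: mean target of the k nearest training points."""
    pairs = list(zip(xs, ys))

    def predict(x: float) -> float:
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(rows: List[List[float]], ys: Sequence[float]) -> Callable[[List[float]], float]:
    """Meta model: multiple linear regression via the normal equations."""
    design = [[1.0] + row for row in rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    w = solve(xtx, xty)
    return lambda row: w[0] + sum(wi * v for wi, v in zip(w[1:], row))


def fit_stacking(xs: Sequence[float], ys: Sequence[float], n_folds: int = 5) -> Model:
    """Train base models per fold, fit the meta model on out-of-fold predictions."""
    fitters = [fit_linear, fit_knn]
    oof = [[0.0] * len(fitters) for _ in xs]
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        held_out = [i for i in range(len(xs)) if i % n_folds == fold]
        for j, fitter in enumerate(fitters):
            model = fitter([xs[i] for i in train], [ys[i] for i in train])
            for i in held_out:
                oof[i][j] = model(xs[i])
    meta = fit_mlr(oof, ys)
    base_models = [fitter(xs, ys) for fitter in fitters]  # refit on all data
    return lambda x: meta([m(x) for m in base_models])


# Hypothetical data: one covariate and ozone in ppb with a nonlinear trend.
xs = [i * 0.5 for i in range(20)]
ys = [20.0 + 0.4 * x ** 2 + ((i * 7) % 5 - 2) * 0.3 for i, x in enumerate(xs)]
stacked = fit_stacking(xs, ys)
estimate = stacked(4.2)
print(estimate)
```

Fitting the meta model on out-of-fold predictions, rather than on predictions for data the base models were trained on, keeps the meta model from simply rewarding whichever base learner overfits most.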

An Ensemble Deep Learning Model for Measuring Displacement in Cultural Asset images (목조 문화재 영상에서의 변위량 측정을 위한 앙상블 딥러닝 모델)

  • Kang, Jaeyong;Kim, Inki;Lim, Hyunseok;Gwak, Jeonghwan
    • Proceedings of the Korean Society of Computer Information Conference / 2021.07a / pp.141-143 / 2021
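  • In this paper, we propose an ensemble deep learning model that can detect displacement in wooden cultural assets. First, deep features are extracted from the input image using two different pretrained convolutional neural networks. The two sets of deep features are then combined into a single feature vector. The combined feature vector is fed into a fully connected layer, which finally predicts the severity level of the displacement. The dataset consists of images of wooden cultural assets, collected by visiting cultural heritage sites near Chungju and labeled as normal or abnormal. Experiments confirmed that the model using the ensemble deep learning technique outperformed models that do not use the ensemble technique. These results show that the proposed method is well suited to predicting displacement in wooden cultural assets.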
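The pipeline in this abstract (two feature extractors, concatenation into one feature vector, then a fully connected layer) can be sketched structurally in pure Python. Everything concrete below is a hypothetical placeholder: the two extractor functions stand in for pretrained convolutional networks, the "image" is a short list of numbers, and the weights are untrained illustrative values, so the output probabilities carry no meaning beyond showing the data flow.

```python
import math
from typing import List, Sequence


def extractor_a(image: Sequence[float]) -> List[float]:
    """Stand-in for the first pretrained CNN: returns two toy features."""
    return [sum(image) / len(image), max(image)]


def extractor_b(image: Sequence[float]) -> List[float]:
    """Stand-in for the second pretrained CNN: returns two different toy features."""
    return [min(image), image[-1] - image[0]]


def fully_connected(
    features: Sequence[float], weights: List[List[float]], biases: List[float]
) -> List[float]:
    """One dense layer: a weighted sum of the features per output class."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(
    image: Sequence[float], weights: List[List[float]], biases: List[float]
) -> List[float]:
    """Concatenate both feature vectors, then classify with the dense layer."""
    fused = extractor_a(image) + extractor_b(image)  # feature-level fusion
    return softmax(fully_connected(fused, weights, biases))


# Hypothetical input and untrained placeholder weights (2 classes x 4 features).
image = [0.1, 0.4, 0.9, 0.3]
weights = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]
biases = [0.0, 0.1]
probabilities = predict(image, weights, biases)  # [P(normal), P(abnormal)]
print(probabilities)
```

The design point is that the ensemble happens at the feature level: the classifier sees both networks' representations at once, instead of averaging two separate predictions.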


Ensemble Model using Multiple Profiles for Analytical Classification of Threat Intelligence (보안 인텔리전트 유형 분류를 위한 다중 프로파일링 앙상블 모델)

  • Kim, Young Soo
    • The Journal of the Korea Contents Association / v.17 no.3 / pp.231-237 / 2017
  • With the advent of big data, threat intelligence collected from cyber incident sharing systems and security events collected from Security Information & Event Management systems are analyzed to cope with rapidly spreading malicious code. Analytical classification of the threat intelligence in cyber incidents requires various features of cyber observables. It is therefore necessary to improve the accuracy of similarity-based classification by using multiple profiles, each grouping cyber observables with the same features. We propose a multi-profile ensemble model that performs similarity analysis on cyber incidents in threat intelligence based on both attack types and cyber observables, and that can enhance classification accuracy. We see potential for improving cyber incident analysis systems by enhancing classification accuracy. Implementing the suggested technique in a computer network offers the ability to classify and detect similar cyber incidents that other mechanisms fail to detect.

Technology of Lessons Learned Analysis using Artificial intelligence: Focused on the 'L2-OODA Ensemble Algorithm' (인공지능형 전훈분석기술: 'L2-OODA 앙상블 알고리즘'을 중심으로)

  • Yang, Seong-sil;Shin, Jin
    • Convergence Security Journal / v.21 no.2 / pp.67-79 / 2021
  • Lessons Learned (LL) is a military term defined as all activities that promote future development by finding problems and areas needing improvement in education and in practice in the field of warfare development. In this paper, we focus on presenting actual examples and applying AI analysis and inference techniques to solve the problems revealed in promoting LL activities, such as lengthy analysis, budget constraints, and the expertise required. AI legal advice services using cognitive computing technologies, which are already in practical use, were judged to be the best examples for solving the problems of LL. This paper presents intelligent LL inference techniques that utilize AI. To this end, we explore theoretical background such as LL analysis definitions and examples, the evolution of AI into machine learning, and cognitive computing, and apply it to new technologies in the defense sector using the newly proposed L2-OODA ensemble algorithm, to contribute to the improvement and optimization of existing power.

Infrastructure Anomaly Analysis for Data-center Failure Prevention: Based on RRCF and Prophet Ensemble Analysis (데이터센터 장애 예방을 위한 인프라 이상징후 분석: RRCF와 Prophet Ensemble 분석 기반)

  • Hyun-Jong Kim;Sung-Keun Kim;Byoung-Whan Chun;Kyong-Bog Jin;Seung-Jeong Yang
    • The Journal of Bigdata / v.7 no.1 / pp.113-124 / 2022
  • Various methods using machine learning and big data have been applied to prevent failures in data centers. However, approaches that reference performance indicators of individual equipment, or that do not consider the infrastructure operating environment, have many limitations in practical use. In this study, the performance indicators of individual infrastructure equipment are monitored in an integrated manner, and the indicators of the various equipment are segmented and graded to produce a single numerical value. Data pre-processing was based on experience in infrastructure operation. An ensemble of RRCF (Robust Random Cut Forest) analysis and Prophet analysis models produced reliable results in detecting anomalies. A failure analysis system was implemented for ease of use by data center operators. It can support a preemptive response to data center failures and indicate an appropriate tuning time.