• Title/Summary/Keyword: Gradient boosting

A Gradient Boosting Method for Graph Neural Networks (그래프 신경망에 대한 그래디언트 부스팅 기법)

  • Jang, Eunjo;Lee, Ki Yong
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2022.11a
    • /
    • pp.574-576
    • /
    • 2022
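  • Graph neural networks (GNNs) have recently been studied actively in many fields. However, most GNN research to date has focused on improving the performance of a single GNN model. This paper proposes a method for building an ensemble of GNNs using gradient boosting, a representative ensemble technique. The proposed method generates each subsequent GNN so that it reduces, via gradient descent, the error of the GNNs built before it. Repeating this process yields the final GNN ensemble. In experiments, applying the proposed method to the graph convolutional network (GCN), a representative GNN model, produced an ensemble whose node classification accuracy was up to 11.3 percentage points higher than that of a single GCN model.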
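
The boosting loop described in this abstract (each new model is fit to reduce the error left by the models before it) can be illustrated with a minimal sketch. The base learner here is a one-split regression stump on 1-D data, not a GNN, and the loss is squared error. Both are simplifying assumptions chosen to keep the example self-contained. The names `fit_stump` and `gradient_boost` and the toy data are hypothetical and do not come from the paper.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression stump to 1-D inputs by least squares."""
    best = None
    # Exclude the largest value so the right side of a split is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Return (predict function, training MSE after each round)."""
    base = mean(ys)                      # initial constant model
    learners = []
    preds = [base] * len(xs)
    history = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient equals the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        history.append(mean([(y - p) ** 2 for y, p in zip(ys, preds)]))

    def predict(x):
        return base + learning_rate * sum(s(x) for s in learners)

    return predict, history


# Toy data (hypothetical): a noisy increasing step function.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]
predict, history = gradient_boost(xs, ys)
print(f"MSE after round 1: {history[0]:.4f}, after round 50: {history[-1]:.4f}")
```

In this sketch the training error cannot increase from one round to the next, because every stump is a least-squares fit to the current residuals and the learning rate is below 1. A GNN-based variant would replace `fit_stump` with training a network on the same residual targets.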

Decision based uncertainty model to predict rockburst in underground engineering structures using gradient boosting algorithms

  • Kidega, Richard;Ondiaka, Mary Nelima;Maina, Duncan;Jonah, Kiptanui Arap Too;Kamran, Muhammad
    • Geomechanics and Engineering
    • /
    • v.30 no.3
    • /
    • pp.259-272
    • /
    • 2022
  • Rockburst is a dynamic, multivariate, and non-linear phenomenon that occurs in underground mining and civil engineering structures. Predicting rockburst is challenging since conventional models are not standardized. Hence, machine learning techniques could improve prediction accuracy. This study describes decision-based uncertainty models to predict rockburst in underground engineering structures using gradient boosting machine (GBM) algorithms. The model input variables were uniaxial compressive strength (UCS), uniaxial tensile strength (UTS), maximum tangential stress (MTS), excavation depth (D), stress ratio (SR), and brittleness coefficient (BC). Several models were trained using different combinations of the input variables and a 3-fold cross-validation resampling procedure. The hyperparameters, comprising learning rate, number of boosting iterations, tree depth, and minimum number of observations, were tuned to attain the optimum models. The performance of the models was tested using classification accuracy, Cohen's kappa coefficient (k), sensitivity, and specificity. The best-performing model, obtained by optimizing the model ROC metrics, achieved classification accuracy, k, sensitivity, and specificity values of 98%, 93%, 1.00, and 0.957, respectively. The most and least influential input variables were MTS and BC, respectively. The partial dependence plots revealed the relationship between changes in the input variables and model predictions. The findings reveal that GBM can be used to anticipate rockburst and guide decisions about support requirements before mining development.
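
The four evaluation metrics named in this abstract can be computed from a binary confusion matrix, as in the minimal sketch below. The function name `binary_metrics` and the example labels are hypothetical. The study may have more than two rockburst classes, so the binary setting is an assumption made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, and Cohen's kappa for binary labels.

    Assumes both classes occur in y_true; otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)         # true positive rate
    specificity = tn / (tn + fp)         # true negative rate

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical labels: 1 = rockburst, 0 = no rockburst.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Kappa is lower than accuracy in this example because part of the observed agreement would be expected by chance given the class frequencies.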

Nanotechnology in early diagnosis of gastro intestinal cancer surgery through CNN and ANN-extreme gradient boosting

  • Y. Wenjing;T. Yuhan;Y. Zhiang;T. Shanhui;L. Shijun;M. Sharaf
    • Advances in nano research
    • /
    • v.15 no.5
    • /
    • pp.451-466
    • /
    • 2023
  • Gastrointestinal cancer (GC) is a prevalent malignant tumor of the digestive system that poses a severe health risk to humans. Due to the specific organ structure of the gastrointestinal system, both endoscopic and MRI diagnoses of GC have limited sensitivity. The primary factors influencing curative efficacy in GC patients are drug inefficacy and high recurrence rates in surgical and pharmacological therapy. Due to its unique optical features, good biocompatibility, surface effects, and small size effects, nanotechnology is a developing and advanced area of study for the detection and treatment of cancer. Because of its deep location and complex surgery, diagnosing and treating gastrointestinal cancer is very difficult. Nanotechnology enables the early diagnosis and urgent treatment of gastrointestinal illness. As diagnostic and therapeutic tools, nanoparticles directly target tumor cells, allowing their detection and removal. XGBoost, a classification method known for numerous winning solutions in data analysis competitions, was used to capture nonlinear relations between many input variables and outcomes through the boosting approach to machine learning. The research sample included 300 GC patients, comprising 190 men (72.2% of the sample) and 110 women (27.8%). Using convolutional neural networks (CNN) and artificial neural networks (ANN)-EXtreme Gradient Boosting (XGBoost), the patients' mean ± SD age was 50.42 ± 13.06. High-risk behaviors (P = 0.070), age at diagnosis (P = 0.037), distant metastasis (P = 0.004), and tumor stage (P = 0.015) were shown to have a statistically significant link with GC patient survival. AUC was 0.92, sensitivity was 81.5%, specificity was 90.5%, and accuracy was 84.7 when analyzing stomach images.

Development of ensemble machine learning models for evaluating seismic demands of steel moment frames

  • Nguyen, Hoang D.;Kim, JunHee;Shin, Myoungsu
    • Steel and Composite Structures
    • /
    • v.44 no.1
    • /
    • pp.49-63
    • /
    • 2022
  • This study aims to develop ensemble machine learning (ML) models for estimating the peak floor acceleration and maximum top drift of steel moment frames. For this purpose, random forest, adaptive boosting, gradient boosting regression tree (GBRT), and extreme gradient boosting (XGBoost) models were considered. A total of 621 steel moment frames were analyzed under 240 ground motions using OpenSees software to generate the dataset for the ML models. The GBRT and XGBoost models exhibited the highest performance for predicting peak floor acceleration and maximum top drift, respectively. The significance of each input variable for the prediction was examined using the best-performing models and the Shapley additive explanations (SHAP) approach. The peak ground acceleration had the most significant impact on the peak floor acceleration prediction, while the spectral accelerations at 1 and 2 s had the most considerable influence on the maximum top drift prediction. Finally, a graphical user interface module was created as a pioneering step toward applying ML to estimate the seismic demands of building structures in practical design.

Machine learning-based prediction of wind forces on CAARC standard tall buildings

  • Yi Li;Jie-Ting Yin;Fu-Bin Chen;Qiu-Sheng Li
    • Wind and Structures
    • /
    • v.36 no.6
    • /
    • pp.355-366
    • /
    • 2023
  • Although machine learning (ML) techniques have been widely used in various fields of engineering practice, their application in wind engineering is still at an initial stage. To evaluate the feasibility of machine learning algorithms for predicting wind loads on high-rise buildings, this study took the exposure category type, wind direction, and height of the local wind force as input features and adopted four machine learning algorithms, namely k-nearest neighbor (KNN), support vector machine (SVM), gradient boosting regression tree (GBRT), and extreme gradient (XG) boosting, to predict the wind force coefficients of the CAARC standard tall building model. All hyper-parameters of the four ML algorithms were optimized by the tree-structured Parzen estimator (TPE). The results show that the mean drag force coefficients and RMS lift force coefficients are well predicted by the GBRT model, while the RMS drag force coefficients are better predicted by the XG boosting model. The proposed machine learning based algorithms for wind load prediction can be an alternative to traditional wind tunnel tests and computational fluid dynamics simulations.

Predicting Gross Box Office Revenue for Domestic Films

  • Song, Jongwoo;Han, Suji
    • Communications for Statistical Applications and Methods
    • /
    • v.20 no.4
    • /
    • pp.301-309
    • /
    • 2013
  • This paper predicts gross box office revenue for domestic films using the Korean film data from 2008-2011. We use three regression methods, Linear Regression, Random Forest and Gradient Boosting to predict the gross box office revenue. We only consider domestic films with a revenue size of at least KRW 500 million; relevant explanatory variables are chosen by data visualization and variable selection techniques. The key idea of analyzing this data is to construct the meaningful explanatory variables from the data sources available to the public. Some variables must be categorized to conduct more effective analysis and clustering methods are applied to achieve this task. We choose the best model based on performance in the test set and important explanatory variables are discussed.

A Study On User Skin Color-Based Foundation Color Recommendation Method Using Deep Learning (딥러닝을 이용한 사용자 피부색 기반 파운데이션 색상 추천 기법 연구)

  • Jeong, Minuk;Kim, Hyeonji;Gwak, Chaewon;Oh, Yoosoo
    • Journal of Korea Multimedia Society
    • /
    • v.25 no.9
    • /
    • pp.1367-1374
    • /
    • 2022
  • In this paper, we propose an automatic cosmetic foundation recommendation system that suggests a good foundation product based on the user's skin color. The proposed system receives and preprocesses user images and detects skin color with OpenCV and machine learning algorithms. The system then compares the performance of the training model using XGBoost, Gradient Boost, Random Forest, and Adaptive Boost (AdaBoost), based on 550 datasets collected as essential bestsellers in the United States. Based on the comparison results, this paper implements a recommendation system using the highest performing machine learning model. As a result of the experiment, our system can effectively recommend a suitable skin color foundation. Thus, our system model is 98% accurate. Furthermore, our system can reduce the selection trials of foundations against the user's skin color. It can also save time in selecting foundations.

Analysis of Factors Related To Elderly Pedestrian Traffic Accidents : Centered on Seoul Metropolitan City (노인보행자교통사고 요인 분석 : 서울특별시 중심으로)

  • Seong, Je Min;Yoon, Byoung-Jo
    • Proceedings of the Korean Society of Disaster Information Conference
    • /
    • 2023.11a
    • /
    • pp.261-262
    • /
    • 2023
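  • Pedestrian traffic accidents are collisions between pedestrians and moving vehicles and are affected by the road and its surrounding environment. This study collected data on elderly pedestrian traffic accidents that occurred in Seoul Metropolitan City from 2018 to 2022 and analyzed the factors contributing to these accidents. The models considered in the analysis were random forest and gradient boosting regression (GBR). The results indicate a need to supplement transportation policies for vulnerable road users, reflecting Seoul's geographic characteristics and traffic patterns, and to strengthen pedestrian safety.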

Ensemble Gene Selection Method Based on Multiple Tree Models

  • Mingzhu Lou
    • Journal of Information Processing Systems
    • /
    • v.19 no.5
    • /
    • pp.652-662
    • /
    • 2023
  • Identifying highly discriminating genes is a critical step in tumor recognition tasks based on microarray gene expression profile data and machine learning. Gene selection based on tree models has been the subject of several studies. However, these methods are based on a single-tree model, which is often not robust to ultra-high-dimensional microarray datasets, resulting in the loss of useful information and unsatisfactory classification accuracy. Motivated by the limitations of single-tree-based gene selection, this study examined ensemble gene selection methods based on multiple-tree models to improve the classification performance of tumor identification. Specifically, we selected the three most representative tree models: ID3, random forest, and gradient boosting decision tree. Each tree model selects the top-n genes from the microarray dataset based on its intrinsic mechanism. Subsequently, three ensemble gene selection methods were investigated, namely multiple-tree module intersection, multiple-tree module union, and multiple-tree module cross-union. Experimental results on five benchmark public microarray gene expression datasets showed that the multiple-tree module union is significantly superior in classification accuracy to gene selection based on a single tree model and to other competitive gene selection methods.
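
The intersection and union strategies named in this abstract can be sketched with plain set operations over each model's top-n genes. The gene names, importance scores, and function names below are hypothetical. The abstract does not define the cross-union rule, so it is left out rather than guessed.

```python
def top_n(scores, n):
    """Return the n genes with the highest importance scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])


def ensemble_select(score_tables, n):
    """Combine the per-model top-n selections by intersection and by union."""
    selections = [top_n(scores, n) for scores in score_tables]
    return {
        "intersection": set.intersection(*selections),
        "union": set.union(*selections),
    }


# Hypothetical importance scores from three tree models.
id3_scores = {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": 0.3}
rf_scores = {"g1": 0.7, "g2": 0.2, "g3": 0.6, "g4": 0.1}
gbdt_scores = {"g1": 0.8, "g2": 0.1, "g3": 0.2, "g4": 0.9}

selected = ensemble_select([id3_scores, rf_scores, gbdt_scores], n=2)
print(sorted(selected["intersection"]))  # ['g1']
print(sorted(selected["union"]))         # ['g1', 'g2', 'g3', 'g4']
```

The intersection keeps only the genes every model agrees on, while the union keeps any gene that at least one model ranks in its top n. This is consistent with the abstract's motivation that a single tree model can lose useful information.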

A sensitivity analysis of machine learning models on fire-induced spalling of concrete: Revealing the impact of data manipulation on accuracy and explainability

  • Mohammad K. al-Bashiti;M.Z. Naser
    • Computers and Concrete
    • /
    • v.33 no.4
    • /
    • pp.409-423
    • /
    • 2024
  • Using an extensive database, a sensitivity analysis across fifteen machine learning (ML) classifiers was conducted to evaluate the impact of various data manipulation techniques, evaluation metrics, and explainability tools. The results of this sensitivity analysis reveal that the examined models can achieve an accuracy ranging from 72-93% in predicting the fire-induced spalling of concrete and denote the light gradient boosting machine, extreme gradient boosting, and random forest algorithms as the best-performing models. Among such models, the six key factors influencing spalling were maximum exposure temperature, heating rate, compressive strength of concrete, moisture content, silica fume content, and the quantity of polypropylene fiber. Our analysis also documents some conflicting results observed with the deep learning model. As such, this study highlights the necessity of selecting suitable models and carefully evaluating the presence of possible outcome biases.