• Title/Summary/Keyword: tree-based models

Optimizing E-Commerce with Ensemble Learning and Iterative Clustering for Superior Product Selection

  • Yuchen Liu;Meng Wang;Gangmin Li;Terry R. Payne;Yong Yue;Ka Lok Man
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.10 / pp.2818-2839 / 2024
  • With the continuous growth of e-commerce sales, a robust product selection model is essential to maintain competitiveness and meet consumer demand. Current research primarily focuses on single models for sales prediction and lacks an integrated approach to sales forecasting and product selection. This paper proposes a comprehensive framework (VN-CPC) that combines sales forecasting with product selection to address these issues. We integrate a series of classical machine learning models, including tree models (XGBoost, LightGBM, CatBoost), Support Vector Machine (SVM), Bayesian Ridge, and Artificial Neural Networks (ANN), using a voting mechanism to determine the optimal weighting scheme. On collected Amazon data, our method achieves a lower Root Mean Square Error (RMSE) than individual models and other ensemble models. Furthermore, building on our predictive model, we employ a three-tiered clustering model (Initial Clustering, Refinement Clustering, and Final Clustering) to narrow product selection to specific categories. This integrated forecasting and selection framework can be applied more effectively in the dynamic e-commerce environment, and it provides a robust tool for businesses to optimize their product offerings and stay ahead in a competitive market.
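The weighted voting step can be illustrated with a minimal sketch, assuming that "determining the optimal weighting scheme" means choosing non-negative weights that sum to 1 and minimize validation RMSE. The grid search and the names `best_voting_weights` and `step` are illustrative assumptions, not the authors' VN-CPC implementation.

```python
import itertools

import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def best_voting_weights(preds, y_true, step=0.1):
    """Grid-search weights on the simplex for a weighted-average (voting) ensemble.

    preds: shape (n_models, n_samples), each model's predictions on a validation set.
    Returns (weights, rmse) for the combination with the lowest validation RMSE.
    """
    preds = np.asarray(preds, dtype=float)
    n_models = preds.shape[0]
    n_steps = int(round(1 / step))
    best_w, best_err = None, np.inf
    # Enumerate integer splits of n_steps across the models (grid points on the simplex)
    for combo in itertools.product(range(n_steps + 1), repeat=n_models):
        if sum(combo) != n_steps:
            continue
        w = np.array(combo, dtype=float) / n_steps
        err = rmse(y_true, w @ preds)  # weighted average of the model predictions
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err
```

The exhaustive grid grows as (1/step + 1) to the power of the number of models, so with six or more base models a coarser step or a continuous optimizer would be the more practical choice.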

Analysis of disc cutter replacement based on wear patterns using artificial intelligence classification models

  • Yunhee Kim;Jaewoo Shin;Bumjoo Kim
    • Geomechanics and Engineering / v.38 no.6 / pp.633-645 / 2024
  • Disc cutters, the rock excavation tools of a Tunnel Boring Machine (TBM), naturally wear during tunneling as they crush and cut through the ground, producing various types of wear. When disc cutters reach their wear limits, they must be replaced at the appropriate time to ensure efficient excavation. General disc cutter life prediction models are typically used during the design phase to predict the total required quantity and replacement locations for construction. However, disc cutters are replaced more frequently during tunneling than initially planned. Unpredictable disc cutter replacements can easily reduce tunneling efficiency, and abnormal wear is a common cause of such replacements when tunneling in complex ground conditions. This study aims to overcome the limitations of existing disc cutter life prediction models by using machine data generated during tunneling to predict disc cutter wear patterns and determine the need for replacement in real time. Artificial intelligence classification algorithms, including K-nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT), and Stacking, are employed to assess the need for disc cutter replacement. Binary classification models are developed to predict which disc cutters require replacement, while multi-class classification models are fine-tuned to identify three categories: no replacement required, replacement due to normal wear, and replacement due to abnormal wear during tunneling. The performance of these models is thoroughly assessed, and the results demonstrate that the proposed approach effectively manages disc cutter wear and replacement in shield TBM tunnel projects.
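Of the classifiers listed, KNN is the simplest to show end to end. The sketch below assigns one of the three replacement categories by majority vote among the k nearest training samples. The two-dimensional feature vectors (for example, normalized thrust and torque) and the label strings are hypothetical placeholders; the study's actual machine-data features and its stacking ensemble are not reproduced here.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    Features are assumed to be pre-scaled to comparable ranges, since
    Euclidean distance is sensitive to feature scale.
    """
    # Sort (distance, label) pairs so the closest samples come first
    dists = sorted((math.dist(x, query), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical normalized (thrust, torque) readings per disc cutter position
X = [(0.10, 0.20), (0.15, 0.10), (0.90, 0.80), (0.85, 0.90), (0.50, 0.95), (0.55, 0.90)]
y = ["no_replacement", "no_replacement", "normal_wear", "normal_wear",
     "abnormal_wear", "abnormal_wear"]
print(knn_predict(X, y, (0.12, 0.15)))
```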

3D Model Retrieval Using Geometric Information (기하학 정보를 이용한 3차원 모델 검색)

  • Lee Kee-Ho;Kim Nac-Woo;Kim Tae-Yong;Choi Jong-Soo
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.10C / pp.1007-1016 / 2005
  • This paper presents a feature extraction method for shape-based retrieval of 3D models. Because the feature descriptor of a 3D model should be invariant to translation, rotation, and scaling, the 3D models must first be preprocessed into a canonical coordinate system. We use PCA (Principal Component Analysis) for this preprocessing, and we also use it to construct an MBR (Minimum Boundary Rectangle) and a circumsphere. The proposed algorithm is as follows. We generate a circumsphere of radius 1 (r = 1) around each 3D model and place the model at the center of the circumsphere. We then construct concentric spheres with radii $r_i = i/n,\ i = 1, 2, \ldots, n$. After finding the meshes intersected by the concentric spheres, we compute the curvature of those meshes and use these curvatures as the model descriptor. Experimental results show that, measured by ANMRR, the proposed algorithm improves performance by 0.1 at minimum and 0.6 at maximum compared with conventional methods, even though it uses relatively small bins. This paper uses an $R^*$-tree as the index structure.
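The preprocessing and concentric-sphere idea can be sketched under simplifying assumptions: the circumsphere is centered at the vertex centroid (not a minimal enclosing sphere), per-vertex curvature values are supplied by the caller, and each shell between radii $(i-1)/n$ and $i/n$ is summarized by the mean curvature of the vertices it contains, in place of the paper's mesh-intersection step. The function names are hypothetical.

```python
import numpy as np


def pca_normalize(vertices):
    """Center on the centroid, rotate onto principal axes, and scale into a unit sphere."""
    v = np.asarray(vertices, dtype=float)
    centered = v - v.mean(axis=0)
    # Eigenvectors of the covariance matrix are the principal axes
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    rotated = centered @ eigvecs[:, np.argsort(eigvals)[::-1]]
    # Scale so the farthest vertex lies on the sphere of radius 1
    return rotated / np.linalg.norm(rotated, axis=1).max()


def shell_descriptor(vertices, curvature, n=8):
    """Mean curvature of the vertices in each shell (i-1)/n < r <= i/n, i = 1..n."""
    v = pca_normalize(vertices)
    curvature = np.asarray(curvature, dtype=float)
    r = np.linalg.norm(v, axis=1)
    idx = np.clip(np.ceil(r * n).astype(int), 1, n) - 1  # 0-based shell index
    desc = np.zeros(n)
    for i in range(n):
        mask = idx == i
        if mask.any():
            desc[i] = curvature[mask].mean()
    return desc
```

The radial binning depends only on centering and scaling, so the resulting descriptor is unchanged by translation and uniform scaling of the input; the PCA rotation matters for axis-dependent structures such as the MBR.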

Hybrid machine learning with moth-flame optimization methods for strength prediction of CFDST columns under compression

  • Quang-Viet Vu;Dai-Nhan Le;Thai-Hoan Pham;Wei Gao;Sawekchai Tangaramvong
    • Steel and Composite Structures / v.51 no.6 / pp.679-695 / 2024
  • This paper presents a novel technique that combines machine learning (ML) with moth-flame optimization (MFO) methods to predict the axial compressive strength (ACS) of concrete-filled double skin steel tube (CFDST) columns. The proposed model is trained and tested on a dataset of 125 tests of CFDST columns subjected to compressive loading. Five ML models, namely extreme gradient boosting (XGBoost), gradient tree boosting (GBT), categorical gradient boosting (CAT), support vector machines (SVM), and decision tree (DT) algorithms, are used in this work. The MFO algorithm is applied to find the optimal hyperparameters of these ML models and to determine the most effective model for predicting the ACS of CFDST columns. Results on several performance metrics show that the MFO-CAT model provides higher accuracy than the other models considered. The accuracy of the MFO-CAT model is validated by comparing its predictions with existing design codes and formulae. Moreover, the significance and contribution of each feature in the dataset are examined using the SHapley Additive exPlanations (SHAP) method. A comprehensive uncertainty quantification of the probabilistic characteristics of the ACS of CFDST columns is conducted for the first time to examine the models' responses to variations in input variables in stochastic environments. Finally, a web-based application is developed to predict the ACS of CFDST columns, enabling rapid practical use without requiring any programming or machine learning expertise.
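A generic version of the moth-flame optimization update is sketched below: each moth moves on a logarithmic spiral $D\,e^{bt}\cos(2\pi t) + F$ around its assigned flame $F$, flames are the best positions found so far, and the number of flames shrinks linearly to one. This is a textbook-style sketch minimizing an arbitrary box-constrained objective, not the authors' code; in a hyperparameter-search setting the objective would be something like the cross-validated error of a model, with the decoding of the position vector into hyperparameters left as an assumption.

```python
import numpy as np


def moth_flame_optimize(objective, lower, upper, n_moths=30, n_iter=200, b=1.0, seed=0):
    """Minimize `objective` over the box [lower, upper] with moth-flame optimization."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    moths = rng.uniform(lower, upper, size=(n_moths, dim))
    fitness = np.apply_along_axis(objective, 1, moths)
    order = np.argsort(fitness)
    flames, flame_fit = moths[order].copy(), fitness[order].copy()

    for it in range(1, n_iter + 1):
        # Number of flames decreases linearly from n_moths to 1
        n_flames = int(round(n_moths - it * (n_moths - 1) / n_iter))
        # r decreases linearly from -1 to -2; t is drawn from (r, 1]
        r = -1.0 - it / n_iter
        for i in range(n_moths):
            j = min(i, n_flames - 1)  # surplus moths share the last remaining flame
            dist = np.abs(flames[j] - moths[i])
            t = (r - 1.0) * rng.random(dim) + 1.0
            moths[i] = dist * np.exp(b * t) * np.cos(2 * np.pi * t) + flames[j]
        moths = np.clip(moths, lower, upper)
        fitness = np.apply_along_axis(objective, 1, moths)

        # Keep the n_moths best positions seen so far as the new flames
        pool = np.vstack([flames, moths])
        pool_fit = np.concatenate([flame_fit, fitness])
        keep = np.argsort(pool_fit)[:n_moths]
        flames, flame_fit = pool[keep], pool_fit[keep]

    return flames[0], float(flame_fit[0])
```

Because the flame set always retains the best positions found, the returned score never gets worse across iterations; integer hyperparameters (such as tree depth) would typically be rounded inside the objective.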

Estimation of fruit number of apple tree based on YOLOv5 and regression model (YOLOv5 및 다항 회귀 모델을 활용한 사과나무의 착과량 예측 방법)

  • Hee-Jin Gwak;Yunju Jeong;Ik-Jo Chun;Cheol-Hee Lee
    • Journal of IKEEE / v.28 no.2 / pp.150-157 / 2024
  • In this paper, we propose a novel algorithm for predicting the number of apples on an apple tree using a deep learning-based object detection model and a polynomial regression model. Measuring the number of apples on an apple tree can be used to predict apple yield and to assess losses for determining agricultural disaster insurance payouts. To measure apple fruit load, we photographed the front and back sides of apple trees. We manually labeled the apples in the captured images to construct a dataset, which was then used to train a one-stage object detection CNN model. However, when apples on an apple tree are obscured by leaves, branches, or other parts of the tree, they may not be captured in images. Consequently, it becomes difficult for image recognition-based deep learning models to detect or infer the presence of these apples. To address this issue, we propose a two-stage inference process. In the first stage, we utilize an image-based deep learning model to count the number of apples in photos taken from both sides of the apple tree. In the second stage, we conduct a polynomial regression analysis, using the total apple count from the deep learning model as the independent variable, and the actual number of apples manually counted during an on-site visit to the orchard as the dependent variable. The performance evaluation of the two-stage inference system proposed in this paper showed an average accuracy of 90.98% in counting the number of apples on each apple tree. Therefore, the proposed method can significantly reduce the time and cost associated with manually counting apples. Furthermore, this approach has the potential to be widely adopted as a new foundational technology for fruit load estimation in related fields using deep learning.
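The second inference stage can be sketched as a polynomial regression from the total detected count (both photo sides) to the manually counted number of apples. The degree of 2, the accuracy definition (one minus the relative absolute error, averaged over trees), and the function names are assumptions for illustration; the abstract does not specify them.

```python
import numpy as np


def fit_count_correction(detected, actual, degree=2):
    """Fit actual ~ polynomial(detected) by least squares and return a callable model."""
    return np.poly1d(np.polyfit(detected, actual, degree))


def mean_count_accuracy(pred, actual):
    """Assumed per-tree accuracy 1 - |pred - actual| / actual, averaged over trees."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(1.0 - np.abs(pred - actual) / actual))
```

In use, `detected` would hold the detector's summed front and back counts per tree and `actual` the on-site counts; the fitted model then maps a new tree's detected count to an estimate of its true fruit load.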

Healing of CAD Model Errors Using Design History (설계이력 정보를 이용한 CAD모델의 오류 수정)

  • Yang J. S.;Han S. H.
    • Korean Journal of Computational Design and Engineering / v.10 no.4 / pp.262-273 / 2005
  • For CAD data users, few things are as frustrating as receiving CAD data that is unusable because of poor data quality. Users waste time trying to obtain better data, fixing the data, or even rebuilding it from scratch using paper drawings or other sources. Most related works and commercial tools operate on the boundary representation (B-Rep) shape of CAD models. In contrast, we propose a design history-based approach for healing CAD model errors. Because the design history, which comprises the features, the history tree, the parameterization data, and the constraints, reflects the design intent, CAD model errors can be healed by analyzing the interdependencies among feature commands or among the parametric data of each feature command, and by reconstructing these feature commands through the rule-based reasoning of an expert system. Unlike other B-Rep correction methods, our method automatically heals parametric feature models without translating them into a B-Rep shape, and it also preserves engineering information.
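The interdependency analysis of feature commands can be illustrated with a small sketch that treats the history tree as a dependency graph: a feature is invalid if it references a missing feature or another invalid one, and invalid features are rebuilt in topological order. The dictionary representation, the function name `plan_history_repair`, and the feature names are hypothetical, and the expert-system rules that actually repair each command are not modeled.

```python
from graphlib import TopologicalSorter


def plan_history_repair(history):
    """history maps each feature to the features it references (its parents).

    Returns (missing, rebuild): referenced features absent from the history, and the
    existing features affected by them, listed in a valid regeneration order.
    Raises graphlib.CycleError if the references form a cycle.
    """
    order = list(TopologicalSorter(history).static_order())  # parents before children
    missing = {f for f in order if f not in history}
    broken = set(missing)
    # One pass suffices: every parent is visited before its children
    for feature in order:
        if feature in history and any(ref in broken for ref in history[feature]):
            broken.add(feature)
    rebuild = [f for f in order if f in broken and f in history]
    return missing, rebuild


history = {
    "sketch1": [],
    "extrude1": ["sketch1"],
    "fillet1": ["extrude1", "edge_sel"],  # references a feature that no longer exists
    "hole1": ["fillet1"],
    "chamfer1": ["extrude1"],
}
print(plan_history_repair(history))
```

Unaffected features such as `chamfer1` are left untouched, which mirrors the abstract's point that working on the parametric history preserves engineering information instead of rewriting the whole model.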

Factors affecting success and failure of Internet company business model using inductive learning based on ID3 algorithm (ID3 알고리즘 기반의 귀납적 추론을 활용한 인터넷 기업 비즈니스 모델의 성공과 실패에 영향을 미치는 요인에 관한 연구)

  • Jin, Dong-su
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.2 / pp.111-116 / 2019
  • New technologies such as the IoT, Big Data, and Artificial Intelligence, building on the Web, mobile, and smart devices, enable new business models that did not exist before, and various types of Internet companies based on these business models have emerged. In this research, we examine the factors that influence the success and failure of Internet companies. To do this, we review recent studies on business models and examine the variables affecting the success of Internet companies in terms of network effects, user interface, cooperation with actors, and value creation for users. Using the five derived variables, we select 14 Internet companies that succeeded or failed across seven commercial business model categories. We derive a decision tree by applying inductive learning based on the ID3 algorithm to the analysis results, and from this decision tree we derive rules that affect success and failure. Based on these rules, we present strategic implications for actors seeking to succeed as Internet companies.
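ID3 grows a decision tree by repeatedly splitting on the categorical attribute with the highest information gain, that is, the largest drop in label entropy. A compact sketch follows; the attribute names and the four example companies are invented for illustration and are not the study's 14 companies or its five variables.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on categorical attribute `attr`."""
    n = len(labels)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Return a nested dict {attr: {value: subtree_or_label}}, or a leaf label."""
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority label at a leaf
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for value in sorted(set(r[best] for r in rows)):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return tree


# Hypothetical companies described by two of the candidate variables
rows = [{"network_effect": "high", "ui": "good"}, {"network_effect": "high", "ui": "poor"},
        {"network_effect": "low", "ui": "good"}, {"network_effect": "low", "ui": "poor"}]
labels = ["success", "success", "failure", "failure"]
print(id3(rows, labels, ["ui", "network_effect"]))
```

Each root-to-leaf path of the resulting tree reads directly as a rule, for example "if network_effect is high then success", which is how rules are extracted from the derived tree.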

Development and Validation of MRI-Based Radiomics Models for Diagnosing Juvenile Myoclonic Epilepsy

  • Kyung Min Kim;Heewon Hwang;Beomseok Sohn;Kisung Park;Kyunghwa Han;Sung Soo Ahn;Wonwoo Lee;Min Kyung Chu;Kyoung Heo;Seung-Koo Lee
    • Korean Journal of Radiology / v.23 no.12 / pp.1281-1289 / 2022
  • Objective: Radiomic modeling using multiple regions of interest in brain MRI to diagnose juvenile myoclonic epilepsy (JME) has not yet been investigated. This study aimed to develop and validate radiomics prediction models to distinguish patients with JME from healthy controls (HCs), and to evaluate the feasibility of a radiomics approach using MRI for diagnosing JME. Materials and Methods: A total of 97 JME patients (25.6 ± 8.5 years; female, 45.5%) and 32 HCs (28.9 ± 11.4 years; female, 50.0%) were randomly split (7:3 ratio) into a training set (n = 90) and a test set (n = 39). Radiomic features were extracted from T1-weighted MRI in 22 brain regions of interest selected based on clinical evidence. Predictive models were trained on the radiomic features of the training set using seven modeling methods: light gradient boosting machine, support vector classifier, random forest, logistic regression, extreme gradient boosting, gradient boosting machine, and decision tree. The performance of the models was validated and compared on the test set. The model with the highest area under the receiver operating characteristic curve (AUROC) was chosen, and the important features in that model were identified. Results: The seven tested radiomics models, namely light gradient boosting machine, support vector classifier, random forest, logistic regression, extreme gradient boosting, gradient boosting machine, and decision tree, showed AUROC values of 0.817, 0.807, 0.783, 0.779, 0.767, 0.762, and 0.672, respectively. The light gradient boosting machine, which had the highest AUROC, albeit without statistically significant differences from the other models in pairwise comparisons, had accuracy, precision, recall, and F1 scores of 0.795, 0.818, 0.931, and 0.871, respectively. Radiomic features from regions including the putamen and ventral diencephalon were ranked as the most important for suggesting JME. Conclusion: Radiomic models using MRI were able to differentiate JME from HCs.
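The reported metrics can be related with two short formulas: F1 is the harmonic mean of precision and recall, $F_1 = 2PR/(P + R)$, and AUROC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties counted as one half). The sketch below checks that the reported precision and recall reproduce the reported F1; the AUROC scores in the usage line are made-up numbers, not study data.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def auroc(scores_pos, scores_neg):
    """AUROC as the Mann-Whitney probability P(score_pos > score_neg), ties count 0.5."""
    total = 0.0
    for p in scores_pos:
        for n in scores_neg:
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(scores_pos) * len(scores_neg))


print(round(f1_score(0.818, 0.931), 3))  # 0.871, matching the reported F1
print(auroc([0.9, 0.8], [0.1, 0.85]))    # 0.75 on these illustrative scores
```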

Explainable analysis of the Relationship between Hypertension with Gas leakages (설명 가능한 인공지능 기술을 활용한 가스누출과 고혈압의 연관 분석)

  • Dashdondov, Khongorzul;Jo, Kyuri;Kim, Mi-Hye
    • Annual Conference of KIPS / 2022.11a / pp.55-56 / 2022
  • Hypertension is a severe health problem that increases the risk of other health issues, such as heart disease, heart attack, and stroke. In this research, we propose a machine learning-based method for predicting the risk of chronic hypertension. The proposed method consists of four main modules. In the first module, linear interpolation fills the missing values in the integrated gas and meteorological dataset. In the second module, the data are normalized with OrdinalEncoder, and a decision tree algorithm is then used to select important features. The prediction module builds three models based on k-Nearest Neighbors, Decision Tree, and Random Forest to predict hypertension levels. Finally, the features used in the prediction model are explained with the DeepSHAP approach. The proposed method is evaluated on the integration of the Korea Meteorological Administration dataset, a natural gas leakage dataset, and the Korea National Health and Nutrition Examination Survey dataset. The experimental results showed important global features for hypertension in the entire population and local components for particular patients. Based on the local explanation results for a randomly selected 65-year-old male, the effect on hypertension increased from 0.694 to 1.249 when age increased by 0.37 and gas loss increased by 0.17. Therefore, it is concluded that gas loss is the cause of high blood pressure.
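The first module's linear interpolation can be sketched for a single time-ordered column with NumPy: each missing value is replaced by the straight-line value between its nearest observed neighbors. Treating the index as an evenly spaced time axis and holding the edge values constant (the behavior of `np.interp` outside the observed range) are assumptions, since the abstract does not say how gaps at the ends were handled.

```python
import numpy as np


def fill_linear(values):
    """Fill NaNs by linear interpolation over an evenly spaced index."""
    y = np.array(values, dtype=float)  # copy, so the caller's data is not modified
    missing = np.isnan(y)
    x = np.arange(y.size)
    # np.interp draws straight lines between observed points; ends take the nearest value
    y[missing] = np.interp(x[missing], x[~missing], y[~missing])
    return y


print(fill_linear([1.0, float("nan"), 3.0, float("nan"), float("nan"), 6.0]))
```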

Forecasting of the COVID-19 pandemic situation of Korea

  • Goo, Taewan;Apio, Catherine;Heo, Gyujin;Lee, Doeun;Lee, Jong Hyeok;Lim, Jisun;Han, Kyulhee;Park, Taesung
    • Genomics & Informatics / v.19 no.1 / pp.11.1-11.8 / 2021
  • For the novel coronavirus disease 2019 (COVID-19), predictive modeling in the literature broadly uses susceptible-exposed-infected-recovered (SEIR)/SIR, agent-based, and curve-fitting models. Governments and legislative bodies rely on insights from prediction models to propose new policies and to assess the effectiveness of enforced ones. Access to accurate outbreak prediction models is therefore essential for gaining insight into the likely spread and consequences of infectious diseases. The objective of this study is to predict the future COVID-19 situation in Korea. We employed six models for this analysis: SEIR, local linear regression (LLR), negative binomial (NB) regression, segmented Poisson, deep learning-based long short-term memory (LSTM), and tree-based gradient boosting machine (GBM). Model performance was then compared using relative mean squared errors (RMSE) on two pairs of training data (January 20, 2020-December 31, 2020 and January 20, 2020-January 31, 2021) and testing data (January 1, 2021-February 28, 2021 and February 1, 2021-February 28, 2021). Except for the segmented Poisson model, all models predicted a decline in daily confirmed cases in the country in the near future. Comparison of the RMSE values showed that LLR, GBM, SEIR, NB, and LSTM, in that order, performed well in forecasting the country's pandemic situation. A good understanding of epidemic dynamics would greatly enhance the control and prevention of COVID-19 and other infectious diseases. Therefore, with daily confirmed cases increasing since the start of this year, these results could support the pandemic response by informing decisions about planning, resource allocation, and social distancing policies.
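The SEIR compartmental model divides a population of size $N$ into susceptible, exposed, infectious, and recovered groups, with $dS/dt = -\beta S I / N$, $dE/dt = \beta S I / N - \sigma E$, $dI/dt = \sigma E - \gamma I$, and $dR/dt = \gamma I$. A minimal forward-Euler simulation is sketched below; the parameter values in the usage line are arbitrary placeholders, not the rates fitted to Korean data in the study.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Integrate the SEIR equations with forward Euler; returns one (S, E, I, R) per day."""
    n = s0 + e0 + i0 + r0
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    steps_per_day = int(round(1 / dt))
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt    # S -> E (infection)
            new_infectious = sigma * e * dt        # E -> I (end of latency)
            new_recovered = gamma * i * dt         # I -> R (recovery)
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Placeholder rates: 1/sigma = 5-day latency, 1/gamma = 10-day infectious period
trajectory = simulate_seir(beta=0.3, sigma=0.2, gamma=0.1, s0=9990, e0=0, i0=10, r0=0, days=30)
```

Every flow leaves one compartment and enters another, so S + E + I + R stays equal to N (up to rounding), which is a quick sanity check on any SEIR implementation.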