• Title/Summary/Keyword: Catboost

Search Result 7, Processing Time 0.018 seconds

A Study on the traffic flow prediction through Catboost algorithm (Catboost 알고리즘을 통한 교통흐름 예측에 관한 연구)

  • Cheon, Min Jong;Choi, Hye Jin;Park, Ji Woong;Choi, HaYoung;Lee, Dong Hee;Lee, Ook
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.22 no.3
    • /
    • pp.58-64
    • /
    • 2021
  • As the number of registered vehicles increases, traffic congestion will worsen worse, which may act as an inhibitory factor for urban social and economic development. Through accurate traffic flow prediction, various AI techniques have been used to prevent traffic congestion. This paper uses the data from a VDS (Vehicle Detection System) as input variables. This study predicted traffic flow in five levels (free flow, somewhat delayed, delayed, somewhat congested, and congested), rather than predicting traffic flow in two levels (free flow and congested). The Catboost model, which is a machine-learning algorithm, was used in this study. This model predicts traffic flow in five levels and compares and analyzes the accuracy of the prediction with other algorithms. In addition, the preprocessed model that went through RandomizedSerachCv and One-Hot Encoding was compared with the naive one. As a result, the Catboost model without any hyper-parameter showed the highest accuracy of 93%. Overall, the Catboost model analyzes and predicts a large number of categorical traffic data better than any other machine learning and deep learning models, and the initial set parameters are optimized for Catboost.

Corporate Innovation and Business Performance Prediction Using Ensemble Learning (앙상블 학습을 이용한 기업혁신과 경영성과 예측)

  • An, Kyung Min;Lee, Young Chan
    • The Journal of Information Systems
    • /
    • v.30 no.4
    • /
    • pp.247-275
    • /
    • 2021
  • Purpose This study attempted to predict corporate innovation and business performance using ensemble learning. Design/methodology/approach The ensemble techniques uses weak learning to create robust learning, which combines several weak models to derive improved performance. In this study, XGboost, LightGBM, and Catboost were used among ensemble techniques. It was compared and evaluated with traditional machine learning methods. Findings The summary of the research results is as follows. First, the type of innovation is expanding from technical innovation to non-technical areas. Second, it was confirmed that LightGBM performed best for radical innovation prediction, and XGboost performed best for incremental innovation prediction. Third, Catboost performed best for firm performance prediction. Although there was no significant difference in predictive power between ensemble techniques, we found that comparative analysis was necessary to confirm better prediction performance.

Data-driven Modeling for Valve Size and Type Prediction Using Machine Learning (머신 러닝을 이용한 밸브 사이즈 및 종류 예측 모델 개발)

  • Chanho Kim;Minshick Choi;Chonghyo Joo;A-Reum Lee;Yun Gun;Sungho Cho;Junghwan Kim
    • Korean Chemical Engineering Research
    • /
    • v.62 no.3
    • /
    • pp.214-224
    • /
    • 2024
  • Valves play an essential role in a chemical plant such as regulating fluid flow and pressure. Therefore, optimal selection of the valve size and type is essential task. Valve size and type have been selected based on theoretical formulas about calculating valve sizing coefficient (Cv). However, this approach has limitations such as requiring expert knowledge and consuming substantial time and costs. Herein, this study developed a model for predicting valve sizes and types using machine learning. We developed models using four algorithms: ANN, Random Forest, XGBoost, and Catboost and model performances were evaluated using NRMSE & R2 score for size prediction and F1 score for type prediction. Additionally, a case study was conducted to explore the impact of phases on valve selection, using four datasets: total fluids, liquids, gases, and steam. As a result of the study, for valve size prediction, total fluid, liquid, and gas dataset demonstrated the best performance with Catboost (Based on R2, total: 0.99216, liquid: 0.98602, gas: 0.99300. Based on NRMSE, total: 0.04072, liquid: 0.04886, gas: 0.03619) and steam dataset showed the best performance with RandomForest (R2: 0.99028, NRMSE: 0.03493). For valve type prediction, Catboost outperformed all datasets with the highest F1 scores (total: 0.95766, liquids: 0.96264, gases: 0.95770, steam: 1.0000). In Engineering Procurement Construction industry, the proposed fluid-specific machine learning-based model is expected to guide the selection of suitable valves based on given process conditions and facilitate faster decision-making.

A Study on the Prediction of Apartment Sale Price Using Machine Learning : Focused on the Collection of Internal and External Data and Price Prediction of Korean Apartments (기계학습을 이용한 아파트 매매가격 예측 연구 : 한국 아파트의 내·외적 데이터 수집과 가격 예측 중심으로)

  • Ju, Jeong-Min;Kang, Sun-Mee;Choi, Ji-Wung;Han, Youngwoo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2020.11a
    • /
    • pp.956-959
    • /
    • 2020
  • 본 연구에서는 아파트를 대표할 수 있는 내·외적 데이터를 수집하고 인공지능 기술들을 활용하여 아파트 가격을 예측하는 시스템을 구축하고자 한다. 구체적으로 웹크롤링 기법을 통해 수집한 아파트 내·외적 데이터의 변수들에 대한 특성 선택(Feature Selection)을 수행하였고, 다양한 인공지능 기법을 활용하여 부동산 가격 예측 모형을 개발하였다. 아파트 가격 예측 모형 생성을 위해 Linear Regression, Ridge, Xgboost, Lightgbm, Catboost 등의 기계학습 알고리즘을 사용하였고, RMSE를 사용하여 각 예측 모형 간의 성능 비교를 수행하였다. 가장 성능이 좋은 예측 모형은 Xgboost기반 예측 모형이였으며, RMSE값이 약 0.0366으로 가장 낮았으며 테스트 데이터에 대한 정확도는 약 95.1%였다.

Malicious Traffic Classification Using Mitre ATT&CK and Machine Learning Based on UNSW-NB15 Dataset (마이터 어택과 머신러닝을 이용한 UNSW-NB15 데이터셋 기반 유해 트래픽 분류)

  • Yoon, Dong Hyun;Koo, Ja Hwan;Won, Dong Ho
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.2
    • /
    • pp.99-110
    • /
    • 2023
  • This study proposed a classification of malicious network traffic using the cyber threat framework(Mitre ATT&CK) and machine learning to solve the real-time traffic detection problems faced by current security monitoring systems. We applied a network traffic dataset called UNSW-NB15 to the Mitre ATT&CK framework to transform the label and generate the final dataset through rare class processing. After learning several boosting-based ensemble models using the generated final dataset, we demonstrated how these ensemble models classify network traffic using various performance metrics. Based on the F-1 score, we showed that XGBoost with no rare class processing is the best in the multi-class traffic environment. We recognized that machine learning ensemble models through Mitre ATT&CK label conversion and oversampling processing have differences over existing studies, but have limitations due to (1) the inability to match perfectly when converting between existing datasets and Mitre ATT&CK labels and (2) the presence of excessive sparse classes. Nevertheless, Catboost with B-SMOTE achieved the classification accuracy of 0.9526, which is expected to be able to automatically detect normal/abnormal network traffic.

URL Phishing Detection System Utilizing Catboost Machine Learning Approach

  • Fang, Lim Chian;Ayop, Zakiah;Anawar, Syarulnaziah;Othman, Nur Fadzilah;Harum, Norharyati;Abdullah, Raihana Syahirah
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.9
    • /
    • pp.297-302
    • /
    • 2021
  • The development of various phishing websites enables hackers to access confidential personal or financial data, thus, decreasing the trust in e-business. This paper compared the detection techniques utilizing URL-based features. To analyze and compare the performance of supervised machine learning classifiers, the machine learning classifiers were trained by using more than 11,005 phishing and legitimate URLs. 30 features were extracted from the URLs to detect a phishing or legitimate URL. Logistic Regression, Random Forest, and CatBoost classifiers were then analyzed and their performances were evaluated. The results yielded that CatBoost was much better classifier than Random Forest and Logistic Regression with up to 96% of detection accuracy.

Development of a water quality prediction model for mineral springs in the metropolitan area using machine learning (머신러닝을 활용한 수도권 약수터 수질 예측 모델 개발)

  • Yeong-Woo Lim;Ji-Yeon Eom;Kee-Young Kwahk
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.1
    • /
    • pp.307-325
    • /
    • 2023
  • Due to the prolonged COVID-19 pandemic, the frequency of people who are tired of living indoors visiting nearby mountains and national parks to relieve depression and lethargy has exploded. There is a place where thousands of people who came out of nature stop walking and breathe and rest, that is the mineral spring. Even in mountains or national parks, there are about 600 mineral springs that can be found occasionally in neighboring parks or trails in the metropolitan area. However, due to irregular and manual water quality tests, people drink mineral water without knowing the test results in real time. Therefore, in this study, we intend to develop a model that can predict the quality of the spring water in real time by exploring the factors affecting the quality of the spring water and collecting data scattered in various places. After limiting the regions to Seoul and Gyeonggi-do due to the limitations of data collection, we obtained data on water quality tests from 2015 to 2020 for about 300 mineral springs in 18 cities where data management is well performed. A total of 10 factors were finally selected after two rounds of review among various factors that are considered to affect the suitability of the mineral spring water quality. Using AutoML, an automated machine learning technology that has recently been attracting attention, we derived the top 5 models based on prediction performance among about 20 machine learning methods. Among them, the catboost model has the highest performance with a prediction classification accuracy of 75.26%. In addition, as a result of examining the absolute influence of the variables used in the analysis through the SHAP method on the prediction, the most important factor was whether or not a water quality test was judged nonconforming in the previous water quality test. It was confirmed that the temperature on the day of the inspection and the altitude of the mineral spring had an influence on whether the water quality was unsuitable.