• Title/Summary/Keyword: Regression Algorithms

Performance Comparison of Machine-learning Models for Analyzing Weather and Traffic Accident Correlations

  • Li Zi Xuan;Hyunho Yang
    • Journal of information and communication convergence engineering, v.21 no.3, pp.225-232, 2023
  • Owing to advances in intelligent transportation systems (ITS) and artificial-intelligence technologies, various machine-learning models can be used to simulate and predict the number of traffic accidents under different weather conditions. We can also analyze the relationship between weather and traffic accidents to assess whether current weather conditions are suitable for travel, which can significantly reduce the risk of traffic accidents. In this study, we analyzed 30,000 traffic-flow data points collected by traffic cameras at intersections in Washington, D.C., USA, from October 2012 to May 2017, using a Pearson correlation heat map. We then applied several machine-learning algorithms commonly used in ITS, namely random forest, decision tree, gradient-boosting regression, and support vector regression, to model the correlation between continuous features, and compared their predictive performance. The experimental results indicated that the gradient-boosting regression model performed best.
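
The Pearson heat map step amounts to computing the correlation coefficient $r = \frac{\sum_i (a_i-\bar a)(b_i-\bar b)}{\sqrt{\sum_i (a_i-\bar a)^2 \sum_i (b_i-\bar b)^2}}$ for every pair of features. A minimal sketch in plain Python follows; the column names `temperature`, `precipitation` and `accidents` and their values are hypothetical, not the study's variables, and drawing the matrix as a heat map is left out.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """All pairwise Pearson coefficients: the numbers a correlation heat map colors."""
    return {(p, q): pearson(columns[p], columns[q]) for p in columns for q in columns}


# Hypothetical toy columns, not the Washington, D.C. data used in the study.
data = {
    "temperature": [2.0, 5.0, 9.0, 14.0, 20.0],
    "precipitation": [8.0, 6.0, 5.0, 2.0, 1.0],
    "accidents": [30.0, 26.0, 22.0, 15.0, 11.0],
}
matrix = correlation_matrix(data)
for (p, q), r in sorted(matrix.items()):
    print(f"{p:>13} vs {q:<13} r = {r:+.3f}")
```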

Symbolic regression based on parallel Genetic Programming (병렬 유전자 프로그래밍을 이용한 Symbolic Regression)

  • Kim, Chansoo;Han, Keunhee
    • Journal of Digital Convergence, v.18 no.12, pp.481-488, 2020
  • Symbolic regression is a regression-analysis method that directly generates a function explaining the relationship between the dependent and independent variables in given data. Genetic Programming is the leading research technique in this field. Compared with other regression algorithms, which optimize the parameters of a fixed model, it has the advantage of directly deriving an interpretable model. In this study, we propose a symbolic regression algorithm using parallel genetic programming based on a coarse-grained parallel model, and apply it to PMLB data to analyze its effectiveness.
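
To make the coarse-grained (island) model concrete, here is a minimal sketch of symbolic regression by genetic programming in which several sub-populations evolve separately and periodically pass their best individual around a ring. Every detail is an assumption for illustration: the function set (+, -, *), subtree mutation only (no crossover), tournament size 3, elitism, migration every 5 generations, and the toy target $y = x^2 + x$. It is not the authors' algorithm, and the islands run sequentially here rather than in parallel.

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def random_tree(rng, depth):
    """Grow a random expression tree: ("x",), ("const", c) or (op, left, right)."""
    if depth <= 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.5 else ("const", rng.choice([1.0, 2.0, 3.0]))
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    if tree[0] == "x":
        return x
    if tree[0] == "const":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))


def error(tree, data):
    """Sum of squared errors of the candidate function over (x, y) pairs."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data)


def mutate(rng, tree, depth=3):
    """Subtree mutation: replace one randomly chosen subtree with a new random one."""
    if tree[0] in ("x", "const") or rng.random() < 0.3:
        return random_tree(rng, depth)
    if rng.random() < 0.5:
        return (tree[0], mutate(rng, tree[1], depth - 1), tree[2])
    return (tree[0], tree[1], mutate(rng, tree[2], depth - 1))


def evolve_island(rng, pop, data):
    """One generation on one island: keep the elite, refill with mutated tournament winners."""
    elite = min(pop, key=lambda t: error(t, data))
    children = []
    for _ in range(len(pop) - 1):
        parent = min(rng.sample(pop, 3), key=lambda t: error(t, data))
        children.append(mutate(rng, parent))
    return [elite] + children


def island_gp(data, islands=4, size=20, generations=30, migrate_every=5, seed=0):
    rng = random.Random(seed)
    pops = [[random_tree(rng, 3) for _ in range(size)] for _ in range(islands)]
    history = []
    for gen in range(1, generations + 1):
        # Coarse-grained model: islands evolve independently (sequentially here;
        # a parallel version would give each island its own process or node).
        pops = [evolve_island(rng, pop, data) for pop in pops]
        if gen % migrate_every == 0:
            # Ring migration: island i receives the best individual of island i - 1.
            bests = [min(pop, key=lambda t: error(t, data)) for pop in pops]
            for i, pop in enumerate(pops):
                worst = max(range(size), key=lambda j: error(pop[j], data))
                pop[worst] = bests[i - 1]
        history.append(min(error(t, data) for pop in pops for t in pop))
    best = min((t for pop in pops for t in pop), key=lambda t: error(t, data))
    return best, history


# Toy target y = x^2 + x on a few points (not PMLB data).
data = [(x / 2, (x / 2) ** 2 + x / 2) for x in range(-4, 5)]
best, history = island_gp(data)
print(best, history[-1])
```

Because each island keeps its best individual and migration always copies the overall best forward, the best error recorded in `history` never increases from one generation to the next.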

Prediction of the number of public bicycle rental in Seoul using Boosted Decision Tree Regression Algorithm

  • KIM, Hyun-Jun;KIM, Hyun-Ki
    • Korean Journal of Artificial Intelligence, v.10 no.1, pp.9-14, 2022
  • The demand for public bicycles operated by the Seoul Metropolitan Government increases every year. The Seoul public bicycle project, which started with about 5,600 units, grew to 37,500 units as of September 2021, and its membership also grows every year. However, as the project grows, excessive budget spending and deficits are emerging, and the costs of new bicycles, rental stations, and bicycle maintenance are blamed for the deficit. In this paper, Azure Machine Learning Studio and the Boosted Decision Tree Regression technique are used to predict the number of public bicycle rentals from environmental factors and time. The predictions confirmed that demand for public bicycles was high in every season except winter and peaked at 6 p.m. In addition, to measure algorithm performance, this paper compares four additional regression algorithms with the Boosted Decision Tree Regression algorithm. Accuracy ranked as follows: first, Boosted Decision Tree Regression (0.878802); second, Decision Forest Regression (0.838232); third, Poisson Regression (0.62699); and fourth, Linear Regression (0.618773). Based on these predictions, it is expected that more public bicycles will be placed at rental stations near public transportation to meet the growing demand during commuting hours, that more bicycles will be placed at rental stations in summer than in winter, and that the service life of bicycles can be extended in winter.
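
Boosted decision tree regression builds an additive model by repeatedly fitting small trees to the residuals of the current ensemble. The sketch below shows that idea with depth-one trees (stumps), squared loss and a fixed learning rate. It is not Azure Machine Learning Studio's implementation, whose tree size and other settings are not given in the abstract, and the hourly rental counts are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: one split on x, a constant in each leaf."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(rounds):
        # For squared loss the negative gradient is simply y - prediction.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


# Hypothetical hourly rental counts for one day, for illustration only.
hours = list(range(24))
rentals = [10 if h < 7 else 50 if h < 17 else 120 if h == 18 else 60 for h in hours]
model = boost(hours, rentals)
print([round(model(h)) for h in (3, 12, 18)])
```

The small learning rate makes each stump correct only part of the remaining error, which is the usual trade of more rounds for a smoother fit.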

Development of Medical Cost Prediction Model Based on the Machine Learning Algorithm (머신러닝 알고리즘 기반의 의료비 예측 모델 개발)

  • Han Bi KIM;Dong Hoon HAN
    • Journal of Korean Artificial Intelligence Association, v.1 no.1, pp.11-16, 2023
  • Accurate hospital case modeling and prediction are crucial for efficient healthcare. In this study, we demonstrate the implementation of regression analysis methods in machine learning systems using mathematical statistics and machine learning techniques. The developed machine learning models include Bayesian linear, artificial neural network, decision tree, decision forest, and linear regression models. Applying these algorithms, we constructed and analyzed the corresponding regression models. The results suggest the potential of machine learning systems for medical research. The experiment aimed to create an Azure Machine Learning Studio tool for the rapid evaluation of multiple regression models. The tool facilitates the comparison of five types of regression models in a unified experiment and presents the assessment results with performance metrics. The evaluation highlighted the advantages of boosted decision tree regression and decision forest regression in hospital case prediction. These findings could lay the groundwork for new directions in medical data processing and decision making. Future research may explore methods such as clustering, classification, and anomaly detection in healthcare systems.
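
The abstract does not list which performance metrics the tool reports. As an assumption, the sketch below computes metrics commonly used to compare regression models: mean absolute error (MAE), root mean squared error (RMSE), relative absolute error (RAE), relative squared error (RSE) and the coefficient of determination $R^2 = 1 - \text{RSE}$. The two models' predictions are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Common regression metrics; R2 = 1 - RSE (relative squared error)."""
    n = len(y_true)
    mean = sum(y_true) / n
    abs_err = sum(abs(p - t) for t, p in zip(y_true, y_pred))
    sq_err = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": abs_err / sum(abs(t - mean) for t in y_true),
        "RSE": sq_err / ss_tot,
        "R2": 1.0 - sq_err / ss_tot,
    }


# Hypothetical costs and predictions from two models, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predictions = {
    "model_a": [110.0, 190.0, 320.0, 380.0],
    "model_b": [150.0, 150.0, 350.0, 350.0],
}
results = {name: regression_metrics(actual, pred) for name, pred in predictions.items()}
for name in sorted(results, key=lambda m: results[m]["R2"], reverse=True):
    print(name, {k: round(v, 4) for k, v in results[name].items()})
```

RAE and RSE compare each model against the trivial predictor that always outputs the mean, so values below 1 mean the model beats that baseline.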

Imbalanced Data Improvement Techniques Based on SMOTE and Light GBM (SMOTE와 Light GBM 기반의 불균형 데이터 개선 기법)

  • Young-Jin, Han;In-Whee, Joe
    • KIPS Transactions on Computer and Communication Systems, v.11 no.12, pp.445-452, 2022
  • Class imbalance is common in digital data and is especially significant in cybersecurity, where abnormal activity hidden in imbalanced data must be detected and addressed. Although a system capable of tracking patterns in all transactions is needed, machine learning on imbalanced data, in which abnormal patterns are typically rare, can ignore the minority class and degrade performance on it, and predictive models can become biased. In this paper, we predict target variables and improve accuracy by combining the Synthetic Minority Oversampling Technique (SMOTE) with the Light GBM algorithm as an approach to imbalanced datasets. The experimental results were compared with those of logistic regression, decision tree, KNN, Random Forest, and XGBoost. Performance was similar in accuracy and recall, but in precision Random Forest reached 80.76% and Light GBM 97.16%, and in F1-score Random Forest reached 84.67% and Light GBM 91.96%. The experiment confirmed that Light GBM performed similarly to, or up to 16% better than, the five comparison algorithms.
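
SMOTE, as commonly described, creates a synthetic minority sample by picking a minority point, picking one of its $k$ nearest minority-class neighbors, and interpolating a random fraction of the way between them. A minimal sketch follows; $k = 2$, Euclidean distance and the toy points are assumptions, and the Light GBM training step is not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples on segments between a sample and one of
    its k nearest minority-class neighbors (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbors = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class points lying on the line y = x.
minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
new_points = smote(minority, n_synthetic=6)
print(new_points)
```

Because new points are interpolations, they stay inside the region spanned by existing minority samples (here, on the line $y = x$) rather than duplicating them exactly.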

Estimating Methods on Exponential Regression Models with Censored Data

  • Ha, Il-Do;Lee, Youngjo;Song, Jae-Kee
    • Journal of the Korean Statistical Society, v.28 no.2, pp.195-210, 1999
  • We consider a large class of exponential regression models with censored data and propose two modified Fisher scoring methods with corresponding algorithms. These methods improve on the Newton-Raphson method for estimating the model parameters. Simulated and real examples illustrate their convergence behavior.
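
For context, the Newton-Raphson baseline that the proposed methods improve on can be sketched for the simplest case: exponential survival times with rate $\mu_i = \exp(\beta_0 + \beta_1 x_i)$, observed time $t_i$ and event indicator $\delta_i$ (1 = failure observed, 0 = censored). The log-likelihood is $\ell(\beta) = \sum_i [\delta_i (\beta_0 + \beta_1 x_i) - t_i \mu_i]$; with $\mathbf{z}_i = (1, x_i)^\top$, each step solves $I(\beta)\,\Delta = U(\beta)$ for score $U = \sum_i (\delta_i - t_i\mu_i)\mathbf{z}_i$ and observed information $I = \sum_i t_i\mu_i\,\mathbf{z}_i\mathbf{z}_i^\top$. This is not the authors' modified Fisher scoring, and the data are hypothetical.

```python
import math


def fit_censored_exponential(xs, times, events, iterations=25):
    """Newton-Raphson for T_i ~ Exponential(rate = exp(b0 + b1 * x_i)) with right censoring.

    events[i] is 1 if the failure was observed and 0 if times[i] is a censoring time.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, t, d in zip(xs, times, events):
            w = t * math.exp(b0 + b1 * x)  # t_i * mu_i
            u0 += d - w                    # score components
            u1 += x * (d - w)
            i00 += w                       # observed information (negative Hessian)
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system I * step = U.
        b0, b1 = b0 + (i11 * u0 - i01 * u1) / det, b1 + (i00 * u1 - i01 * u0) / det
    return b0, b1


# Hypothetical data: two groups (x = 0 and x = 1), some observations censored.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
times = [1.0, 2.0, 3.0, 2.0, 0.5, 1.0, 0.5, 1.0]
events = [1, 1, 0, 1, 1, 1, 1, 0]
print(fit_censored_exponential(xs, times, events))
```

With a single binary covariate the maximum-likelihood estimate has a closed form (the log of observed failures divided by total exposure time in each group), here $\hat\beta_0 = \log(3/8)$ and $\hat\beta_1 = \log(8/3)$, which makes the sketch easy to check.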

Robust Regression for Right-Censored Data

  • Kim, Chul-Ki
    • Journal of Korean Society for Quality Management, v.25 no.2, pp.47-59, 1997
  • In this paper we develop computational algorithms to calculate M-estimators of regression parameters from right-censored data, which arise naturally in quality control. For M-estimators, a new statistical method is also introduced to incorporate concomitant scale estimation when the observed responses are right-censored. We illustrate the methods with simulations.
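
The M-estimation building block can be illustrated without censoring: a Huber M-estimate of a straight-line fit computed by iteratively reweighted least squares. This sketch deliberately ignores right censoring and uses a scale fixed from the initial least-squares residuals instead of the concomitant scale estimation the paper introduces; the tuning constant $c = 1.345$ and the data are assumptions.

```python
import statistics


def weighted_line(xs, ys, ws):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return my - b * mx, b


def huber_line(xs, ys, c=1.345, iterations=50):
    """Huber M-estimate of (a, b) by iteratively reweighted least squares (no censoring)."""
    a, b = weighted_line(xs, ys, [1.0] * len(xs))  # start from ordinary least squares
    # Scale: normalized median absolute OLS residual, held fixed afterwards.
    scale = statistics.median(abs(y - (a + b * x)) for x, y in zip(xs, ys)) / 0.6745
    for _ in range(iterations):
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Huber weights: 1 inside [-c*s, c*s], shrinking as c*s/|r| outside.
        weights = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in residuals]
        a, b = weighted_line(xs, ys, weights)
    return a, b


# Hypothetical data on y = 2x + 1 with one gross outlier.
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]
ys[-1] = 100.0
print("OLS:  ", weighted_line(xs, ys, [1.0] * len(xs)))
print("Huber:", huber_line(xs, ys))
```

If every residual were zero the scale would be zero and the weights undefined; a fuller version would guard against that case.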

Comparison Study on the Estimation Algorithm of Land Surface Temperature for MODIS Data at the Korean Peninsula (MODIS 자료를 이용한 한반도 지표면 온도산출 알고리즘의 비교 연구)

  • Lee, Soon-Hwan;Ahn, Ji-Suk;Kim, Hae-Dong;Hwang, Soo-Jin
    • Journal of Environmental Science International, v.18 no.4, pp.355-367, 2009
  • This study compared land surface temperatures calculated from MODIS data by four different algorithms and analyzed the characteristics of each algorithm for land surface temperature estimation. The algorithms compared, which are widely used in satellite data analysis, were proposed by Price, Becker and Li, Ulivieri et al., and Wan. The land surface temperature estimated by each algorithm was also verified against observation-based regression data. The coefficient of determination ($R^2$) for daytime land surface temperature estimated with Wan's algorithm is higher than that of the other algorithms in all seasons, reaching 0.92 in spring. Although $R^2$ for Ulivieri's algorithm is slightly lower than that for Wan's algorithm, the variation patterns of land surface temperature from the two algorithms are similar. However, the differences among the values estimated by the four algorithms become small in regions of high land surface temperature.
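
Because the verification relies on $R^2$, it helps to recall what it measures. For the least-squares line of observed on estimated values, $R^2$ equals the squared Pearson correlation, so it rewards matching variation but ignores a constant bias. The sketch below, on hypothetical temperatures in kelvin rather than MODIS results, shows an estimate offset by a constant 3 K still scoring $R^2 = 1$.

```python
def r_squared(estimated, observed):
    """R^2 of the least-squares line of observed on estimated (= squared Pearson r)."""
    n = len(estimated)
    me = sum(estimated) / n
    mo = sum(observed) / n
    sxy = sum((e - me) * (o - mo) for e, o in zip(estimated, observed))
    sxx = sum((e - me) ** 2 for e in estimated)
    syy = sum((o - mo) ** 2 for o in observed)
    return sxy * sxy / (sxx * syy)


# Hypothetical land surface temperatures in kelvin, not MODIS results.
observed = [285.0, 290.0, 296.0, 301.0, 307.0]
unbiased = [286.0, 289.0, 297.0, 300.0, 308.0]
biased = [o + 3.0 for o in observed]  # constant +3 K offset
print(r_squared(unbiased, observed), r_squared(biased, observed))
```

So $R^2$ alone does not show whether two algorithms agree in absolute value; a bias or RMSE check complements it.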

Neuronal Spike Train Decoding Methods for the Brain-Machine Interface Using Nonlinear Mapping (비선형매핑 기반 뇌-기계 인터페이스를 위한 신경신호 spike train 디코딩 방법)

  • Kim, Kyunn-Hwan;Kim, Sung-Shin;Kim, Sung-June
    • The Transactions of the Korean Institute of Electrical Engineers D, v.54 no.7, pp.468-474, 2005
  • Brain-machine interface (BMI) based on neuronal spike trains is regarded as one of the most promising means to restore basic body functions of severely paralyzed patients. The spike train decoding algorithm, which extracts the information underlying neuronal signals, is essential for the BMI. Previous studies report that a linear filter is effective for this purpose and that nonlinear mapping algorithms offer no noteworthy gain, even though the neuronal encoding process is clearly nonlinear. We designed several decoding algorithms based on the linear filter and two nonlinear mapping algorithms using a multilayer perceptron (MLP) and support vector machine regression (SVR), and showed that the nonlinear algorithms are generally superior. The MLP often performed unsatisfactorily, especially when carelessly trained. The nonlinear SVR showed the highest performance, possibly owing to the superiority of the SVR in training and generalization. The advantage of the nonlinear algorithms was more pronounced when the spike trains contained false-positive or false-negative errors.
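
The linear filter that earlier studies found effective is, in its usual form, a least-squares map from recent binned spike counts to the decoded variable: $\hat y_t = w_0 + \sum_{n}\sum_{l=0}^{L-1} w_{n,l}\, c_n(t-l)$. A minimal sketch follows, fitting the weights through the normal equations on synthetic data generated from a known filter. The two neurons, two lags and random counts are assumptions, and the MLP and SVR decoders are not shown.

```python
import random


def lagged_features(counts, lags):
    """Rows [1, counts[t], counts[t-1], ...] for every bin t with a full lag history."""
    rows = []
    for t in range(lags - 1, len(counts)):
        row = [1.0]  # intercept term
        for lag in range(lags):
            row.extend(counts[t - lag])
        rows.append(row)
    return rows


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_filter(counts, target, lags):
    """Least-squares filter weights via the normal equations (X^T X) w = X^T y."""
    rows = lagged_features(counts, lags)
    y = target[lags - 1:]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic data: 2 neurons, 50 time bins, target generated by a known filter.
rng = random.Random(1)
counts = [[rng.randint(0, 5) for _ in range(2)] for _ in range(50)]
true_w = [0.5, 1.0, -0.5, 0.25, 0.0]  # bias, (n0, n1) at lag 0, (n0, n1) at lag 1
target = [0.0] + [sum(w * f for w, f in zip(true_w, row)) for row in lagged_features(counts, 2)]
weights = fit_linear_filter(counts, target, 2)
print([round(w, 3) for w in weights])
```

On noise-free data the fit recovers the generating weights; the abstract's point is that real neuronal encoding is nonlinear, which such a filter cannot capture.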

Development of benthic macroinvertebrate species distribution models using the Bayesian optimization (베이지안 최적화를 통한 저서성 대형무척추동물 종분포모델 개발)

  • Go, ByeongGeon;Shin, Jihoon;Cha, Yoonkyung
    • Journal of Korean Society of Water and Wastewater, v.35 no.4, pp.259-275, 2021
  • This study explored the usefulness and implications of Bayesian hyperparameter optimization in developing species distribution models (SDMs). A variety of machine learning (ML) algorithms, namely support vector machine (SVM), random forest (RF), boosted regression tree (BRT), XGBoost (XGB), and multilayer perceptron (MLP), were used to predict the occurrence of four benthic macroinvertebrate species. The Bayesian optimization method successfully tuned the model hyperparameters, with all ML models achieving an area under the curve (AUC) > 0.7. In addition, hyperparameter search ranges that generally clustered around the optimal values suggest that Bayesian optimization finds optimal hyperparameter sets efficiently. Tree-based ensemble algorithms (BRT, RF, and XGB) tended to perform better than SVM and MLP. Important hyperparameters and their optimal values differed by species and ML model, indicating that hyperparameter tuning is necessary to improve individual model performance. The optimization results show that, for all macroinvertebrate species, SVM and RF required fewer trials to obtain optimal hyperparameter sets, reducing computational cost compared with the other ML algorithms. The results of this study suggest that Bayesian optimization is an efficient method for optimizing the hyperparameters of machine learning algorithms.
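
The abstract does not specify the surrogate model or acquisition function. As an assumption, the sketch below uses a textbook setup: a Gaussian-process surrogate with a squared-exponential kernel and expected improvement, maximizing a hypothetical one-dimensional validation score over a grid of scaled hyperparameter values in $[0, 1]$. Real tuning of SVM, RF, BRT, XGB or MLP models would search several hyperparameters at once, usually with a dedicated library.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(k):
    n = len(k)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = k[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward(low, b):
    """Solve low @ x = b for lower-triangular low."""
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[j] * x[j] for j in range(i))) / row[i])
    return x


def posterior(xs, low, alpha, x):
    """GP posterior mean and standard deviation at x (zero prior mean)."""
    v = forward(low, [rbf(a, x) for a in xs])
    mean = sum(p * q for p, q in zip(v, alpha))
    var = max(1.0 - sum(p * p for p in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mu, sigma, best, xi=0.01):
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf


def bayes_opt(objective, n_iter=10, noise=1e-6):
    """Maximize objective on a grid over [0, 1] with a GP surrogate and EI."""
    xs = [0.0, 0.5, 1.0]
    ys = [objective(x) for x in xs]
    grid = [i / 100 for i in range(101)]
    for _ in range(n_iter):
        k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]
        low = cholesky(k)
        alpha = forward(low, ys)  # L^{-1} y, so mean = (L^{-1} k_*) . (L^{-1} y)
        candidates = [g for g in grid if g not in xs]
        scores = [expected_improvement(*posterior(xs, low, alpha, g), max(ys)) for g in candidates]
        x_next = candidates[scores.index(max(scores))]
        xs.append(x_next)
        ys.append(objective(x_next))
    best = ys.index(max(ys))
    return xs[best], ys[best]


# Hypothetical validation score as a function of one scaled hyperparameter.
def score(h):
    return -(h - 0.3) ** 2


print(bayes_opt(score))
```

Each iteration refits the surrogate to all scores seen so far and evaluates the objective only at the candidate with the highest expected improvement, which balances exploring uncertain regions against refining promising ones.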