• Title/Summary/Keyword: Regression

Traffic Crash Prediction Models for Expressway Ramps (고속도로 연결로의 교통사고예측모형 개발)

  • Choi, Yoon-Hwan;Oh, Young-Tae;Choi, Kee-Choo;Lee, Choul-Ki;Yun, Il-Soo
    • International Journal of Highway Engineering / v.14 no.5 / pp.133-143 / 2012
  • PURPOSES: Using crash, traffic volume, and design-element data collected on ramps between 2007 and 2009, this study develops traffic crash prediction models for expressway ramps. METHODS: Three negative binomial regression models and three zero-inflated negative binomial regression models were developed, one of each for direct, semi-direct, and loop ramps. To validate the models, the authors compared the estimated crash frequencies with the actual crash frequencies at twelve randomly selected interchanges whose ramps had not been used in model development. RESULTS: The negative binomial regression models for direct, semi-direct, and loop ramps showed average error rates of 60.3%, 63.8%, and 48.7%, whereas the zero-inflated negative binomial regression models showed 82.1%, 120.4%, and 57.3%, respectively. CONCLUSIONS: The negative binomial regression models estimated the frequency of traffic crashes on expressway ramps better than the zero-inflated negative binomial regression models.
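
The two model families compared above differ only in how they treat zero counts. The sketch below is a minimal illustration of that difference, assuming the common NB2 parameterization (mean `mu`, dispersion `alpha`) and a constant zero-inflation probability `pi`; the numeric values are hypothetical and are not taken from the paper, and the paper's fitted coefficients are not reproduced here.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial (NB2) probability of observing y crashes.

    mu is the mean and alpha > 0 the dispersion, so the variance is
    mu + alpha * mu**2.
    """
    r = 1.0 / alpha  # "size" parameter of the distribution
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu))
             + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(y, mu, alpha, pi):
    """Zero-inflated NB: with probability pi the count is a structural zero."""
    base = (1.0 - pi) * nb_pmf(y, mu, alpha)
    return pi + base if y == 0 else base

# Hypothetical parameters for one ramp (assumptions, not values from the paper).
mu, alpha, pi = 0.8, 0.5, 0.3
for y in range(4):
    print(y, round(nb_pmf(y, mu, alpha), 4), round(zinb_pmf(y, mu, alpha, pi), 4))
```

The zero-inflated variant always assigns more probability to zero crashes than the plain negative binomial with the same `mu` and `alpha`, which is one reason the two models can produce different predicted frequencies for the same ramp.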

Development and Evaluation of Simple Regression Model and Multiple Regression Model for TOC Contentation Estimation in Stream Flow (하천수내 TOC 농도 추정을 위한 단순회귀모형과 다중회귀모형의 개발과 평가)

  • Jung, Jaewoon;Cho, Sohyun;Choi, Jinhee;Kim, Kapsoon;Jung, Soojung;Lim, Byungjin
    • Journal of Korean Society on Water Environment / v.29 no.5 / pp.625-629 / 2013
  • The objective of this study is to develop and evaluate simple and multiple regression models for estimating Total Organic Carbon (TOC) concentration in stream flow. Water quality data from the downstream reach of the Yeongsan river basin in 2011 and 2012 were used, with the 2012 data for model development and the 2011 data for evaluation, and a correlation analysis between TOC and the other water quality parameters was conducted. TOC concentration was positively correlated with Chemical Oxygen Demand (COD), Biochemical Oxygen Demand (BOD), Total Nitrogen (TN), Water Temperature (WT), and Electric Conductivity (EC). From these results, simple and multiple regression models for TOC estimation were developed, including the simple regression equations $TOC = 0.5809 \times BOD + 3.1557$ and $TOC = 0.4365 \times COD + 1.3731$. In the evaluation of the developed models, the multiple regression model estimated TOC better than the simple regression models.
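
As a small worked example, the two single-predictor equations quoted in the abstract can be evaluated directly. The function names and sample inputs below are hypothetical, and the abstract does not state units, so the values are assumed to be in the same concentration units as the original data (typically mg/L for these parameters).

```python
def toc_from_bod(bod):
    """TOC estimate from BOD, using the coefficients quoted in the abstract."""
    return 0.5809 * bod + 3.1557

def toc_from_cod(cod):
    """TOC estimate from COD, using the coefficients quoted in the abstract."""
    return 0.4365 * cod + 1.3731

# Hypothetical sample concentrations (assumed units: mg/L).
sample_bod = 2.0
sample_cod = 6.0
print(toc_from_bod(sample_bod))
print(toc_from_cod(sample_cod))
```

The multiple regression model is not given in the abstract, so it is not reproduced here.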

Estimation of Pollutant Load Using Genetic-algorithm and Regression Model (유전자 알고리즘과 회귀식을 이용한 오염부하량의 예측)

  • Park, Youn Shik
    • Korean Journal of Environmental Agriculture / v.33 no.1 / pp.37-43 / 2014
  • BACKGROUND: Water quality data are collected less frequently than flow data because of the cost of collection and analysis, yet water quality data corresponding to flow data are required to compute pollutant loads or to calibrate other hydrologic models. Regression models can be used to interpolate water quality data corresponding to flow data. METHODS AND RESULTS: A regression model capable of accounting for flow and time variance was proposed, and its coefficients were calibrated against measured water quality data using a genetic algorithm. Both LOADEST and the genetic-algorithm regression were evaluated on 19 water quality data sets through calibration and validation. The genetic-algorithm regression model behaved similarly to LOADEST. The load estimates from both approaches indicated that using a large proportion of the water quality data does not necessarily yield load estimates with smaller error relative to the measured load. CONCLUSION: Regression models need to be calibrated and validated before they are used to interpolate pollutant loads, for instance by separating the water quality data into two data sets for calibration and validation.

Correlation and Simple Linear Regression (상관성과 단순선형회귀분석)

  • Pak, Son-Il;Oh, Tae-Ho
    • Journal of Veterinary Clinics / v.27 no.4 / pp.427-434 / 2010
  • Correlation is a technique used to measure the strength, or degree of closeness, of the linear association between two quantitative variables. Common misuses of this technique are highlighted. Linear regression is a technique used to express the relationship between two continuous variables as a mathematical equation, which can be used for comparison or estimation purposes. Specifically, regression analysis can answer questions such as how much one variable changes for a given change in the other, and how accurately the value of one variable can be predicted from knowledge of the other. Regression by itself does not indicate how good the association is, whereas correlation provides a measure of how well a least-squares regression line fits the given set of data. The better the correlation, the closer the data points are to the regression line. This tutorial article describes the process of obtaining a linear regression relationship for a given set of bivariate data. The least squares method, which finds the line that minimizes the total squared error between the data points and the regression line, is employed and illustrated. The coefficient of determination, the ratio of the explained variation of the values of the dependent variable to the total variation, is described. Finally, the calculation of confidence and prediction intervals is reviewed and demonstrated.
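
The quantities discussed in this tutorial abstract can be computed in a few lines. The following is a minimal sketch using only the standard library; the function name and the bivariate data are hypothetical, and interval calculations are omitted.

```python
import math

def simple_linear_regression(x, y):
    """Least-squares line y = intercept + slope * x, with r and R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)  # total variation of y
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient

    # Coefficient of determination: explained variation of y / total variation.
    fitted = [intercept + slope * xi for xi in x]
    explained = sum((fi - mean_y) ** 2 for fi in fitted)
    r_squared = explained / syy
    return slope, intercept, r, r_squared

# Hypothetical bivariate data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r, r_squared = simple_linear_regression(x, y)
print(slope, intercept, r, r_squared)
```

For simple linear regression the coefficient of determination equals the square of the correlation coefficient, which is why a better correlation means the points lie closer to the fitted line.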

Hybrid Fuzzy Least Squares Support Vector Machine Regression for Crisp Input and Fuzzy Output

  • Shim, Joo-Yong;Seok, Kyung-Ha;Hwang, Chang-Ha
    • Communications for Statistical Applications and Methods / v.17 no.2 / pp.141-151 / 2010
  • Hybrid fuzzy regression analysis is used for integrating randomness and fuzziness into a regression model. The least squares support vector machine (LS-SVM) has been very successful in pattern recognition and function estimation problems for crisp data. This paper proposes a new method to evaluate hybrid fuzzy linear and nonlinear regression models with crisp inputs and fuzzy output using weighted fuzzy arithmetic (WFA) and LS-SVM. LS-SVM allows fuzzy nonlinear regression analysis to be performed by constructing a fuzzy linear regression function in a high-dimensional feature space. The proposed method is not computationally expensive, since its solution is obtained from a simple linear equation system. In particular, the method is an attractive approach to modeling nonlinear data and is nonparametric in the sense that the underlying model function does not have to be assumed for a fuzzy nonlinear regression model with crisp inputs and fuzzy output. Experimental results are presented that indicate the performance of the method.
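
To illustrate why an LS-SVM solution reduces to a linear equation system, here is a minimal sketch of standard LS-SVM regression for crisp inputs and crisp outputs. It is not the paper's hybrid fuzzy method: the weighted fuzzy arithmetic step is omitted, and the RBF kernel, the regularization value `gamma`, the function names, and the data are all assumptions made for illustration.

```python
import math

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def lssvm_fit(xs, ys, gamma=100.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [bias; alphas] = [0; y]."""
    n = len(xs)
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k = rbf_kernel(xs[i], xs[j], sigma)
            row.append(k + (1.0 / gamma if i == j else 0.0))
        a.append(row)
    solution = solve_linear_system(a, [0.0] + list(ys))
    return solution[0], solution[1:]  # bias, alphas

def lssvm_predict(x, xs, bias, alphas, sigma=1.0):
    """Evaluate f(x) = bias + sum_i alpha_i * K(x, x_i)."""
    return bias + sum(al * rbf_kernel(x, xi, sigma) for al, xi in zip(alphas, xs))

# Hypothetical crisp training data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.8, 0.9, 0.1, -0.8]
bias, alphas = lssvm_fit(xs, ys)
print(lssvm_predict(2.5, xs, bias, alphas))
```

The whole fit is one call to a linear solver, with no iterative optimization, which is the property the abstract points to when it calls the method computationally inexpensive.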

Relationship between Aiming Patterns and Scores in Archery Shooting

  • Quan, ChengHao;Lee, Sangmin
    • Korean Journal of Applied Biomechanics / v.26 no.4 / pp.353-360 / 2016
  • Objective: The aim of this study was to investigate the relationship between aiming patterns and scores in archery shooting. Method: Four (N = 4) elementary-level archers from middle school participated in this study. The aiming pattern was defined as the averaged acceleration data measured by accelerometers attached to the body during the aiming phase of archery shooting. Stepwise multiple regression analysis was used to test whether a model incorporating aiming patterns from all nine accelerometers could predict the scores. To extract period of interest (POI) data from the raw data, a Dynamic Time Warping (DTW)-based extraction method was presented. Results: Regression models were fitted for all four subjects, with different significance levels and variables. The significance levels of the regression models were 0.12%, 1.61%, 0.55%, and 0.4%, respectively; the $R^2$ values were 64.04%, 27.93%, 72.02%, and 45.62%, respectively; and the maximum significance levels of the parameters in the regression models were 1.26%, 4.58%, 5.1%, and 4.98%, respectively. Conclusion: Our results indicated that the relationship between aiming patterns and scores could be described by a regression model. Analysis of the significance levels, variables, and parameters of the regression models showed that our approach, regression analysis with DTW, is an effective way to raise scores in archery shooting.
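
The abstract does not describe how its DTW-based extraction works, so the sketch below shows only the classic Dynamic Time Warping distance that such a method would build on. The function name, the absolute-difference local cost, and the sample signals are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two 1-D sequences.

    cost[i][j] holds the minimal accumulated cost of aligning the first
    i samples of a with the first j samples of b.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance in a only
                                     cost[i][j - 1],      # advance in b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]

# Hypothetical acceleration traces: the second is a time-stretched copy.
template = [1.0, 2.0, 3.0]
stretched = [1.0, 2.0, 2.0, 3.0]
print(dtw_distance(template, stretched))
```

Because DTW tolerates local stretching in time, a template of the aiming phase can be matched against raw recordings of different durations, which is the general motivation for using it to locate a period of interest.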

Symbolic regression based on parallel Genetic Programming (병렬 유전자 프로그래밍을 이용한 Symbolic Regression)

  • Kim, Chansoo;Han, Keunhee
    • Journal of Digital Convergence / v.18 no.12 / pp.481-488 / 2020
  • Symbolic regression is an analysis method that directly generates a function explaining the relationship between the dependent and independent variables of a given data set. Genetic programming is the leading technique in this field. Compared with other regression algorithms, which optimize the parameters of a fixed model, it has the advantage of directly deriving an interpretable model. In this study, we propose a symbolic regression algorithm using parallel genetic programming based on a coarse-grained parallel model, and apply the proposed algorithm to PMLB data to analyze its effectiveness.

Robustness of model averaging methods for the violation of standard linear regression assumptions

  • Lee, Yongsu;Song, Juwon
    • Communications for Statistical Applications and Methods / v.28 no.2 / pp.189-204 / 2021
  • In regression analysis, a single best model is usually selected from several candidate models. However, it is often useful to combine several candidate models to achieve better performance, especially from the viewpoint of prediction. Model combining methods such as stacking and Bayesian model averaging (BMA) have been suggested from the perspective of averaging candidate models. When the candidate models include the true model, BMA is generally expected to perform better than stacking. On the other hand, when the candidate models do not include the true model, stacking is known to outperform BMA. Since stacking and BMA have different properties, it is difficult to determine which method is more appropriate in other situations. In particular, it is not easy to find research papers that compare stacking and BMA when regression model assumptions are violated. Therefore, in this paper, we compare the performance of model averaging methods, as well as a single best model, in linear regression analysis when standard linear regression assumptions are violated. Simulations were conducted to compare model averaging methods with linear regression when the data do and do not include outliers. We also compared them when the data include errors from a non-normal distribution. The model averaging methods were applied to water pollution data, which have strong multicollinearity among variables. The simulation studies showed that the stacking method tends to perform better than BMA or standard linear regression analysis (including the stepwise selection method) in the sense of risks (see (3.1)) or prediction error (see (3.2)) when typical linear regression assumptions are violated.
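
As a rough illustration of how model averaging combines candidates, the sketch below computes BMA-style weights from BIC values, a common approximation to posterior model probabilities under equal prior model probabilities. Whether the paper uses this approximation is not stated in the abstract, and the function names, BIC values, and predictions are hypothetical. Stacking, which chooses weights by minimizing cross-validated prediction error, is not shown.

```python
import math

def bma_weights_from_bic(bics):
    """Approximate posterior model probabilities: w_k proportional to exp(-BIC_k / 2)."""
    best = min(bics)  # subtract the minimum for numerical stability
    raw = [math.exp(-0.5 * (bic - best)) for bic in bics]
    total = sum(raw)
    return [value / total for value in raw]

def averaged_prediction(weights, predictions):
    """Weighted average of the candidate models' predictions for one case."""
    return sum(w * p for w, p in zip(weights, predictions))

# Hypothetical BIC values and point predictions from three candidate models.
bics = [210.4, 212.1, 219.8]
predictions = [3.2, 3.6, 2.9]
weights = bma_weights_from_bic(bics)
print(weights)
print(averaged_prediction(weights, predictions))
```

The weights concentrate on the candidate with the lowest BIC, which fits the abstract's remark that BMA is expected to do well when the true model is among the candidates.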

Prediction of the number of public bicycle rental in Seoul using Boosted Decision Tree Regression Algorithm

  • KIM, Hyun-Jun;KIM, Hyun-Ki
    • Korean Journal of Artificial Intelligence / v.10 no.1 / pp.9-14 / 2022
  • The demand for public bicycles operated by the Seoul Metropolitan Government is increasing every year. The Seoul public bicycle project, which started with about 5,600 units, had grown to 37,500 units as of September 2021, and the number of members is also increasing every year. However, as the project grows, excessive budget spending and deficits are emerging as problems, with new bicycles, rental office costs, and bicycle maintenance costs blamed for the deficit. In this paper, the Azure Machine Learning Studio program and the Boosted Decision Tree Regression technique are used to predict the number of public bicycle rentals from environmental factors and time. The predictions confirmed that demand for public bicycles was high in every season except winter and was highest at 6 p.m. In addition, this paper compares four other regression algorithms with the Boosted Decision Tree Regression algorithm to measure algorithm performance. In order of accuracy, the results were: first, Boosted Decision Tree Regression (0.878802); second, Decision Forest Regression (0.838232); third, Poisson Regression (0.62699); and fourth, Linear Regression (0.618773). Based on these predictions, it is expected that more public bicycles will be placed at rental stations near public transportation to meet the growing demand during commuting hours, that more bicycles will be placed at rental stations in summer than in winter, and that the life of bicycles can be extended in winter.

Development of Medical Cost Prediction Model Based on the Machine Learning Algorithm (머신러닝 알고리즘 기반의 의료비 예측 모델 개발)

  • Han Bi KIM;Dong Hoon HAN
    • Journal of Korean Artificial Intelligence Association / v.1 no.1 / pp.11-16 / 2023
  • Accurate hospital case modeling and prediction are crucial for efficient healthcare. In this study, we demonstrate the implementation of regression analysis methods in machine learning systems using mathematical statistics and machine learning techniques. The developed machine learning models include Bayesian linear, artificial neural network, decision tree, decision forest, and linear regression analysis models. By applying these algorithms, the corresponding regression models were constructed and analyzed. The results suggest the potential of leveraging machine learning systems for medical research. The experiment aimed to create an Azure Machine Learning Studio tool for the speedy evaluation of multiple regression models. The tool facilitates the comparison of five types of regression models in a unified experiment and presents assessment results with performance metrics. Evaluation of the regression machine learning models highlighted the advantages of boosted decision tree regression and decision forest regression in hospital case prediction. These findings could lay the groundwork for the deliberate development of new directions in medical data processing and decision making. Potential avenues for future research include exploring methods such as clustering, classification, and anomaly detection in healthcare systems.