• Title/Abstract/Keywords: multiple linear regression models

318 results (processing time 0.028 s)

다중선형 회귀분석을 이용한 고속도로 터널구간의 교통사고 예측모형 개발 (Development of Accident Forecasting Models in Freeway Tunnels using Multiple Linear Regression Analysis)

  • 박주환;김상구
    • 한국ITS학회 논문지 / Vol. 11, No. 6 / pp. 145-154 / 2012
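  • This paper analyzes traffic accident characteristics in freeway tunnel sections from multiple angles to select a range of independent variables, and develops multiple linear regression models with the dependent variable expressed as accidents, accidents/km, and accidents per million vehicle-km. The developed models were compared with one another to determine a reliable accident forecasting model composed of accident-influencing factors. The models were verified through test statistics such as $R^2$ and the F-value, multicollinearity checks, and residual analysis, and were reviewed for whether they reflect the accident characteristics of tunnel sections; two models were finally selected according to tunnel length. The selected dependent variable is ln(accidents per million vehicle-km), and the independent variables are annual average daily traffic (AADT), vertical grade, and tunnel height. Comparing predicted values with observed values using RMSE and MAE showed the estimated models to be suitable for explaining traffic accidents in tunnel sections.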
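
The validation step in this abstract, comparing predicted and observed values with RMSE and MAE, can be sketched as follows. This is a minimal illustration, assuming a model that predicts ln(accidents per million vehicle-km) so that predictions are exponentiated before comparison. All numbers are hypothetical and are not taken from the paper.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

# Hypothetical observed accident rates (accidents per million vehicle-km).
observed = [0.50, 0.20, 0.90, 0.40]
# Hypothetical model output on the log scale, i.e. ln(rate).
log_predicted = [math.log(0.40), math.log(0.30), math.log(0.70), math.log(0.40)]
# Back-transform to the original scale before computing errors.
predicted = [math.exp(v) for v in log_predicted]

print(f"RMSE = {rmse(observed, predicted):.4f}, MAE = {mae(observed, predicted):.4f}")
```

RMSE penalizes large individual errors more heavily than MAE, which is one reason the two are often reported together.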

Determination of Research Octane Number using NIR Spectral Data and Ridge Regression

  • 정호일;이혜선;전지혁
    • Bulletin of the Korean Chemical Society / Vol. 22, No. 1 / pp. 37-42 / 2001
  • Ridge regression is compared with multiple linear regression (MLR) for determination of Research Octane Number (RON) when the baseline and signal-to-noise ratio are varied. MLR analysis of near-infrared (NIR) spectroscopic data usually encounters a collinearity problem, which adversely affects long-term prediction performance. The collinearity problem can be eliminated or greatly reduced by using ridge regression, which is a biased estimation method. To evaluate the robustness of each calibration, the calibration models developed by both methods were used to predict RONs of gasoline spectra in which the baseline and signal-to-noise ratio were varied. The ridge calibration model showed more stable prediction performance than MLR, especially when the spectral baselines were varied. In conclusion, ridge regression is shown to be a viable method for calibration of RON with NIR data when only a few wavelengths are available, as in a hand-carried device using a few diodes.
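
The collinearity effect described in this abstract can be illustrated with a small sketch. It assumes two centered predictors and no intercept, so the fit reduces to a 2x2 linear system; the data, the function name `fit_two_predictors`, and the penalty value 0.1 are hypothetical choices for illustration and do not come from the paper.

```python
def fit_two_predictors(x1, x2, y, ridge_lambda=0.0):
    """Solve (X'X + lambda*I) b = X'y for two centered predictors, no intercept.

    ridge_lambda = 0 gives ordinary least squares (MLR);
    ridge_lambda > 0 gives ridge regression.
    """
    s11 = sum(a * a for a in x1) + ridge_lambda
    s22 = sum(b * b for b in x2) + ridge_lambda
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12  # close to zero when x1 and x2 are collinear
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

# Hypothetical, nearly collinear signals at two wavelengths (already centered).
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.01, -0.99, 0.0, 1.01, 1.99]
# Hypothetical response, roughly x1 + x2 plus small noise.
y = [-4.1, -1.9, 0.1, 2.0, 3.9]

ols = fit_two_predictors(x1, x2, y)
ridge = fit_two_predictors(x1, x2, y, ridge_lambda=0.1)
print("OLS coefficients:  ", ols)
print("Ridge coefficients:", ridge)
```

With these toy numbers the OLS coefficients come out large and of opposite sign, even though the response is roughly x1 + x2, while the ridge coefficients are both close to 1. The price is a small bias, which is the trade-off the abstract refers to as biased estimation.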

다중선형회귀와 기계학습 모델을 이용한 PM10 농도 예측 및 평가 (Evaluation and Predicting PM10 Concentration Using Multiple Linear Regression and Machine Learning)

  • 손상훈;김진수
    • 대한원격탐사학회지 / Vol. 36, No. 6_3 / pp. 1711-1720 / 2020
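  • Particulate matter (PM) generated anthropogenically through recent rapid industrialization and urbanization moves and disperses depending on meteorological conditions and adversely affects the human body, including the skin and respiratory system. This study uses meteorological factors as input to multiple linear regression (MLR), support vector machine (SVM), and random forest (RF) models to predict PM10 concentrations in Seoul, and compares the performance of the models. First, PM10 concentration data observed at 39 air quality monitoring sites (AQMS) in Seoul were divided in an 8:2 ratio into training and validation datasets. In addition, nine meteorological factors observed at automatic weather system (AWS) stations (mean temperature, maximum temperature, minimum temperature, daily precipitation, mean wind speed, maximum instantaneous wind speed, direction of maximum instantaneous wind speed, occurrence of yellow dust, and relative humidity) were selected as model inputs. The coefficients of determination ($R^2$) between the PM10 concentrations observed at each AQMS and those predicted by the MLR, SVM, and RF models were 0.260, 0.772, and 0.793, respectively; the RF model performed best. In particular, among the AQMS used for validation, the Gwanak-gu and Gangnam-daero AQMS are relatively close to an AWS and showed high accuracy in the SVM and RF models. The Jongno-gu AQMS is relatively far from an AWS, but data from two adjacent AQMS were used for model training, so both models showed high accuracy there. In contrast, the Yongsan-gu AQMS is relatively far from other AQMS and from an AWS, so both models performed poorly there.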
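
The evaluation described in this abstract rests on an 8:2 split and the coefficient of determination. The sketch below shows one common way to compute both. It assumes the definition $R^2 = 1 - SS_{res}/SS_{tot}$ and a random split over 39 site indices; the abstract does not state which definition or split procedure the authors used, and the function names and PM10 values are hypothetical.

```python
import random

def r_squared(observed, predicted):
    """Coefficient of determination, defined here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def split_8_to_2(records, seed=0):
    """Shuffle records reproducibly and split them 8:2 into train and validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

# Hypothetical observed and predicted PM10 concentrations.
observed = [30.0, 45.0, 60.0, 80.0, 55.0]
predicted = [35.0, 40.0, 65.0, 70.0, 60.0]
print(f"R^2 = {r_squared(observed, predicted):.3f}")

# Hypothetical split over 39 site indices.
train, valid = split_8_to_2(range(39))
print(len(train), "training sites,", len(valid), "validation sites")
```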

Water consumption prediction based on machine learning methods and public data

  • Kesornsit, Witwisit;Sirisathitkul, Yaowarat
    • Advances in Computational Design / Vol. 7, No. 2 / pp. 113-128 / 2022
  • Water consumption is strongly affected by numerous factors, such as population, climatic, geographic, and socio-economic factors. Implementing a reliable predictive model of water consumption patterns is therefore a challenging task. This study investigates the performance of predictive models based on multi-layer perceptron (MLP), multiple linear regression (MLR), and support vector regression (SVR). To understand the significant factors affecting water consumption, the stepwise regression (SW) procedure is used in MLR to obtain suitable variables. This study then also implements three predictive models based on these significant variables (i.e., SWMLR, SWMLP, and SWSVR). Annual data of water consumption in Thailand during 2006-2015 were compiled and categorized by provinces and distributors. Comparing the predictive performance of models with all variables, the results demonstrate that the MLP models outperformed the MLR and SVR models. Among the models with selected variables, the predictive capability of SWMLP was superior to that of SWMLR and SWSVR. The SWMLP thus still provided satisfactory results with the minimum number of explanatory variables, which in turn reduced the computation time and other resources required for the predictive task. It can be concluded that the MLP exhibited the best results and can be used as a reliable water demand predictive model in both the all-variables and selected-variables cases. These findings have important implications: the models serve as feasible water consumption predictors and can be used in water resources management to produce sufficient tap water to meet the demand in each province of Thailand.
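
The variable-selection idea behind stepwise regression can be sketched as a greedy forward pass. This is a simplified illustration, not the procedure used in the paper: standard stepwise regression enters and removes variables based on significance tests, whereas this sketch only adds variables and stops when the gain in $R^2$ falls below a threshold. The predictor names, the data, and the 0.01 threshold are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def sse(columns, y):
    """Residual sum of squares of a least-squares fit with intercept on the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(design, y))

def forward_select(predictors, y, min_r2_gain=0.01):
    """Greedily add the predictor that lowers SSE most; stop when the R^2 gain is small."""
    total = sse([], y)  # intercept-only fit, i.e. the total sum of squares
    selected, current = [], total
    while len(selected) < len(predictors):
        candidates = {
            name: sse([predictors[s] for s in selected] + [predictors[name]], y)
            for name in predictors if name not in selected
        }
        best = min(candidates, key=candidates.get)
        if current - candidates[best] < min_r2_gain * total:
            break
        selected.append(best)
        current = candidates[best]
    return selected

# Hypothetical data: consumption is driven almost entirely by population.
predictors = {
    "population": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "rainfall": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
print(forward_select(predictors, y))
```

The selected names can then be passed to any of the downstream models, which is the sense in which the abstract pairs the selection step with MLR, MLP, and SVR.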

ARTIFICIAL NEURAL NETWORK FOR PREDICTION OF WATER QUALITY IN PIPELINE SYSTEMS

  • Kim, Ju-Hwan;Yoon, Jae-Heung
    • Water Engineering Research / Vol. 4, No. 2 / pp. 59-68 / 2003
  • The applicability and validity of two methodologies for predicting THM (trihalomethane) formation in a water pipeline system are proposed and discussed. One is the multiple regression technique and the other is an artificial neural network technique. Many factors influence water quality, especially THM formation, in water pipeline systems. In this study, prediction models of THM formation in water pipeline systems are developed based on the independent variables proposed by the American Water Works Association (AWWA). Multiple linear/nonlinear regression models are estimated, and three-layer feed-forward artificial neural networks are used to predict THM formation in a water pipeline system. Input parameters of the models consist of organic compound measures taken in water pipeline systems, such as TOC, DOC, and UV254. The reaction time to each measuring site along the pipeline, calculated by a hydraulic analysis, is also used as an input parameter. Using these variables as model parameters, four models are developed, and their predictions are compared statistically with the measured THM data set. The artificial neural network approaches are shown to be much superior to the conventional regression approaches, and the neural network models can be used more efficiently and reproduce THM formation in water pipeline systems more accurately than the conventional regression methods proposed by AWWA.

통계적 축소법을 이용한 한반도 인근해역의 미래 표층수온 추정 (Prediction of Future Sea Surface Temperature around the Korean Peninsular based on Statistical Downscaling)

  • 함희정;김상수;윤우석
    • 산업기술연구 / Vol. 31, No. B / pp. 107-112 / 2011
  • Recently, climate change around the world due to global warming has become an important issue, and damage from climate change adversely affects human life. Changes in sea surface temperature (SST) are associated with natural disasters such as typhoons and El Niño. We therefore predicted future daily SST using a statistical downscaling method and the CGCM 3.1 A1B scenario. Nine points around the Korean Peninsula were selected for predicting future SST, and a regression model was built using multiple linear regression. CGCM 3.1 output was simulated with the regression model, and a comparison of probability density functions, box plots, and statistical data confirmed that the regression models were built properly.

4지 신호교차로의 측면접촉사고 특성 및 사고모형 - 청주시를 사례로 - (Characteristics and Models of the Side-swipe Accident in the Case of Cheongju 4-legged Signalized Intersections)

  • 박상혁;김태영;박병호
    • 한국도로학회논문집 / Vol. 11, No. 4 / pp. 41-47 / 2009
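  • This study deals with side-swipe accidents at 4-legged signalized intersections in Cheongju. Its objectives are to analyze the characteristics of side-swipe accidents and to develop related models, with a focus on finding an appropriate modeling methodology. The main results are as follows. First, among side-swipe accidents, injury accidents were about twice or more as frequent as property-damage-only accidents, and most accidents occurred within the intersection. Side-swipe accidents mostly involved passenger cars and were attributed to failure to drive safely. Second, the multiple linear regression models were evaluated as more statistically significant than the multiple nonlinear regression models, and the best model was the one with the number of accidents as the dependent variable. The side-swipe accident factors identified in this study are traffic volume (ADT), intersection area, exclusive right-turn lanes, number of crosswalks, speed limit of the major road, maximum longitudinal grade, and number of signal phases.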

국내 로터리의 연령대별 사고모형 (Accident Models of Rotary by Age Group in Korea)

  • 박민규;박병호
    • 한국도로학회논문집 / Vol. 15, No. 2 / pp. 121-129 / 2013
  • PURPOSES : This study deals with traffic accidents at rotaries in Korea. The objective is to develop accident models by age group based on various rotary data. METHODS : To this end, the study gives particular attention to classifying the accident data of 17 rotaries by age, collecting data on geometric structure, traffic volume, and other factors, and developing the models using SPSS 17.0 and EXCEL. RESULTS : First, three multiple linear regression models were developed, all statistically significant. The value of the model for the 30-49 age group was, however, evaluated to be 0.688, lower than those of the other models. Second, the most influential variables were found to be traffic volume in the model for the under-30 age group, circulatory roadway width in the model for the 30-49 age group, and the number of approach lanes in the model for the above-50 age group. Finally, tests of the accident models using RMSE showed that all of them fitted the given data. CONCLUSIONS : This study proposes installing streetlights and speed humps and widening the circulatory roadway as effective improvements for reducing accidents at rotaries.

Ensemble variable selection using genetic algorithm

  • Seogyoung, Lee;Martin Seunghwan, Yang;Jongkyeong, Kang;Seung Jun, Shin
    • Communications for Statistical Applications and Methods / Vol. 29, No. 6 / pp. 629-640 / 2022
  • Variable selection is one of the most crucial tasks in supervised learning, such as regression and classification. Best subset selection is straightforward and optimal but not practically applicable unless the number of predictors is small. In this article, we propose directly solving the best subset selection problem via the genetic algorithm (GA), a popular stochastic optimization algorithm based on the principle of Darwinian evolution. To further improve variable selection performance, we propose running multiple GAs to solve the best subset selection problem and then synthesizing the results, which we call ensemble GA (EGA). The EGA significantly improves variable selection performance. In addition, the proposed method is essentially best subset selection and hence applicable to a variety of models with different selection criteria. We compare the proposed EGA to existing variable selection methods under various models, including linear regression, Poisson regression, and Cox regression for survival data. Both simulation and real data analysis demonstrate the promising performance of the proposed method.

Developed multiple linear regression model using genetic algorithm for predicting top-bead width in GMA welding process

  • ;김일수;손준식;서주환
    • 대한용접접합학회:학술대회논문집 / 대한용접접합학회 2006년 추계학술발표대회 개요집 / pp. 271-273 / 2006
  • This paper focuses on empirical models developed for predicting top-bead width in the GMA (Gas Metal Arc) welding process. Three empirical models were developed: linear, curvilinear, and an intelligent model. Regression analysis was employed to optimize the coefficients of the linear and curvilinear models, while a Genetic Algorithm (GA) was used to estimate the coefficients of the intelligent model. Not only was the fit of these models checked, but prediction of top-bead width was also carried out. ANOVA and contour plots were employed to represent the main and interaction effects, respectively, of process parameters on top-bead width.
