• Title/Summary/Keyword: regression models

Evaluation of Regression Models in LOADEST to Estimate Suspended Solid Load in Hangang Waterbody (한강수계에서의 부유사 예측을 위한 LOADEST 모형의 회귀식의 평가)

  • Park, Youn Shik;Lee, Ji Min;Jung, Younghun;Shin, Min Hwan;Park, Ji Hyung;Hwang, Hasun;Ryu, Jichul;Park, Jangho;Kim, Ki-Sung
    • Journal of The Korean Society of Agricultural Engineers / v.57 no.2 / pp.37-45 / 2015
  • Water quality sampling typically takes place intermittently, since sample collection and the subsequent analysis require substantial cost and effort. Regression models (or rating curves) are therefore often used to interpolate water quality data. LOADEST has nine regression models to estimate water quality data, and one regression model needs to be selected, either automatically or manually. This study evaluated the nine regression models in LOADEST and the automatic selection made by LOADEST. Suspended solids data for forty-nine stations were collected from the Water Information System of the Ministry of Environment. The suspended solids data from each station were divided into two groups, one for calibration and one for validation. Nash-Sutcliffe efficiency (NSE) and the coefficient of determination ($R^2$) were used to evaluate the estimated suspended solids loads. Regression models 1 and 3 in LOADEST provided higher NSE and $R^2$ than the other regression models, while regression models 2, 5, 6, 8, and 9 provided low NSE. In addition, the regression model selected by LOADEST did not necessarily provide better suspended solids estimates than the other regression models.
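The two evaluation statistics named in this abstract can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the authors' code; the function names and the sample values are hypothetical, and $R^2$ is taken here as the squared Pearson correlation, which is an assumption about the paper's definition.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus (squared error / variance of observations)."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / spread


def r_squared(observed, simulated):
    """Coefficient of determination, assumed here to be the squared Pearson correlation."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    return cov ** 2 / (var_obs * var_sim)


# Hypothetical loads: the simulation is perfectly correlated with the
# observations but biased by a factor of two.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 4.0, 6.0, 8.0]
print(nse(observed, simulated), r_squared(observed, simulated))
```

With these definitions a biased but perfectly correlated estimate keeps a high $R^2$ while its NSE drops sharply, which is one reason the two statistics are usually reported together.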

Special-Days Load Handling Method using Neural Networks and Regression Models (신경회로망과 회귀모형을 이용한 특수일 부하 처리 기법)

  • 고희석;이세훈;이충식
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.16 no.2 / pp.98-103 / 2002
  • In power demand forecasting, the most important problem is handling the load on special days. This paper presents a method for forecasting the peak load of long special days (the Lunar New Year, the Full Moon Festival) and short special days (Tree Planting Day, Memorial Day, etc.) using neural networks and regression models. The long and short special-day peak loads are forecast using neural network models with a pattern conversion ratio and using fourth-order orthogonal polynomial regression models. Special-day peak load data covering ten years (1985∼1994) are used. The forecasting percentage error is good, at about 1∼2%, for both the neural network models and the fourth-order orthogonal polynomial regression models. In addition, analysis of the adjusted coefficient of determination and the F-test confirms the significance of the fourth-order orthogonal polynomial regression models. Comparing the two approaches on the forecasting results, the neural network models that use the pattern conversion ratio are more effective for forecasting the peak load of long special days, whereas both approaches are valid for forecasting the peak load of short special days.

Tilted beta regression and beta-binomial regression models: Mean and variance modeling

  • Edilberto Cepeda-Cuervo
    • Communications for Statistical Applications and Methods / v.31 no.3 / pp.263-277 / 2024
  • This paper proposes new parameterizations of the tilted beta binomial distribution, obtained by combining the binomial distribution and the tilted beta distribution, where the beta component of the mixture is parameterized as a function of its mean and variance. These newly parameterized distributions include the beta rectangular binomial and the beta binomial distributions as particular cases. We then propose new linear regression models for overdispersed binomial datasets. These models are defined from the proposed parameterization of the tilted beta binomial distribution and assume regression structures for the mean and variance parameters. They are fitted by applying Bayesian methods using the OpenBUGS software. The proposed regression models are fitted to a school absenteeism dataset and to seed germination rates according to the type of seed and root.

Multiple Deletions in Logistic Regression Models

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods / v.16 no.2 / pp.309-315 / 2009
  • We extended the results of Roy and Guria (2008) to multiple deletions in logistic regression models. Because single deletions may fail to detect outliers or influential observations owing to swamping and masking effects, multiple deletions are needed. We developed conditional deletion diagnostics designed to overcome the problems caused by masking effects. We derived closed forms for several statistics in logistic regression models, which provide useful diagnostics.

The strong consistency of the $L_1$-norm estimators in censored nonlinear regression models

  • Park, Seung-Hoe;Kim, Hae-Kyung
    • Bulletin of the Korean Mathematical Society / v.34 no.4 / pp.573-581 / 1997
  • This paper is concerned with the strong consistency of the $L_1$-norm estimators for the nonlinear regression models when dependent variables are subject to censoring, and provides the sufficient conditions which ensure the strong consistency of $L_1$-norm estimators of the censored regression models.

Development and Evaluation of Simple Regression Model and Multiple Regression Model for TOC Contentation Estimation in Stream Flow (하천수내 TOC 농도 추정을 위한 단순회귀모형과 다중회귀모형의 개발과 평가)

  • Jung, Jaewoon;Cho, Sohyun;Choi, Jinhee;Kim, Kapsoon;Jung, Soojung;Lim, Byungjin
    • Journal of Korean Society on Water Environment / v.29 no.5 / pp.625-629 / 2013
  • The objective of this study is to develop and evaluate simple and multiple regression models for estimating Total Organic Carbon (TOC) concentration in stream flow. We used water quality data from the downstream reach of the Yeongsan river basin for 2011 and 2012, with the 2012 data used for model development and the 2011 data for evaluation, and conducted a correlation analysis between TOC and other water quality parameters. TOC concentrations were positively correlated with Chemical Oxygen Demand (COD), Biochemical Oxygen Demand (BOD), Total Nitrogen (TN), Water Temperature (WT), and Electrical Conductivity (EC). From these results, simple and multiple regression models for TOC estimation were developed; the simple regression models were $TOC=0.5809{\times}BOD+3.1557$ and $TOC=0.4365{\times}COD+1.3731$. When the developed regression models were applied and evaluated, the multiple regression model was found to estimate TOC better than the simple regression models.
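As a quick illustration of how the two single-predictor equations quoted in this abstract would be applied, here is a minimal sketch. The coefficients come from the abstract; the function names, the sample inputs, and the assumption that all concentrations are in mg/L are hypothetical and not stated in the source.

```python
def toc_from_bod(bod):
    """Estimate TOC from BOD using the abstract's equation TOC = 0.5809 * BOD + 3.1557."""
    return 0.5809 * bod + 3.1557


def toc_from_cod(cod):
    """Estimate TOC from COD using the abstract's equation TOC = 0.4365 * COD + 1.3731."""
    return 0.4365 * cod + 1.3731


# Hypothetical sample values (units assumed to be mg/L).
sample_bod = 2.0
sample_cod = 10.0
print(round(toc_from_bod(sample_bod), 4))
print(round(toc_from_cod(sample_cod), 4))
```

The multiple regression model that the abstract reports as performing best is not given in the abstract, so it is not reproduced here.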

Robustness of model averaging methods for the violation of standard linear regression assumptions

  • Lee, Yongsu;Song, Juwon
    • Communications for Statistical Applications and Methods / v.28 no.2 / pp.189-204 / 2021
  • In regression analysis, a single best model is usually selected from several candidate models. However, it is often useful to combine several candidate models to achieve better performance, especially from a prediction viewpoint. Model combining methods such as stacking and Bayesian model averaging (BMA) have been suggested from the perspective of averaging candidate models. When the candidate models include the true model, BMA is generally expected to perform better than stacking. On the other hand, when the candidate models do not include the true model, stacking is known to outperform BMA. Since the stacking and BMA approaches have different properties, it is difficult to determine which method is more appropriate in other situations. In particular, few studies compare stacking and BMA when regression model assumptions are violated. Therefore, in this paper, we compare the performance of model averaging methods, as well as of a single best model, in linear regression analysis when standard linear regression assumptions are violated. Simulations were conducted to compare model averaging methods with linear regression both when the data include outliers and when they do not. We also compared them when the data include errors from a non-normal distribution. The model averaging methods were applied to water pollution data that have strong multicollinearity among variables. The simulation studies showed that the stacking method tends to perform better than BMA or standard linear regression analysis (including the stepwise selection method) in terms of risk (see (3.1)) or prediction error (see (3.2)) when typical linear regression assumptions are violated.
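To make the idea of averaging candidate models concrete, the sketch below combines predictions using weights derived from BIC values, a common approximation to BMA posterior model probabilities. This is an assumption for illustration only: the abstract does not say how the paper computes BMA weights, and the function names and numbers are hypothetical. Stacking, by contrast, chooses weights by minimizing cross-validated prediction error and is not shown.

```python
from math import exp


def bic_weights(bic_values):
    """Approximate posterior model weights as exp(-BIC / 2), normalized to sum to 1."""
    best = min(bic_values)  # subtract the minimum for numerical stability
    raw = [exp(-0.5 * (bic - best)) for bic in bic_values]
    total = sum(raw)
    return [r / total for r in raw]


def averaged_prediction(predictions, weights):
    """Weighted average of the candidate models' predictions for one observation."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical BIC values and point predictions for two candidate models.
weights = bic_weights([100.0, 102.0])
print(weights)
print(averaged_prediction([10.0, 12.0], weights))
```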

Performance Comparison Analysis of Artificial Intelligence Models for Estimating Remaining Capacity of Lithium-Ion Batteries

  • Kyu-Ha Kim;Byeong-Soo Jung;Sang-Hyun Lee
    • International Journal of Advanced Culture Technology / v.11 no.3 / pp.310-314 / 2023
  • The purpose of this study is to predict the remaining capacity of lithium-ion batteries and to evaluate prediction performance using five artificial intelligence models: linear regression, decision tree, random forest, neural network, and an ensemble model. The study used measured data, stored in Excel format, from the CS2 lithium-ion battery, and the prediction accuracy of each model was measured using evaluation indicators such as mean square error, mean absolute error, coefficient of determination, and root mean square error. The Root Mean Square Error (RMSE) was 0.045 for the linear regression model, 0.038 for the decision tree model, 0.034 for the random forest model, 0.032 for the neural network model, and 0.030 for the ensemble model. The ensemble model had the best prediction performance, with the neural network model in second place. The decision tree and random forest models also performed quite well, while the linear regression model showed relatively poor prediction performance compared to the other models. The study therefore concludes that ensemble and neural network models are the most suitable for predicting the remaining capacity of lithium-ion batteries and should be prioritized in order to improve the efficiency of battery management and energy systems.
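The ranking described in this abstract follows directly from the reported RMSE values, as the short sketch below shows. The RMSE figures are those quoted in the abstract; the dictionary, the helper functions, and the standard definitions of RMSE and MAE used here are illustrative assumptions rather than the authors' code.

```python
from math import sqrt

# RMSE values as reported in the abstract.
reported_rmse = {
    "linear regression": 0.045,
    "decision tree": 0.038,
    "random forest": 0.034,
    "neural network": 0.032,
    "ensemble": 0.030,
}

# Lower RMSE means better predictions, so sort in ascending order.
ranking = sorted(reported_rmse, key=reported_rmse.get)
print(ranking)


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```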

An Approach to Applying Multiple Linear Regression Models by Interlacing Data in Classifying Similar Software

  • Lim, Hyun-il
    • Journal of Information Processing Systems / v.18 no.2 / pp.268-281 / 2022
  • The development of information technology is bringing many changes to everyday life, and machine learning can be used as a technique to solve a wide range of real-world problems. Analysis and utilization of data are essential processes in applying machine learning to real-world problems. As a method of processing data in machine learning, we propose an approach based on applying multiple linear regression models by interlacing data to the task of classifying similar software. Linear regression is widely used in estimation problems to model the relationship between input and output data. In our approach, multiple linear regression models are generated by training on interlaced feature data. A combination of these multiple models is then used as the prediction model for classifying similar software. Experiments were performed to evaluate the proposed approach against conventional linear regression, and the experimental results show that the proposed method classifies similar software more accurately than the conventional model. We expect that the proposed approach can be applied to various kinds of classification problems to improve on the accuracy of conventional linear regression.

Development of Statistical Model and Neural Network Model for Tensile Strength Estimation in Laser Material Processing of Aluminum Alloy (알루미늄 합금의 레이저 가공에서 인장 강도 예측을 위한 회귀 모델 및 신경망 모델의 개발)

  • Park, Young-Whan;Rhee, Se-Hun
    • Journal of the Korean Society for Precision Engineering / v.24 no.4 s.193 / pp.93-101 / 2007
  • Aluminum alloy, one of the lightweight materials, has been applied experimentally to lightweight vehicle bodies, and welding technology is very important for this purpose. In aluminum laser welding, the strength of the welded part is reduced by porosity, underfill, and magnesium loss. To overcome these problems, laser welding of aluminum with filler wire has been suggested. In this study, experiments on laser welding of AA5182 aluminum alloy with AA5356 filler wire were performed with varying process parameters: laser power, welding speed, and wire feed rate. The tensile strength was measured to assess the weldability of laser welding with filler wire. Three regression models and one neural network model were proposed to estimate tensile strength. The regression models were a multiple linear regression model, a second-order polynomial regression model, and a multiple nonlinear regression model. A neural network model with two hidden layers, of 5 and 3 nodes respectively, was investigated to find the most suitable model for the system. Estimation performance was evaluated for each model using the average error rate. Among the three regression models, the second-order polynomial regression model had the best estimation performance. Among all models, the neural network model had the best estimation performance.