• Title/Summary/Keyword: Regression

A Study on Stochastic Estimation of Monthly Runoff by Multiple Regression Analysis (다중회귀분석에 의한 하천 월 유출량의 추계학적 추정에 관한 연구)

  • 김태철;정하우
    • Magazine of the Korean Society of Agricultural Engineers / v.22 no.3 / pp.75-87 / 1980
  • Most hydrologic phenomena are complex, organic products of multiple causes such as climatic and hydro-geological factors. A significant correlation with run-off in a river basin can therefore be expected, and the effect of each causal and associated factor (independent variables: present-month rainfall, previous-month run-off, evapotranspiration, relative humidity, etc.) on present-month run-off (dependent variable) can be determined by multiple regression analysis. The functions relating independent and dependent variables should be refitted repeatedly until a satisfactory, optimal combination of independent variables is obtained. The reliability of the estimated function should be tested against statistical criteria such as analysis of variance, the coefficient of determination, and significance tests of the regression coefficients before the first-estimated multiple regression model in historical sequence is adopted. Some error between observed and estimated run-off nevertheless remains. The error arises because the model is an inadequate description of the system and because the data record is only a sample from a population of monthly discharge observations, so the estimates of the model parameters are subject to sampling error. Since this error, a deviation from the multiple regression plane, cannot be explained by the first-estimated multiple regression equation, it can be treated as a random error governed by the laws of chance. The variance left unexplained by the multiple regression equation can be handled by a stochastic approach: the random error is simulated stochastically by multiplying a random normal variate by the standard error of estimate. Finally, a hybrid model for estimating monthly run-off in non-historical sequence is obtained by combining the deterministic component of the multiple regression equation with the stochastic component of random errors.
Monthly run-off at Naju station in the Yong-San river basin is estimated by the multiple regression model and the hybrid model. Observed and estimated run-off are compared, and the multiple regression model is compared with existing estimation methods such as the Gajiyama formula, the tank model, and the Thomas-Fiering model. The results are as follows. (1) The optimal function for estimating monthly run-off in historical sequence is the multiple linear regression equation in overall-month units, $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10$. About 85% of the total variance of monthly run-off is explained by this equation, and its coefficient of determination ($R^2$) is 0.843. This means that monthly run-off in historical sequence can be estimated with high significance from a short observation record using the above equation. (2) The optimal function for estimating monthly run-off in non-historical sequence is the hybrid model, which combines the multiple linear regression equation in overall-month units with a stochastic component, $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10 + S_y \cdot t$. The remaining 15% of unexplained variance of monthly run-off is accounted for by the added stochastic process, and somewhat more reliable statistical characteristics of monthly run-off in non-historical sequence are obtained. The monthly run-off estimated in non-historical sequence shows extreme values (maximum and minimum) that do not appear in the observed run-off, as a result of the random component. (3) The "frequency best fit coefficient" ($R^2_f$) of the multiple linear regression equation is 0.847, the same value as that of the Gajiyama formula. This implies that the multiple linear regression equation and the Gajiyama formula are theoretically rather reasonable functions.
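
The two equations in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: reading $P_n$, $Q_{n-1}$ and $E_n$ as present-month rainfall, previous-month run-off and evapotranspiration is inferred from the abstract's list of variables, the input values and the standard error of estimate `s_y` are hypothetical, and the units are whatever consistent depth unit the paper uses (not stated here).

```python
import random


def runoff_deterministic(p_n, q_prev, e_n):
    """Regression estimate of present-month run-off (equation (1) in the abstract)."""
    return 0.788 * p_n + 0.130 * q_prev - 0.273 * e_n - 0.10


def runoff_hybrid(p_n, q_prev, e_n, s_y, rng):
    """Deterministic part plus a random error S_y * t, with t drawn from N(0, 1)."""
    t = rng.gauss(0.0, 1.0)  # random normal variate
    return runoff_deterministic(p_n, q_prev, e_n) + s_y * t


# Hypothetical inputs; none of these values come from the paper.
rng = random.Random(42)
q_det = runoff_deterministic(p_n=200.0, q_prev=50.0, e_n=80.0)
q_hyb = [runoff_hybrid(200.0, 50.0, 80.0, s_y=15.0, rng=rng) for _ in range(5)]
print(q_det, q_hyb)
```

Setting `s_y` to zero recovers the deterministic estimate, which makes the role of the stochastic term easy to see: it only adds scatter around the regression plane.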

A study on the properties of sensitivity analysis in principal component regression and latent root regression (주성분회귀와 고유값회귀에 대한 감도분석의 성질에 대한 연구)

  • Shin, Jae-Kyoung;Chang, Duk-Joon
    • Journal of the Korean Data and Information Science Society / v.20 no.2 / pp.321-328 / 2009
  • In regression analysis, the ordinary least squares estimates of the regression coefficients become poor when the correlations among the predictor variables are high. This phenomenon, called multicollinearity, causes serious problems in actual data analysis. Many methods have been proposed to overcome it, including ridge regression, shrinkage estimators, and methods based on principal component analysis (PCA) such as principal component regression (PCR) and latent root regression (LRR). In the last decade, many statisticians have discussed sensitivity analysis (SA) in ordinary multiple regression and the same topic in PCR, LRR, and logistic principal component regression (LPCR). PCA plays an important role in those methods, and many statisticians have discussed SA in PCA and related multivariate methods. We introduce the methods of PCR and LRR, introduce the methods of SA in PCR and LRR, and discuss the properties of SA in PCR and LRR.
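
To make the PCR idea concrete, the following is a minimal sketch of regression on the first principal component only, written with the Python standard library. It is not the authors' procedure and does not cover LRR or the sensitivity analysis. The data are hypothetical, and the power iteration assumes the starting vector is not orthogonal to the leading eigenvector, which holds for positively correlated predictors such as these.

```python
import math
import random


def standardize(col):
    """Center a column and scale it to unit sample standard deviation."""
    mean = sum(col) / len(col)
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (len(col) - 1))
    return [(v - mean) / sd for v in col]


def leading_component(z_cols, iters=200):
    """Leading eigenvector of the predictors' correlation matrix (power iteration)."""
    n, p = len(z_cols[0]), len(z_cols)
    corr = [[sum(a * b for a, b in zip(z_cols[i], z_cols[j])) / (n - 1)
             for j in range(p)] for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p  # assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def pcr_one_component(x_cols, y):
    """Regress y on the first principal component score.

    Returns (intercept, betas); betas apply to the standardized predictors.
    """
    z_cols = [standardize(c) for c in x_cols]
    v = leading_component(z_cols)
    scores = [sum(v[j] * z_cols[j][i] for j in range(len(z_cols)))
              for i in range(len(y))]
    y_mean = sum(y) / len(y)
    gamma = (sum(s * (yi - y_mean) for s, yi in zip(scores, y))
             / sum(s * s for s in scores))
    return y_mean, [gamma * vj for vj in v]


# Hypothetical data: two nearly collinear predictors.
rng = random.Random(0)
x1 = [rng.gauss(0.0, 1.0) for _ in range(50)]
x2 = [a + 0.05 * rng.gauss(0.0, 1.0) for a in x1]
y = [2.0 * a + 2.0 * b + 0.1 * rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]
intercept, betas = pcr_one_component([x1, x2], y)
print(intercept, betas)
```

With two standardized, positively correlated predictors the first component weights both equally, so the two coefficients come out equal. This stability is the point of PCR, whereas ordinary least squares would split the effect between the collinear predictors erratically.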

An Application of a New Two-Way Regression Model for Rating Curves (수위-유량관계식에 새로운 양방향 회귀모형의 적용)

  • Lee, Chang-Hae
    • Journal of Korea Water Resources Association / v.41 no.1 / pp.17-25 / 2008
  • Whether existing rating curves are used in practice or new ones are derived, the characteristics of regression analysis are often neglected. For example, a discharge rating curve, which is established from a regression of observed water levels (H) on observed flow rates (Q), is sometimes used to estimate a design water level corresponding to a simulated design flood runoff. However, in conventional regression analysis, which minimizes the vertical errors between the observed data and the regression line, the regression equation changes when the independent and dependent variables are interchanged. Thus, regression equations should not be applied inversely. To avoid this problem, a new two-way variable least-squares regression analysis is proposed. The new method was applied to the rating curves of five water level stations on the main stream of the Nakdong River. A comparison of three regression models, namely the regression of Q versus H (model 1), of H versus Q (model 2), and the two-way regression (model 3), showed that the new method can reduce inadvertent mistakes when applied in practice.
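
The asymmetry that motivates this paper can be checked numerically. The sketch below fits ordinary least squares in both directions on hypothetical stage and discharge pairs and shows that inverting one fit does not reproduce the other. It does not implement the proposed two-way method, whose formulation is not given in the abstract, and it uses a straight line purely for illustration (real rating curves are usually nonlinear).

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (b, a)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical observations: water level H and flow rate Q.
h = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
q = [12.0, 20.0, 35.0, 42.0, 61.0, 70.0]

b_qh, a_qh = ols_slope_intercept(h, q)  # Q regressed on H
b_hq, a_hq = ols_slope_intercept(q, h)  # H regressed on Q

r2 = b_qh * b_hq  # product of the two slopes equals r^2

q_design = 100.0
h_direct = a_hq + b_hq * q_design        # H predicted by the H-on-Q fit
h_inverted = (q_design - a_qh) / b_qh    # H obtained by inverting the Q-on-H fit
print(r2, h_direct, h_inverted)
```

The two predicted levels agree only when the fit is perfect (r² = 1), which is why a curve fitted in one direction should not be read backwards.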

Evaluation of Regression Models in LOADEST to Estimate Suspended Solid Load in Hangang Waterbody (한강수계에서의 부유사 예측을 위한 LOADEST 모형의 회귀식의 평가)

  • Park, Youn Shik;Lee, Ji Min;Jung, Younghun;Shin, Min Hwan;Park, Ji Hyung;Hwang, Hasun;Ryu, Jichul;Park, Jangho;Kim, Ki-Sung
    • Journal of The Korean Society of Agricultural Engineers / v.57 no.2 / pp.37-45 / 2015
  • Typically, water quality sampling takes place intermittently, since sample collection and the subsequent analysis require substantial cost and effort. Regression models (or rating curves) are therefore often used to interpolate water quality data. LOADEST has nine regression models for estimating water quality data, and one of them needs to be selected automatically or manually. The nine regression models in LOADEST and the auto-selection by LOADEST were evaluated in this study. Suspended solids data were collected for forty-nine stations from the Water Information System of the Ministry of Environment. The suspended solids data from each station were divided into two groups for calibration and validation. Nash-Sutcliffe efficiency (NSE) and the coefficient of determination ($R^2$) were used to evaluate the estimated suspended solid loads. The regression models numbered 1 and 3 in LOADEST provided higher NSE and $R^2$ than the other regression models. The regression models numbered 2, 5, 6, 8, and 9 in LOADEST provided low NSE. In addition, the regression model selected by LOADEST did not necessarily provide better suspended solid estimates than the other regression models did.
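
The two evaluation metrics named in this abstract follow standard definitions, sketched below with hypothetical values. The example is chosen to show why both are reported: $R^2$ (here the squared Pearson correlation) ignores systematic bias, while NSE penalizes it.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    sxy = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    sxx = sum((o - mean_o) ** 2 for o in obs)
    syy = sum((s - mean_s) ** 2 for s in sim)
    return sxy * sxy / (sxx * syy)


# Hypothetical loads: the simulated series is perfectly correlated but biased.
obs = [10.0, 20.0, 30.0, 40.0]
biased = [2.0 * o + 1.0 for o in obs]
print(r_squared(obs, biased), nse(obs, biased))
```

An NSE of 1 means a perfect match, 0 means the model is no better than the observed mean, and negative values mean it is worse.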

Estimating excess post-exercise oxygen consumption using multiple linear regression in healthy Korean adults: a pilot study

  • Jung, Won-Sang;Park, Hun-Young;Kim, Sung-Woo;Kim, Jisu;Hwang, Hyejung;Lim, Kiwon
    • Korean Journal of Exercise Nutrition / v.25 no.1 / pp.35-41 / 2021
  • [Purpose] This pilot study aimed to develop a regression model to estimate the excess post-exercise oxygen consumption (EPOC) of Korean adults using various easy-to-measure independent variables. [Methods] The EPOC and the independent variables for its estimation (e.g., sex, age, height, weight, body mass index, fat-free mass [FFM], fat mass, % body fat, and heart rate_sum [HR_sum]) were measured in 75 healthy adults (31 males, 44 females). Statistical analysis was performed to develop an EPOC estimation regression model using the stepwise regression method. [Results] We confirmed that FFM and HR_sum were important variables in the EPOC regression models of various exercise types. The explanatory power and standard errors of estimates (SEE) for EPOC of each exercise type were as follows: the continuous exercise (CEx) regression model had $R^2$ = 86.3% and adjusted $R^2$ = 85.9%, with a mean SEE of 11.73 kcal; the interval exercise (IEx) regression model had $R^2$ = 83.1% and adjusted $R^2$ = 82.6%, with a mean SEE of 13.68 kcal; and the accumulation of short-duration exercise (AEx) regression model had $R^2$ = 91.3% and adjusted $R^2$ = 91.0%, with a mean SEE of 27.71 kcal. There was no significant difference between the EPOC measured using a metabolic gas analyzer and the predicted EPOC for each exercise type. [Conclusion] This pilot study developed a regression model to estimate EPOC in healthy Korean adults. The regression models were as follows: CEx = -37.128 + 1.003 × (FFM) + 0.016 × (HR_sum), IEx = -49.265 + 1.442 × (FFM) + 0.013 × (HR_sum), and AEx = -100.942 + 2.209 × (FFM) + 0.020 × (HR_sum).
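
The three fitted equations can be evaluated directly, as in the sketch below. The coefficients are copied from the abstract; the units (EPOC in kcal, FFM in kg) and the example inputs are assumptions, since the abstract gives only the SEE in kcal and does not define how HR_sum is accumulated.

```python
# (intercept, FFM coefficient, HR_sum coefficient) for each exercise type.
EPOC_COEFFS = {
    "CEx": (-37.128, 1.003, 0.016),   # continuous exercise
    "IEx": (-49.265, 1.442, 0.013),   # interval exercise
    "AEx": (-100.942, 2.209, 0.020),  # accumulation of short-duration exercise
}


def predict_epoc(exercise, ffm, hr_sum):
    """Predicted EPOC for one exercise type from FFM and HR_sum."""
    intercept, b_ffm, b_hr = EPOC_COEFFS[exercise]
    return intercept + b_ffm * ffm + b_hr * hr_sum


# Hypothetical subject: FFM = 50 (assumed kg), HR_sum = 4000 (assumed beats).
for exercise in EPOC_COEFFS:
    print(exercise, predict_epoc(exercise, ffm=50.0, hr_sum=4000.0))
```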

Correlation Analysis of Reservoir Water Quality with respect to Land Use Types of Watersheds (유역 토지이용과 저수지 수질의 상관관계 분석)

  • Youn, Dong-Koun;Chung, Sang-Ok
    • Current Research on Agriculture and Life Sciences / v.24 / pp.49-53 / 2006
  • The objective of this study was to present regression equations relating reservoir water quality to the land use types of the watersheds. To derive the regression equations, multiple linear regression analysis was applied to observed data from 88 reservoirs in Kyungpook Province. The measured values of BOD, COD, T-N, and T-P were correlated with the areas of the land use types. In total, 23 regression equations were obtained for all the water quality items and watershed sizes. The results showed that 2 regression equations have a multiple correlation coefficient (MCC) above 0.90, 10 equations have an MCC from 0.70 to 0.90, 9 equations have an MCC from 0.40 to 0.70, and 2 equations have an MCC from 0.20 to 0.40. The results of this study can be used to estimate reservoir water quality simply and quickly in the planning phase.

Improved Exact Inference in Logistic Regression Model

  • Kim, Donguk;Kim, Sooyeon
    • Communications for Statistical Applications and Methods / v.10 no.2 / pp.277-289 / 2003
  • We propose modified exact inferential methods for the logistic regression model. The exact conditional distribution in the logistic regression model is often highly discrete, and ordinary exact inference in logistic regression is conservative because of this discreteness. For exact inference in the logistic regression model, we use the modified P-value. The modified P-value cannot exceed the ordinary P-value, so the test of size $\alpha$ based on the modified P-value is less conservative. The modified exact confidence interval maintains at least a fixed confidence level but tends to be much narrower. The approach inverts the results of a test with a modified P-value that uses the test statistic and table probabilities in the logistic regression model.

A Regression Test Selection and Prioritization Technique

  • Malhotra, Ruchika;Kaur, Arvinder;Singh, Yogesh
    • Journal of Information Processing Systems / v.6 no.2 / pp.235-252 / 2010
  • Regression testing is a very costly process performed primarily as a software maintenance activity. It is the process of retesting the modified parts of the software and ensuring that no new errors have been introduced into previously tested source code due to these modifications. A regression test selection technique selects an appropriate number of test cases from a test suite that might expose a fault in the modified program. In this paper, we propose both a regression test selection and prioritization technique. We implemented our regression test selection technique and demonstrated in two case studies that our technique is effective regarding selecting and prioritizing test cases. The results show that our technique may significantly reduce the number of test cases and thus the cost and resources for performing regression testing on modified software.
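
The abstract does not describe how the proposed technique selects or orders test cases, so the following is only a generic coverage-based sketch of the two steps, not the authors' method. It assumes a hypothetical coverage map from test names to the source lines each test executes: tests that touch no modified line are dropped, and the rest are ordered by how many modified lines they cover.

```python
def select_and_prioritize(coverage, modified_lines):
    """Select tests that execute modified lines; order by modified lines covered."""
    hits = {test: lines & modified_lines for test, lines in coverage.items()}
    selected = [test for test, hit in hits.items() if hit]
    # Most modified lines first; ties broken alphabetically for a stable order.
    return sorted(selected, key=lambda test: (-len(hits[test]), test))


# Hypothetical coverage map: test name -> set of source line numbers executed.
coverage = {
    "test_login": {10, 11, 12, 40},
    "test_logout": {40, 41},
    "test_report": {70, 71, 72},
    "test_profile": {11, 12, 13, 14},
}
modified_lines = {11, 12, 13, 41}

order = select_and_prioritize(coverage, modified_lines)
print(order)
```

Here `test_report` is excluded because it exercises none of the changed lines, which is where the cost reduction in regression testing comes from.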

Robust Nonparametric Regression Method using Rank Transformation

    • Communications for Statistical Applications and Methods / v.7 no.2 / pp.574-574 / 2000

Robust Nonparametric Regression Method using Rank Transformation

  • Park, Dongryeon
    • Communications for Statistical Applications and Methods / v.7 no.2 / pp.575-583 / 2000
  • Consider the problem of estimating a regression function from a set of data contaminated by a long-tailed error distribution. The linear smoother is a kind of local weighted average of the response, so it is not robust against outliers. The kernel M-smoother and lowess attain robustness by down-weighting outliers. However, the kernel M-smoother and lowess require iteration to compute the robustness weights, and as Wang and Scott (1994) pointed out, this requirement is not a desirable property. In this article, we propose a robust nonparametric regression method that does not require iteration. Robustness can be achieved not only by down-weighting outliers but also by transforming them. The rank transformation is a simple procedure in which the data are replaced by their corresponding ranks. Iman and Conover (1979) showed that the rank transformation is a robust and powerful procedure in linear regression. In this paper, we show that the rank transformation can also be applied to nonparametric regression to achieve robustness.
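
The effect of the rank transformation on a linear smoother can be illustrated with a short sketch. This is an assumption-laden illustration rather than the paper's estimator: it uses a Gaussian-kernel Nadaraya-Watson smoother, applies it once to raw responses and once to their ranks, and leaves the result on the rank scale. The data, including the single outlier, are hypothetical.

```python
import math


def ranks(values):
    """Ranks 1..n of the values; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def kernel_smooth(x, y, x0, bandwidth):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel (a linear smoother)."""
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


# Hypothetical data: a linear trend with one gross outlier at x = 5.
x = [float(i) for i in range(10)]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 1000.0, 6.0, 7.0, 8.0, 9.0]

y_ranks = ranks(y)
raw_fit = kernel_smooth(x, y, x0=5.0, bandwidth=1.0)
rank_fit = kernel_smooth(x, y_ranks, x0=5.0, bandwidth=1.0)
print(raw_fit, rank_fit)
```

Because a rank can never exceed the sample size, the outlier's influence on the smoothed value is bounded without any iterative reweighting.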
