• Title/Abstract/Keywords: Regression


A Study on Stochastic Estimation of Monthly Runoff by Multiple Regression Analysis

  • 김태철;정하우
    • 한국농공학회지 / Vol. 22, No. 3 / pp.75-87 / 1980
  • Most hydrologic phenomena are complex, organic products of multiple causes, such as climatic and hydro-geological factors. A significant correlation with runoff in a river basin can therefore be expected, and the effect of each causal and associated factor (independent variables: present-month rainfall, previous-month runoff, evapotranspiration, relative humidity, etc.) on present-month runoff (dependent variable) can be determined by multiple regression analysis. Functions relating the independent and dependent variables should be refitted repeatedly until a satisfactory, optimal combination of independent variables is obtained. Before the first estimated multiple regression model in historical sequence is adopted, the reliability of the estimated function should be tested against statistical criteria such as analysis of variance, the coefficient of determination, and significance tests of the regression coefficients. Some error between observed and estimated runoff nevertheless remains. The error arises because the model is an inadequate description of the system and because the data constituting the record represent only a sample from a population of monthly discharge observations, so that estimates of the model parameters are subject to sampling errors. Since this error, which is a deviation from the multiple regression plane, cannot be explained by the first estimated multiple regression equation, it can be considered a random error governed by the laws of chance. The variance left unexplained by the multiple regression equation can be handled by a stochastic approach: the random error can be simulated stochastically by multiplying the standard error of estimate by a random normal variate. Finally, a hybrid model for estimating monthly runoff in nonhistorical sequence can be obtained by combining the deterministic component of the multiple regression equation with the stochastic component of random errors.
Monthly runoff at Naju station in the Yong-San river basin is estimated by the multiple regression model and the hybrid model. Observed and estimated runoff are compared, and the multiple regression model is compared with existing estimation methods such as the Gajiyama formula, the tank model, and the Thomas-Fiering model. The results are as follows. (1) The optimal function for estimating monthly runoff in historical sequence is the multiple linear regression equation in overall-month unit, that is, $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10$. About 85% of the total variance of monthly runoff can be explained by this equation, and its coefficient of determination ($R^2$) is 0.843. This means that monthly runoff in historical sequence can be estimated with high significance from a short observation record using the above equation. (2) The optimal function for estimating monthly runoff in nonhistorical sequence is the hybrid model, which combines the multiple linear regression equation in overall-month unit with a stochastic component, that is, $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10 + S_y \cdot t$. The remaining 15% of unexplained variance of monthly runoff can be accounted for by adding the stochastic process, and somewhat more reliable statistical characteristics of monthly runoff in nonhistorical sequence are obtained. Through the random component, the estimated monthly runoff in nonhistorical sequence shows extreme values (maximum, minimum) that do not appear in the observed runoff. (3) The "frequency best fit coefficient" ($R^2_f$) of the multiple linear regression equation is 0.847, the same value as that of the Gajiyama formula. This implies that the multiple linear regression equation and the Gajiyama formula are theoretically rather reasonable functions.
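The hybrid model in result (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: $P_n$, $Q_{n-1}$ and $E_n$ are read as present-month rainfall, previous-month runoff and evapotranspiration in the units used by the paper (not given in the abstract), $t$ is drawn from a standard normal distribution, and the value of the standard error of estimate `s_y` and all input series are hypothetical, since the abstract does not report them.

```python
import random


def deterministic_runoff(p_n, q_prev, e_n):
    """Regression part from the abstract:
    Q_n = 0.788 P_n + 0.130 Q_{n-1} - 0.273 E_n - 0.10."""
    return 0.788 * p_n + 0.130 * q_prev - 0.273 * e_n - 0.10


def hybrid_runoff(p_n, q_prev, e_n, s_y, rng):
    """Hybrid model: regression part plus S_y * t, with t ~ N(0, 1)."""
    t = rng.gauss(0.0, 1.0)
    return deterministic_runoff(p_n, q_prev, e_n) + s_y * t


def simulate(rainfall, evapotranspiration, q0, s_y, seed=0):
    """Generate a runoff sequence month by month.

    Each simulated value is fed back as the previous-month runoff of the
    next step. q0 (initial runoff) and s_y are hypothetical inputs.
    """
    rng = random.Random(seed)
    q_prev = q0
    series = []
    for p_n, e_n in zip(rainfall, evapotranspiration):
        q_n = hybrid_runoff(p_n, q_prev, e_n, s_y, rng)
        series.append(q_n)
        q_prev = q_n
    return series


if __name__ == "__main__":
    # Made-up monthly inputs, for illustration only.
    rain = [20.0, 35.0, 60.0, 110.0, 250.0, 180.0]
    evap = [15.0, 25.0, 45.0, 70.0, 90.0, 85.0]
    print(simulate(rain, evap, q0=10.0, s_y=12.0, seed=42))
```

Because the noise term is unbounded, a sketch like this can produce negative runoff in dry months; how the original study handled that case is not stated in the abstract.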


A study on the properties of sensitivity analysis in principal component regression and latent root regression

  • 신재경;장덕준
    • Journal of the Korean Data and Information Science Society / Vol. 20, No. 2 / pp.321-328 / 2009
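  • In regression analysis, when the explanatory variables are highly correlated, the precision of the regression coefficients obtained by least squares estimation deteriorates. This phenomenon, called multicollinearity, causes serious problems in real data analysis. Several methods have been proposed to overcome it, including ridge regression, shrinkage estimators, and principal component regression and latent root regression, both of which are based on principal component analysis. Over the past several decades, many statisticians have studied sensitivity analysis in ordinary multiple regression, and have studied the same topic for principal component regression, latent root regression, and logistic principal component regression. Principal component analysis plays an important role in all of these methods. Many statisticians have also studied sensitivity analysis in multivariate methods related to principal component analysis. This paper introduces principal component regression and latent root regression, then introduces methods of sensitivity analysis for them, and finally discusses the properties of sensitivity analysis for these two methods.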


An Application of a New Two-Way Regression Model for Rating Curves

  • 이창해
    • 한국수자원학회논문집 / Vol. 41, No. 1 / pp.17-25 / 2008
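  • When stage-discharge relationships (rating curves) are derived and applied in practice, the properties of regression analysis are often overlooked. For example, a rating curve obtained by regressing observed discharge on observed stage is sometimes used in practice to convert a design flood discharge simulated by a flood model into a design flood stage. However, conventional regression analysis is derived from residuals measured as vertical distances between the observations and the regression line, so the regression equation changes when the independent and dependent variables are exchanged, and the equation must not be applied in reverse. To resolve this problem, this study proposes a new least-squares regression algorithm in which the variables of the regression equation can be interchanged. The new method was applied to the rating curves of five stage gauging stations on the main stream of the Nakdong River basin. Three regression equations were derived: from stage to discharge (model 1), from discharge to stage (model 2), and two-way (model 3). By comparing these rating curves, the study presents a new method that can reduce misapplication in practice.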
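The point that a least-squares fit cannot simply be inverted can be checked with a short Python sketch. It covers only models 1 and 2 (ordinary least squares in each direction); it does not implement the paper's two-way algorithm, which the abstract does not specify. The paired values below are made up and merely stand in for transformed stage and discharge data.

```python
def ols(x, y):
    """Fit y = a + b*x by least squares, minimizing vertical residuals in y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Hypothetical paired observations (e.g. log-stage h and log-discharge q).
h = [0.1, 0.4, 0.7, 1.0, 1.3, 1.6]
q = [1.2, 1.5, 2.4, 2.6, 3.5, 3.6]

a1, b1 = ols(h, q)  # model 1: stage -> discharge
a2, b2 = ols(q, h)  # model 2: discharge -> stage

# Algebraically inverting model 1 gives h = (q - a1) / b1, i.e. slope 1 / b1.
slope_from_inverting_model_1 = 1.0 / b1

print("model 2 slope:          ", b2)
print("inverted model 1 slope: ", slope_from_inverting_model_1)
print("product b1 * b2 (= r^2):", b1 * b2)
```

The product of the two slopes equals the squared correlation coefficient, so the fitted model 2 and the inverted model 1 coincide only when the data lie exactly on a line.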

Evaluation of Regression Models in LOADEST to Estimate Suspended Solid Load in Hangang Waterbody

  • 박윤식;이지민;정영훈;신민환;박지형;황하선;류지철;박장호;김기성
    • 한국농공학회논문집 / Vol. 57, No. 2 / pp.37-45 / 2015
  • Water quality sampling typically takes place intermittently, since sample collection and the subsequent analysis require substantial cost and effort. Regression models (or rating curves) are therefore often used to interpolate water quality data. LOADEST has nine regression models to estimate water quality data, and one regression model needs to be selected automatically or manually. The nine regression models in LOADEST and the auto-selection by LOADEST were evaluated in this study. Suspended solids data were collected for forty-nine stations from the Water Information System of the Ministry of Environment. The suspended solids data from each station were divided into two groups for calibration and validation. Nash-Sutcliffe efficiency (NSE) and the coefficient of determination ($R^2$) were used to evaluate the estimated suspended solids loads. Regression models 1 and 3 in LOADEST provided higher NSE and $R^2$ than the other regression models, while regression models 2, 5, 6, 8, and 9 provided low NSE. In addition, the regression model selected by LOADEST did not necessarily provide better suspended solids estimates than the other regression models.
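For readers unfamiliar with the two evaluation metrics, a minimal Python sketch follows. It assumes the standard definition of Nash-Sutcliffe efficiency and takes $R^2$ to be the squared Pearson correlation between observed and estimated values, a common convention in hydrology; the abstract does not say which formula the authors used. The function names and sample values are illustrative.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov * cov / (var_o * var_s)


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]      # made-up observed loads
    biased = [3.0, 5.0, 7.0, 9.0]   # perfectly correlated but biased estimates
    print(nse(obs, biased), r_squared(obs, biased))
```

The example illustrates why both metrics are reported together: a biased estimate can be perfectly correlated with the observations (high $R^2$) and still score a negative NSE.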

Estimating excess post-exercise oxygen consumption using multiple linear regression in healthy Korean adults: a pilot study

  • Jung, Won-Sang;Park, Hun-Young;Kim, Sung-Woo;Kim, Jisu;Hwang, Hyejung;Lim, Kiwon
    • 운동영양학회지 / Vol. 25, No. 1 / pp.35-41 / 2021
  • [Purpose] This pilot study aimed to develop a regression model to estimate the excess post-exercise oxygen consumption (EPOC) of Korean adults using various easy-to-measure independent variables. [Methods] The EPOC and the independent variables for its estimation (e.g., sex, age, height, weight, body mass index, fat-free mass [FFM], fat mass, % body fat, and heart rate_sum [HR_sum]) were measured in 75 healthy adults (31 males, 44 females). Statistical analysis was performed to develop an EPOC estimation regression model using the stepwise regression method. [Results] We confirmed that FFM and HR_sum were important variables in the EPOC regression models of various exercise types. The explanatory power and standard error of estimate (SEE) for EPOC by exercise type were as follows: the continuous exercise (CEx) regression model had an $R^2$ of 86.3% and an adjusted $R^2$ of 85.9%, with a mean SEE of 11.73 kcal; the interval exercise (IEx) regression model had an $R^2$ of 83.1% and an adjusted $R^2$ of 82.6%, with a mean SEE of 13.68 kcal; and the accumulation of short-duration exercise (AEx) regression model had an $R^2$ of 91.3% and an adjusted $R^2$ of 91.0%, with a mean SEE of 27.71 kcal. There was no significant difference between the EPOC measured with a metabolic gas analyzer and the predicted EPOC for each exercise type. [Conclusion] This pilot study developed regression models to estimate EPOC in healthy Korean adults. The regression models were as follows: CEx = -37.128 + 1.003 × (FFM) + 0.016 × (HR_sum), IEx = -49.265 + 1.442 × (FFM) + 0.013 × (HR_sum), and AEx = -100.942 + 2.209 × (FFM) + 0.020 × (HR_sum).
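The three equations in the conclusion translate directly into a small lookup-based function. The sketch below uses the coefficients exactly as reported; the function and dictionary names are illustrative, and the units are assumptions: the output is taken to be in kcal (the unit of the reported SEE), while the units of FFM and HR_sum are not stated in the abstract and must match those used in the study.

```python
# (intercept, FFM coefficient, HR_sum coefficient) as reported in the abstract.
EPOC_COEFFICIENTS = {
    "CEx": (-37.128, 1.003, 0.016),   # continuous exercise
    "IEx": (-49.265, 1.442, 0.013),   # interval exercise
    "AEx": (-100.942, 2.209, 0.020),  # accumulation of short-duration exercise
}


def predict_epoc(exercise_type, ffm, hr_sum):
    """Estimate EPOC from fat-free mass and summed heart rate.

    exercise_type must be one of "CEx", "IEx", "AEx".
    """
    intercept, b_ffm, b_hr = EPOC_COEFFICIENTS[exercise_type]
    return intercept + b_ffm * ffm + b_hr * hr_sum


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    for kind in EPOC_COEFFICIENTS:
        print(kind, predict_epoc(kind, ffm=50.0, hr_sum=5000.0))
```

As with any regression fitted on a small sample (75 adults here), predictions for inputs outside the range observed in the study should be treated with caution.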

Correlation Analysis of Reservoir Water Quality with respect to Land Use Types of Watersheds

  • 윤동균;정상옥
    • Current Research on Agriculture and Life Sciences / Vol. 24 / pp.49-53 / 2006
  • The objective of this study was to present regression equations relating reservoir water quality to the land use types of the watersheds. The regression equations were derived by multiple linear regression analysis of observed data from 88 reservoirs in Kyungpook Province. The measured values of BOD, COD, T-N, and T-P were correlated with the areas of the land use types. In total, 23 regression equations were obtained across the water quality items and watershed sizes. Of these, 2 equations have a multiple correlation coefficient (MCC) above 0.90, 10 have MCC values from 0.70 to 0.90, 9 have MCC values from 0.40 to 0.70, and 2 have MCC values from 0.20 to 0.40. The results of this study can be used to estimate reservoir water quality simply and quickly in the planning phase.


Improved Exact Inference in Logistic Regression Model

  • Kim, Donguk;Kim, Sooyeon
    • Communications for Statistical Applications and Methods / Vol. 10, No. 2 / pp.277-289 / 2003
  • We propose modified exact inferential methods for the logistic regression model. The exact conditional distribution in the logistic regression model is often highly discrete, and ordinary exact inference in logistic regression is conservative because of this discreteness. For exact inference in the logistic regression model, we use the modified P-value. The modified P-value cannot exceed the ordinary P-value, so the test of size $\alpha$ based on the modified P-value is less conservative. The modified exact confidence interval maintains at least a fixed confidence level but tends to be much narrower. The approach inverts the results of a test with a modified P-value that uses the test statistic and table probabilities in the logistic regression model.

A Regression Test Selection and Prioritization Technique

  • Malhotra, Ruchika;Kaur, Arvinder;Singh, Yogesh
    • Journal of Information Processing Systems / Vol. 6, No. 2 / pp.235-252 / 2010
  • Regression testing is a very costly process performed primarily as a software maintenance activity. It is the process of retesting the modified parts of the software and ensuring that no new errors have been introduced into previously tested source code due to these modifications. A regression test selection technique selects an appropriate number of test cases from a test suite that might expose a fault in the modified program. In this paper, we propose both a regression test selection and prioritization technique. We implemented our regression test selection technique and demonstrated in two case studies that our technique is effective regarding selecting and prioritizing test cases. The results show that our technique may significantly reduce the number of test cases and thus the cost and resources for performing regression testing on modified software.

Robust Nonparametric Regression Method using Rank Transformation

  • Park, Dongryeon
    • Communications for Statistical Applications and Methods / Vol. 7, No. 2 / pp.575-583 / 2000
  • Consider the problem of estimating a regression function from a set of data contaminated by a long-tailed error distribution. The linear smoother is a kind of local weighted average of the response, so it is not robust against outliers. The kernel M-smoother and the lowess attain robustness against outliers by down-weighting them. However, both the kernel M-smoother and the lowess require iteration to compute the robustness weights, and as Wang and Scott (1994) pointed out, the requirement of iteration is not a desirable property. In this article, we propose a robust nonparametric regression method that does not require iteration. Robustness can be achieved not only by down-weighting outliers but also by transforming them. The rank transformation is a simple procedure in which the data are replaced by their corresponding ranks. Iman and Conover (1979) showed that the rank transformation is a robust and powerful procedure in linear regression. In this paper, we show that the rank transformation can also be applied to nonparametric regression to achieve robustness.
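The rank transformation itself, in which the data are replaced by their ranks, is easy to illustrate in Python. The sketch below shows only that step, assigning tied values their average rank (a common convention that the abstract does not confirm); it is not the paper's regression method, whose smoothing step is not described here. The function name and sample values are illustrative.

```python
def average_ranks(values):
    """Replace each value by its rank (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


if __name__ == "__main__":
    # Made-up responses with one gross outlier (250.0).
    y = [2.1, 2.9, 4.2, 250.0, 6.1]
    print(average_ranks(y))  # [1.0, 2.0, 3.0, 5.0, 4.0]
```

After the transformation the outlier is simply the largest rank, so its influence on any subsequent averaging is bounded by the sample size, which is the intuition behind transforming outliers rather than down-weighting them.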
