Comments on the regression coefficients

Korean title: Properties of the regression coefficient estimators in multiple regression

  • Myung-Wook Kahng (강명욱; Department of Statistics, Sookmyung Women's University)
  • Received : 2021.06.02
  • Accepted : 2021.06.07
  • Published : 2021.08.31

Abstract

The meaning of a regression coefficient differs between simple and multiple regression: the estimates for the same explanatory variable are generally not equal, and in some cases they even have opposite signs. Understanding the relative contribution of each explanatory variable is an important part of regression analysis. In a standardized regression model, the standardized regression coefficient can be interpreted as the change in the response variable, measured in units of its standard deviation, when the corresponding explanatory variable increases by one standard deviation while the values of the other explanatory variables are held fixed. It is well known, however, that the magnitude of a standardized regression coefficient is not a proper measure of the relative importance of each explanatory variable. In this paper, the estimator of a regression coefficient in multiple regression is expressed as a function of correlation coefficients and coefficients of determination, and it is examined in terms of the additional explanatory power of a variable and the corresponding increase in the coefficient of determination. We also explore the relationship between correlation coefficients and regression coefficient estimates in various scatterplots, and apply the results specifically to the case of two explanatory variables.
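The sign change and the link to correlation coefficients described in the abstract can be illustrated with a small numerical sketch. It uses the standard textbook identity for two explanatory variables, in which the standardized slope of x1 is (r_y1 - r_12 r_y2) / (1 - r_12^2); this is not necessarily the exact expression derived in the paper. The data are synthetic, and the function name and all variable names are assumptions made for illustration only.

```python
import numpy as np


def standardized_coefs_two_predictors(r_y1, r_y2, r_12):
    """Standardized OLS slopes of y on (x1, x2), computed from the three
    pairwise correlations (textbook identity for two predictors)."""
    denom = 1.0 - r_12 ** 2
    b1 = (r_y1 - r_12 * r_y2) / denom
    b2 = (r_y2 - r_12 * r_y1) / denom
    return b1, b2


def zscore(v):
    """Center and scale a vector to unit sample standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)


# Synthetic data (hypothetical): x1 and x2 are strongly correlated, and the
# true partial effect of x1 on y is negative.
rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x1 = 0.9 * x2 + 0.3 * rng.normal(size=n)
y = -1.0 * x1 + 2.0 * x2 + 0.5 * rng.normal(size=n)

# Simple regression of y on x1 alone: slope = cov(x1, y) / var(x1).
simple_slope_x1 = np.cov(x1, y, ddof=1)[0, 1] / np.var(x1, ddof=1)

# Multiple regression of y on x1 and x2 (with intercept).
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
multiple_slope_x1 = coef[1]

# Standardized slopes, first from the correlations, then by direct fitting.
R = np.corrcoef(np.vstack([y, x1, x2]))
r_y1, r_y2, r_12 = R[0, 1], R[0, 2], R[1, 2]
b1_corr, b2_corr = standardized_coefs_two_predictors(r_y1, r_y2, r_12)

Z = np.column_stack([zscore(x1), zscore(x2)])
b_fit, *_ = np.linalg.lstsq(Z, zscore(y), rcond=None)

# Coefficient of determination expressed through the same quantities.
r_squared = b1_corr * r_y1 + b2_corr * r_y2

print("simple slope of x1:      ", simple_slope_x1)
print("multiple slope of x1:    ", multiple_slope_x1)
print("standardized (from r):   ", b1_corr, b2_corr)
print("standardized (from fit): ", b_fit[0], b_fit[1])
print("R^2:", r_squared, " r_y1^2 + r_y2^2:", r_y1 ** 2 + r_y2 ** 2)
```

In this construction the marginal correlation between y and x1 is positive, so the simple regression slope is positive, while the slope of x1 in the two-variable model is negative because r_12 r_y2 exceeds r_y1. The last line prints R^2 next to the sum of the squared simple correlations so the two can be compared.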
