• Title/Summary/Keyword: 능형회귀 분석

Search Result 15, Processing Time 0.022 seconds

回歸分析에 있어서의 多共線性과 名稱을 保全시키는 資料變換 技法

  • 兪浣
    • Journal of the Korean Statistical Society
    • /
    • v.8 no.2
    • /
    • pp.109-116
    • /
    • 1979
  • 두 개의 변수의 대체효과(substitution effect)를 연구하기 위하여 수요 또는 공급의 모형을 만들었을 경우 이에 관련된 변수들의 이름이 중요시 된다. 실제 관측 자료를 사용하였을 경우 흔히 일어나는 다공선성(multicollinearity) 문제를 다루기 위한 대안으로써 선형회귀선을 예로 들어 능형회귀기법(ridge regression technique)과 요인분석기법(factor analytic technique)을 소개하였으며 이에서 얻어지는 계수(coefficient)를 OLS 추정치로 설명하기 위하여 원래의 자료를 변환하였다. 실지 수요와 공급의 모형이 비선형일 경우 일반적으로 능형회귀나 요인분석을 쓰지 못한다는 점을 감안, 이러한 방법을 자료의 변환방법으로 설명함으로써 비선형모형에서도 다공선성문제를 위하여 능형회귀분석법이나 요인분석기법을 사용할 수 있도록 하였다.

  • PDF

Calibration of the Ridge Regression Model with the Genetic Algorithm:Study on the Regional Flood Frequency Analysis (유전알고리즘을 이용한 능형회귀모형의 검정 : 빈도별 홍수량의 지역분석을 대상으로)

  • Seong, Gi-Won
    • Journal of Korea Water Resources Association
    • /
    • v.31 no.1
    • /
    • pp.59-69
    • /
    • 1998
  • A regression model with basin physiographic characteristics as independent variables was calibrated for regional flood frequency analysis. In case that high correlations existing among the independent variables the ridge regression has been known to have capability of overcoming the problems of multicollinearity. To optimize the ridge regression model the cost function including regularization parameter must be minimized. In this research the genetic algorithm was applied on this optimization problem. The genetic algorithm is a stochastic search method that mimic the metaphor of natural biological heredity. Using this method the regression model could have optimized and stable weights of variables.

  • PDF

A study on semi-supervised kernel ridge regression estimation (준지도 커널능형회귀모형에 관한 연구)

  • Seok, Kyungha
    • Journal of the Korean Data and Information Science Society
    • /
    • v.24 no.2
    • /
    • pp.341-353
    • /
    • 2013
  • In many practical machine learning and data mining applications, unlabeled data are inexpensive and easy to obtain. Semi-supervised learning try to use such data to improve prediction performance. In this paper, a semi-supervised regression method, semi-supervised kernel ridge regression estimation, is proposed on the basis of kernel ridge regression model. The proposed method does not require a pilot estimation of the label of the unlabeled data. This means that the proposed method has good advantages including less number of parameters, easy computing and good generalization ability. Experiments show that the proposed method can effectively utilize unlabeled data to improve regression estimation.

Development of Regression Models Resolving High-Dimensional Data and Multicollinearity Problem for Heavy Rain Damage Data (호우피해자료에서의 고차원 자료 및 다중공선성 문제를 해소한 회귀모형 개발)

  • Kim, Jeonghwan;Park, Jihyun;Choi, Changhyun;Kim, Hung Soo
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.38 no.6
    • /
    • pp.801-808
    • /
    • 2018
  • The learning of the linear regression model is stable on the assumption that the sample size is sufficiently larger than the number of explanatory variables and there is no serious multicollinearity between explanatory variables. In this study, we investigated the difficulty of model learning when the assumption was violated by analyzing a real heavy rain damage data and we proposed to use a principal component regression model or a ridge regression model after integrating data to overcome the difficulty. We evaluated the predictive performance of the proposed models by using the test data independent from the training data, and confirmed that the proposed methods showed better predictive performances than the linear regression model.

A study on the properties of sensitivity analysis in principal component regression and latent root regression (주성분회귀와 고유값회귀에 대한 감도분석의 성질에 대한 연구)

  • Shin, Jae-Kyoung;Chang, Duk-Joon
    • Journal of the Korean Data and Information Science Society
    • /
    • v.20 no.2
    • /
    • pp.321-328
    • /
    • 2009
  • In regression analysis, the ordinary least squares estimates of regression coefficients become poor, when the correlations among predictor variables are high. This phenomenon, which is called multicollinearity, causes serious problems in actual data analysis. To overcome this multicollinearity, many methods have been proposed. Ridge regression, shrinkage estimators and methods based on principal component analysis (PCA) such as principal component regression (PCR) and latent root regression (LRR). In the last decade, many statisticians discussed sensitivity analysis (SA) in ordinary multiple regression and same topic in PCR, LRR and logistic principal component regression (LPCR). In those methods PCA plays important role. Many statisticians discussed SA in PCA and related multivariate methods. We introduce the method of PCR and LRR. We also introduce the methods of SA in PCR and LRR, and discuss the properties of SA in PCR and LRR.

  • PDF

Robust ridge regression for nonlinear mixed effects models with applications to quantitative high throughput screening assay data (비선형 혼합효과모형에서의 로버스트 능형회귀 방법과 정량적 고속 대량 스크리닝 자료에의 응용)

  • Yoo, Jiseon;Lim, Changwon
    • The Korean Journal of Applied Statistics
    • /
    • v.31 no.1
    • /
    • pp.123-137
    • /
    • 2018
  • A nonlinear mixed effects model is mainly used to analyze repeated measurement data in various fields. A nonlinear mixed effects model consists of two stages: the first-stage individual-level model considers intra-individual variation and the second-stage population model considers inter-individual variation. The individual-level model, which is the first stage of the nonlinear mixed effects model, estimates the parameters of the nonlinear regression model. It is the same as the general nonlinear regression model, and usually estimates parameters using the least squares estimation method. However, the least squares estimation method may have a problem that the estimated value of the parameters and standard errors become extremely large if the assumed nonlinear function is not explicitly revealed by the data. In this paper, a new estimation method is proposed to solve this problem by introducing the ridge regression method recently proposed in the nonlinear regression model into the first-stage individual-level model of the nonlinear mixed effects model. The performance of the proposed estimator is compared with the performance with the standard estimator through a simulation study. The proposed methodology is also illustrated using quantitative high throughput screening data obtained from the US National Toxicology Program.

Using Ridge Regression to Improve the Accuracy and Interpretation of the Hedonic Pricing Model : Focusing on apartments in Guro-gu, Seoul (능형회귀분석을 활용한 부동산 헤도닉 가격모형의 정확성 및 해석력 향상에 관한 연구 - 서울시 구로구 아파트를 대상으로 -)

  • Koo, Bonsang;Shin, Byungjin
    • Korean Journal of Construction Engineering and Management
    • /
    • v.16 no.5
    • /
    • pp.77-85
    • /
    • 2015
  • The Hedonic Pricing model is the predominant approach used today to model the effect of relevant factors on real estate prices. These factors include intrinsic elements of a property such as floor areas, number of rooms, and parking spaces. Also, The model also accounts for the impact of amenities or undesirable facilities of a property's value. In the latter case, euclidean distances are typically used as the parameter to represent the proximity and its impact on prices. However, in situations where multiple facilities exist, multi-colinearity may exist between these parameters, which can result in multi-regression models with erroneous coefficients. This research uses Variance Inflation Factors(VIF) and Ridge Regression to identify these errors and thus create more accurate and stable models. The techniques were applied to apartments in Guro-gu of Seoul, whose prices are impacted by subway stations as well as a public prison, a railway terminal and a digital complex. The VIF identified colinearity between variables representing the terminal and the digital complex as well as the latitudinal coordinates. The ridge regression showed the need to remove two of these variables. The case study demonstrated that the application of these techniques were critical in developing accurate and robust Hedonic Pricing models.

A Derivation of a Hydrograph by Using Smoothed Dimensionless Unit Kernel Function (평활화된 무차원 단위핵함수를 이용한 단위도의 유도)

  • Seong, Kee-Won
    • Journal of Korea Water Resources Association
    • /
    • v.41 no.6
    • /
    • pp.559-564
    • /
    • 2008
  • A practical method is derived for determining the unit hydrograph and S-curve from complex storm events by using a smoothed unit kernel approach. The using a unit kernel yields more convenient way of constructing a unit hydrograph and its S-curve than a conventional method. However, with use of real data, the unit kernel oscillates and is unstable so that a unit hydrograph and S-curve cannot easily obtained. The use of non-parametric ridge regression with a Laplacian matrix is suggested for deriving an event averaged unit kernel which reduces the computational efforts when dealing with the Nash instantaneous unit hydrograph as a basis of the kernel. A method changing the unit hydrograph duration is also presented. The procedure shown in this work will play an efficient role when any unit hydrograph works is involved.

Hydrologic Response Estimation Using Mallows' $C_L$ Statistics (Mallows의 $C_L$ 통계량을 이용한 수문응답 추정)

  • Seong, Gi-Won;Sim, Myeong-Pil
    • Journal of Korea Water Resources Association
    • /
    • v.32 no.4
    • /
    • pp.437-445
    • /
    • 1999
  • The present paper describes the problem of hydrologic response estimation using non-parametric ridge regression method. The method adapted in this work is based on the minimization of the $C_L$ statistics, which is an estimate of the mean square prediction error. For this method, effects of using both the identity matrix and the Laplacian matrix were considered. In addition, we evaluated methods for estimating the error variance of the impulse response. As a result of analyzing synthetic and real data, a good estimation was made when the Laplacian matrix for the weighting matrix and the bias corrected estimate for the error variance were used. The method and procedure presented in present paper will play a robust and effective role on separating hydrologic response.

  • PDF

Prediction of golf scores on the PGA tour using statistical models (PGA 투어의 골프 스코어 예측 및 분석)

  • Lim, Jungeun;Lim, Youngin;Song, Jongwoo
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.1
    • /
    • pp.41-55
    • /
    • 2017
  • This study predicts the average scores of top 150 PGA golf players on 132 PGA Tour tournaments (2013-2015) using data mining techniques and statistical analysis. This study also aims to predict the Top 10 and Top 25 best players in 4 different playoffs. Linear and nonlinear regression methods were used to predict average scores. Stepwise regression, all best subset, LASSO, ridge regression and principal component regression were used for the linear regression method. Tree, bagging, gradient boosting, neural network, random forests and KNN were used for nonlinear regression method. We found that the average score increases as fairway firmness or green height or average maximum wind speed increases. We also found that the average score decreases as the number of one-putts or scrambling variable or longest driving distance increases. All 11 different models have low prediction error when predicting the average scores of PGA Tournaments in 2015 which is not included in the training set. However, the performances of Bagging and Random Forest models are the best among all models and these two models have the highest prediction accuracy when predicting the Top 10 and Top 25 best players in 4 different playoffs.