• Title/Summary/Keyword: 능형회귀 (ridge regression)

Using Ridge Regression to Improve the Accuracy and Interpretation of the Hedonic Pricing Model : Focusing on apartments in Guro-gu, Seoul (능형회귀분석을 활용한 부동산 헤도닉 가격모형의 정확성 및 해석력 향상에 관한 연구 - 서울시 구로구 아파트를 대상으로 -)

  • Koo, Bonsang;Shin, Byungjin
    • Korean Journal of Construction Engineering and Management / v.16 no.5 / pp.77-85 / 2015
  • The Hedonic Pricing model is the predominant approach used today to model the effect of relevant factors on real estate prices. These factors include intrinsic elements of a property such as floor area, number of rooms, and parking spaces. The model also accounts for the impact of amenities or undesirable facilities on a property's value. In the latter case, Euclidean distances are typically used as the parameter representing proximity and its impact on prices. However, where multiple facilities exist, multicollinearity may arise between these parameters, which can result in multiple regression models with erroneous coefficients. This research uses Variance Inflation Factors (VIF) and Ridge Regression to identify these errors and thus create more accurate and stable models. The techniques were applied to apartments in Guro-gu, Seoul, whose prices are affected by subway stations as well as a public prison, a railway terminal and a digital complex. The VIF identified collinearity between variables representing the terminal and the digital complex as well as the latitudinal coordinates. The ridge regression showed the need to remove two of these variables. The case study demonstrated that the application of these techniques was critical in developing accurate and robust Hedonic Pricing models.
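
    The VIF screening and ridge fit described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the data values, the variable names (`dist_terminal`, `dist_complex`, `floor_area`, `price`) and the ridge constant `k = 0.1` are all hypothetical, and predictors are scaled to unit length, which is one common convention.

```python
# Illustrative sketch only: data, names and the ridge constant are hypothetical.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def center(col):
    mean = sum(col) / len(col)
    return [v - mean for v in col]

def standardize(col):
    """Center a column and scale it to unit length."""
    c = center(col)
    length = sum(v * v for v in c) ** 0.5
    return [v / length for v in c]

def ridge(columns, y, k):
    """Solve (X'X + kI) b = X'y; k = 0 gives ordinary least squares."""
    p = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j]))
            + (k if i == j else 0.0) for j in range(p)] for i in range(p)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(p)]
    return solve(xtx, xty)

def vif(columns, j):
    """VIF_j = 1 / (1 - R_j^2), regressing predictor j on the others."""
    target = columns[j]
    others = [c for i, c in enumerate(columns) if i != j]
    b = ridge(others, target, 0.0)
    fitted = [sum(bi * c[r] for bi, c in zip(b, others))
              for r in range(len(target))]
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum(t * t for t in target)  # target is already centered
    return ss_tot / ss_res

# Hypothetical data: two nearly collinear distances and one floor area.
dist_terminal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
dist_complex = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8, 7.2, 7.9]
floor_area = [60.0, 85.0, 70.0, 110.0, 59.0, 95.0, 84.0, 75.0]
price = [610.0, 650.0, 612.0, 680.0, 570.0, 632.0, 600.0, 575.0]

cols = [standardize(c) for c in (dist_terminal, dist_complex, floor_area)]
y_c = center(price)

vifs = [vif(cols, j) for j in range(len(cols))]
b_ols = ridge(cols, y_c, 0.0)    # unstable when VIFs are large
b_ridge = ridge(cols, y_c, 0.1)  # shrunken, more stable coefficients
print("VIF:", vifs)
print("OLS:", b_ols)
print("ridge:", b_ridge)
```

    In this toy setup the two distance variables carry very large VIFs while the floor-area variable does not, and the ridge coefficients have a smaller norm than the least-squares ones. A VIF above 10 is a commonly used rule of thumb for problematic collinearity.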

A Ridge-type Estimator For Generalized Linear Models (일반화 선형모형에서의 능형형태의 추정량)

  • Byoung Jin Ahn
    • The Korean Journal of Applied Statistics / v.7 no.1 / pp.75-82 / 1994
  • It is known that collinearity among the explanatory variables in generalized linear models inflates the variance of maximum likelihood estimators. A ridge-type estimator is presented using penalized likelihood. A method for choosing the shrinkage parameter is discussed; it is based on a prediction-oriented criterion, which reduces to Mallows' $C_L$ statistic in a linear regression setting.

A study on the properties of sensitivity analysis in principal component regression and latent root regression (주성분회귀와 고유값회귀에 대한 감도분석의 성질에 대한 연구)

  • Shin, Jae-Kyoung;Chang, Duk-Joon
    • Journal of the Korean Data and Information Science Society / v.20 no.2 / pp.321-328 / 2009
  • In regression analysis, the ordinary least squares estimates of regression coefficients become poor when the correlations among predictor variables are high. This phenomenon, called multicollinearity, causes serious problems in actual data analysis. To overcome multicollinearity, many methods have been proposed, including ridge regression, shrinkage estimators, and methods based on principal component analysis (PCA) such as principal component regression (PCR) and latent root regression (LRR). In the last decade, many statisticians have discussed sensitivity analysis (SA) in ordinary multiple regression and the same topic in PCR, LRR and logistic principal component regression (LPCR). In those methods PCA plays an important role. Many statisticians have also discussed SA in PCA and related multivariate methods. We introduce the methods of PCR and LRR. We also introduce the methods of SA in PCR and LRR, and discuss their properties.

Penalized logistic regression models for determining the discharge of dyspnea patients (호흡곤란 환자 퇴원 결정을 위한 벌점 로지스틱 회귀모형)

  • Park, Cheolyong;Kye, Myo Jin
    • Journal of the Korean Data and Information Science Society / v.24 no.1 / pp.125-133 / 2013
  • In this paper, penalized binary logistic regression models are employed as statistical models for determining the discharge of 668 patients with a chief complaint of dyspnea, based on the results of 11 blood tests. Specifically, the ridge model based on the $L^2$ penalty and the Lasso model based on the $L^1$ penalty are considered. In the comparison of prediction accuracy, these models are compared with the logistic regression model with all 11 explanatory variables and the model with variables selected by a variable selection method. The results show that the prediction accuracy of the ridge logistic regression model is the best among the 4 models based on 10-fold cross-validation.

A Derivation of a Hydrograph by Using Smoothed Dimensionless Unit Kernel Function (평활화된 무차원 단위핵함수를 이용한 단위도의 유도)

  • Seong, Kee-Won
    • Journal of Korea Water Resources Association / v.41 no.6 / pp.559-564 / 2008
  • A practical method is derived for determining the unit hydrograph and S-curve from complex storm events by using a smoothed unit kernel approach. Using a unit kernel yields a more convenient way of constructing a unit hydrograph and its S-curve than a conventional method. However, with real data, the unit kernel oscillates and is unstable, so that a unit hydrograph and S-curve cannot easily be obtained. The use of non-parametric ridge regression with a Laplacian matrix is suggested for deriving an event-averaged unit kernel, which reduces the computational effort when dealing with the Nash instantaneous unit hydrograph as a basis of the kernel. A method for changing the unit hydrograph duration is also presented. The procedure shown in this work will be useful wherever unit hydrograph work is involved.

Hydrologic Response Estimation Using Mallows' $C_L$ Statistics (Mallows의 $C_L$ 통계량을 이용한 수문응답 추정)

  • Seong, Gi-Won;Sim, Myeong-Pil
    • Journal of Korea Water Resources Association / v.32 no.4 / pp.437-445 / 1999
  • The present paper describes the problem of hydrologic response estimation using a non-parametric ridge regression method. The method adopted in this work is based on the minimization of the $C_L$ statistic, which is an estimate of the mean square prediction error. For this method, the effects of using both the identity matrix and the Laplacian matrix were considered. In addition, we evaluated methods for estimating the error variance of the impulse response. Analysis of synthetic and real data showed that a good estimate was obtained when the Laplacian matrix was used as the weighting matrix and the bias-corrected estimate was used for the error variance. The method and procedure presented in the present paper will play a robust and effective role in separating the hydrologic response.
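
    For orientation, one standard textbook form of the quantities involved is sketched below. The notation is an assumption and may differ from the paper's: $X$ is the design (input) matrix, $y$ the observed output, $k \ge 0$ the ridge parameter, $W$ the weighting matrix (identity or Laplacian), $\hat{\sigma}^2$ an estimate of the error variance, and $n$ the number of observations.

    $$\hat{h}_k = (X^\top X + kW)^{-1} X^\top y, \qquad L_k = X (X^\top X + kW)^{-1} X^\top, \qquad C_L(k) = \frac{\lVert (I - L_k)\,y \rVert^2}{\hat{\sigma}^2} - n + 2\,\operatorname{tr}(L_k)$$

    Under these assumptions, the ridge parameter would be chosen as the value of $k$ that minimizes $C_L(k)$.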

A Study on Sensitivity Analysis in Ridge Regression (능형 회귀에서의 민감도 분석에 관한 연구)

  • Kim, Soon-Kwi
    • Journal of Korean Society for Quality Management / v.19 no.1 / pp.1-15 / 1991
  • In this paper, we discuss and review various measures that have been presented for studying outliers, high-leverage points, and influential observations when ridge regression estimation is adopted. We derive the influence function for $\underline{\hat{\beta}}_R$, the ridge regression estimator, and discuss its various finite sample approximations when ridge regression is postulated. We also study several diagnostic measures such as the Welsch-Kuh distance and Cook's distance.

Prediction of golf scores on the PGA tour using statistical models (PGA 투어의 골프 스코어 예측 및 분석)

  • Lim, Jungeun;Lim, Youngin;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.41-55 / 2017
  • This study predicts the average scores of the top 150 PGA golf players in 132 PGA Tour tournaments (2013-2015) using data mining techniques and statistical analysis. This study also aims to predict the Top 10 and Top 25 best players in 4 different playoffs. Linear and nonlinear regression methods were used to predict average scores. Stepwise regression, best subset selection, LASSO, ridge regression and principal component regression were used as the linear regression methods. Tree, bagging, gradient boosting, neural network, random forests and KNN were used as the nonlinear regression methods. We found that the average score increases as fairway firmness, green height or average maximum wind speed increases. We also found that the average score decreases as the number of one-putts, the scrambling variable or the longest driving distance increases. All 11 models have low prediction error when predicting the average scores of the 2015 PGA tournaments, which were not included in the training set. However, the Bagging and Random Forest models perform best among all models, and these two models have the highest prediction accuracy when predicting the Top 10 and Top 25 best players in 4 different playoffs.

Construction of Delay Predictive Models on Freeway Ramp Junctions with 70mph Speed Limit (70mph 제한속도를 갖는 고속도로 진출입램프 접속부상의 지체예측모형 구축에 관한 연구)

  • 김정훈;김태곤
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 1999.10a / pp.131-140 / 1999
  • Today freeways experience severe congestion from traffic entering or leaving through freeway ramps during the peak periods. Thus, the objectives of this study are to identify the traffic characteristics, analyze the relationships between the traffic characteristics, and finally construct delay predictive models for freeway ramp junctions with a 70mph speed limit. From the traffic analyses, and the construction and verification of models for delay prediction on freeway ramp junctions, the following results were obtained: ⅰ) Traffic flow differed greatly depending on the time period. In particular, more traffic was concentrated on the freeway junctions in the morning peak period than in the afternoon peak period. ⅱ) The occupancy also differed greatly depending on the time period, and the downstream occupancy (Od) in particular was shown to have higher explanatory power for constructing the delay predictive model on the freeway ramp junction. ⅲ) The speed-occupancy curve showed a remarkable shift based on the occupancies observed: Od < 9% and Od $\geq$ 9%. In particular, volume and occupancy were shown to be highly explanatory for delay prediction on freeway ramp junctions under Od $\geq$ 9%, but only weakly explanatory under Od < 9%. Rather, the driver characteristics or transportation conditions around the freeway were thought to be somewhat more explanatory for delay prediction under Od < 9%. ⅳ) Integrated delay predictive models showed higher explanatory power in the morning peak period, but lower explanatory power in the non-peak periods.

Development of Ridge Regression Model of Pollutant Load Using Runoff Weighted Value Based on Distributed Curve-Number (분포형 CN 기반 토지피복별 유출가중치를 이용한 오염부하량 능형회귀모형 개발)

  • Song, Chul Min;Kim, Jin Soo
    • Journal of The Korean Society of Agricultural Engineers / v.60 no.1 / pp.111-120 / 2018
  • The purpose of this study was to develop a ridge regression (RR) model to estimate BOD and TP load using a runoff weighted value. The concept of the runoff weighted value, based on the distributed curve-number (CN), was introduced to reflect the impact of land covers on runoff. The runoff depths estimated by the distributed CN were closer to the observed values than those estimated by the area-weighted mean CN. RR is a technique used when the data suffer from multicollinearity. The RR model was developed for five flow duration intervals, with the daily runoff discharge of seven land covers as independent variables and the daily pollutant load as the dependent variable. The RR model was applied to the Heuk River watershed, a subwatershed of the Han River watershed. The variance inflation factors of the RR model decreased to values below 10. The RR model showed good performance, with Nash-Sutcliffe efficiency (NSE) of 0.73 and 0.87, and Pearson correlation coefficients of 0.88 and 0.93 for BOD and TP, respectively. The results suggest that the methods used in the study can be applied to estimate the pollutant load of watersheds with different land covers using limited data.
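
    The two goodness-of-fit measures quoted in this abstract can be computed as in the short sketch below, which uses the standard definitions of the Nash-Sutcliffe efficiency and the Pearson correlation coefficient. The function names and the observed and simulated load values are hypothetical and are not taken from the study.

```python
# Illustrative sketch only: function names and data values are hypothetical.

def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical daily pollutant loads (observed vs. model estimate).
observed_load = [12.0, 30.0, 18.0, 45.0, 27.0, 60.0]
simulated_load = [15.0, 26.0, 20.0, 40.0, 30.0, 55.0]

nse = nash_sutcliffe(observed_load, simulated_load)
r = pearson_r(observed_load, simulated_load)
print("NSE:", nse)
print("Pearson r:", r)
```

    An NSE of 1 indicates a perfect match, while an NSE of 0 means the model predicts no better than the mean of the observations.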