• Title/Summary/Keyword: multicollinearity (다중 공선성)


Using Ridge Regression to Improve the Accuracy and Interpretation of the Hedonic Pricing Model : Focusing on apartments in Guro-gu, Seoul (능형회귀분석을 활용한 부동산 헤도닉 가격모형의 정확성 및 해석력 향상에 관한 연구 - 서울시 구로구 아파트를 대상으로 -)

  • Koo, Bonsang;Shin, Byungjin
    • Korean Journal of Construction Engineering and Management
    • /
    • v.16 no.5
    • /
    • pp.77-85
    • /
    • 2015
  • The Hedonic Pricing model is the predominant approach used today to model the effect of relevant factors on real estate prices. These factors include intrinsic elements of a property such as floor area, number of rooms, and parking spaces. The model also accounts for the impact of amenities or undesirable facilities on a property's value. In the latter case, Euclidean distances are typically used as the parameter that represents proximity and its impact on prices. However, where multiple facilities exist, multicollinearity may arise between these parameters, which can result in multiple regression models with erroneous coefficients. This research uses Variance Inflation Factors (VIF) and Ridge Regression to identify these errors and thus create more accurate and stable models. The techniques were applied to apartments in Guro-gu, Seoul, whose prices are affected by subway stations as well as a public prison, a railway terminal, and a digital complex. The VIF identified collinearity between the variables representing the terminal and the digital complex as well as the latitudinal coordinates. The ridge regression showed the need to remove two of these variables. The case study demonstrated that applying these techniques was critical in developing accurate and robust Hedonic Pricing models.
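
As a rough illustration of the VIF diagnostic mentioned in this abstract, the sketch below computes variance inflation factors in plain Python. It relies on the standard identity that the VIF of predictor j, $1/(1-R_j^2)$, equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The distance values and the helper names (`correlation`, `invert`, `vif`) are hypothetical and are not taken from the paper.

```python
from math import sqrt

def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each predictor: the diagonal of the inverse correlation matrix."""
    corr = [[correlation(a, b) for b in columns] for a in columns]
    inv = invert(corr)
    return [inv[j][j] for j in range(len(columns))]

# Hypothetical distances (km): the first two predictors move together,
# the third does not.
dist_terminal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dist_complex = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1]
dist_subway = [0.5, 0.1, 0.9, 0.3, 0.8, 0.2]

print([round(v, 1) for v in vif([dist_terminal, dist_complex, dist_subway])])
```

A VIF above roughly 10 is a commonly used rule of thumb for problematic collinearity. In this toy data the two correlated distance variables would be flagged, while the third would not.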

A study on the properties of sensitivity analysis in principal component regression and latent root regression (주성분회귀와 고유값회귀에 대한 감도분석의 성질에 대한 연구)

  • Shin, Jae-Kyoung;Chang, Duk-Joon
    • Journal of the Korean Data and Information Science Society
    • /
    • v.20 no.2
    • /
    • pp.321-328
    • /
    • 2009
  • In regression analysis, the ordinary least squares estimates of regression coefficients become poor when the correlations among predictor variables are high. This phenomenon, called multicollinearity, causes serious problems in actual data analysis. Many methods have been proposed to overcome multicollinearity, including ridge regression, shrinkage estimators, and methods based on principal component analysis (PCA) such as principal component regression (PCR) and latent root regression (LRR). In the last decade, many statisticians have discussed sensitivity analysis (SA) in ordinary multiple regression and the same topic in PCR, LRR, and logistic principal component regression (LPCR). PCA plays an important role in those methods, and many statisticians have discussed SA in PCA and related multivariate methods. We introduce the methods of PCR and LRR, introduce the methods of SA in PCR and LRR, and discuss the properties of SA in PCR and LRR.
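
For orientation, the PCR estimator referred to in this abstract can be written as follows. This is the textbook form; the notation is an assumption and is not taken from the paper. $X$ is the centered $n \times p$ predictor matrix, $y$ the centered response, and $m \le p$ the number of retained components.

```latex
\begin{aligned}
X^{\top}X &= V \Lambda V^{\top}, \qquad
V = [\,v_1, \dots, v_p\,], \quad
\Lambda = \operatorname{diag}(\lambda_1 \ge \dots \ge \lambda_p), \\
Z_m &= X V_m, \qquad V_m = [\,v_1, \dots, v_m\,], \\
\hat{\gamma} &= (Z_m^{\top} Z_m)^{-1} Z_m^{\top} y
             = \Lambda_m^{-1} V_m^{\top} X^{\top} y, \\
\hat{\beta}_{\mathrm{PCR}} &= V_m \hat{\gamma}.
\end{aligned}
```

Dropping the components with the smallest eigenvalues $\lambda_j$ removes the directions in which the predictors are nearly collinear, which is why PCR stabilizes the coefficient estimates.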


Principal Components Logistic Regression based on Robust Estimation (로버스트추정에 바탕을 둔 주성분로지스틱회귀)

  • Kim, Bu-Yong;Kahng, Myung-Wook;Jang, Hea-Won
    • The Korean Journal of Applied Statistics
    • /
    • v.22 no.3
    • /
    • pp.531-539
    • /
    • 2009
  • Logistic regression is widely used as a data mining technique for customer relationship management. The maximum likelihood estimator has highly inflated variance when multicollinearity exists among the regressors, and it is not robust against outliers. We therefore propose robust principal components logistic regression to deal with both the multicollinearity and the outlier problems. A procedure based on the condition index is suggested for selecting principal components. When a condition index is larger than the cutoff value obtained from a model constructed on the basis of conjoint analysis, the corresponding principal component is removed from the logistic model. In addition, we employ an algorithm for robust estimation, which dampens the effect of outliers by applying appropriate weights and factors to the leverage points and vertical outliers identified by the V-mask type criterion. Monte Carlo simulation results indicate that the proposed procedure yields a higher rate of correct classification than the existing method.

Development of Regression Models Resolving High-Dimensional Data and Multicollinearity Problem for Heavy Rain Damage Data (호우피해자료에서의 고차원 자료 및 다중공선성 문제를 해소한 회귀모형 개발)

  • Kim, Jeonghwan;Park, Jihyun;Choi, Changhyun;Kim, Hung Soo
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.38 no.6
    • /
    • pp.801-808
    • /
    • 2018
  • Fitting a linear regression model is stable under the assumption that the sample size is sufficiently larger than the number of explanatory variables and that there is no serious multicollinearity among the explanatory variables. In this study, we investigated the difficulty of model fitting when this assumption is violated by analyzing real heavy rain damage data, and we proposed using a principal component regression model or a ridge regression model after integrating the data to overcome the difficulty. We evaluated the predictive performance of the proposed models using test data independent of the training data, and confirmed that the proposed methods showed better predictive performance than the linear regression model.

Statistical review and explanation for Lanchester model (란체스터 모형에 대한 통계적 고찰과 해석)

  • Yoo, Byung Joo
    • The Korean Journal of Applied Statistics
    • /
    • v.33 no.3
    • /
    • pp.335-345
    • /
    • 2020
  • This paper deals with the problem of estimating a log-transformed linear regression model to fit actual battle data from the Ardennes Campaign of World War II to the Lanchester model. By examining the results of previous studies on the data, problems in determining a global solution for the parameters and multicollinearity problems are identified and corrected. The least squares method requires care because a local solution rather than a global solution can be found when a specific constraint or a limited candidate group is considered. The multicollinearity problem can be checked with a statistic known as the variance inflation factor. The Lanchester model is therefore simplified to avoid these problems, and a combat power attrition rate model is proposed that is statistically significant and easy to explain. When fitting the model, dependence between observations arose due to autocorrelation. Quantities that might have been underestimated or overestimated were corrected by the Cochrane-Orcutt method, which also ensured independence and normality.
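
The Cochrane-Orcutt correction named in this abstract can be sketched for the simplest case of one predictor and AR(1) errors. The code below performs a single step: fit by ordinary least squares, estimate the lag-1 coefficient `rho` from the residuals, then refit on quasi-differenced data. In practice the step is usually repeated until `rho` stabilizes. The series and all function names are hypothetical; the paper's own model is log-transformed and is not reproduced here.

```python
def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def estimate_rho(resid):
    """Lag-1 coefficient from regressing e_t on e_{t-1} through the origin."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(r * r for r in resid[:-1])
    return num / den

def quasi_difference(series, rho):
    """Transform z_t into z_t - rho * z_{t-1}; the first observation is dropped."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]

def cochrane_orcutt_step(x, y):
    """One Cochrane-Orcutt step: OLS, estimate rho, refit on transformed data."""
    a, b = ols(x, y)
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    rho = estimate_rho(resid)
    a_star, b_new = ols(quasi_difference(x, rho), quasi_difference(y, rho))
    # The transformed intercept equals a * (1 - rho), so rescale it back.
    return a_star / (1.0 - rho), b_new, rho

# Hypothetical series whose disturbances drift slowly (positive autocorrelation).
x = [float(t) for t in range(1, 13)]
u = [0.5, 0.6, 0.4, 0.1, -0.2, -0.5, -0.6, -0.4, -0.1, 0.2, 0.5, 0.6]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

intercept, slope, rho = cochrane_orcutt_step(x, y)
print(f"rho={rho:.3f} intercept={intercept:.3f} slope={slope:.3f}")
```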

Defect Severity-based Dimension Reduction Model using PCA (PCA를 적용한 결함 심각도 기반 차원 축소 모델)

  • Kwon, Ki Tae;Lee, Na-Young
    • Journal of Software Assessment and Valuation
    • /
    • v.15 no.1
    • /
    • pp.79-86
    • /
    • 2019
  • Software dimension reduction identifies the commonality of elements and extracts important feature elements. It thereby reduces complexity through simplification and solves multicollinearity problems, and it reduces redundancy by detecting redundancy and noise. In this study, we propose a defect severity-based dimension reduction model. The proposed model is applied to a defect severity-based NASA dataset, and the number of dimensions among the columns that affect defect severity is verified. The dimensions of the data before and after reduction are then compared and analyzed. In the experiment, the number of dimensions of the PC4 dataset was 2 to 3, showing that the dimension could be reduced.

Multivariate Analysis for Clinicians (임상의를 위한 다변량 분석의 실제)

  • Oh, Joo Han;Chung, Seok Won
    • Clinics in Shoulder and Elbow
    • /
    • v.16 no.1
    • /
    • pp.63-72
    • /
    • 2013
  • In medical research, multivariate analysis, especially multiple regression analysis, is used to analyze the influence of multiple variables on the result. Multiple regression analysis must consider which variables to include in the model and the problem of multicollinearity, which arises because there are many variables, as well as the basic assumptions of regression analysis. The fit of the multiple regression model is expressed by the coefficient of determination, $R^2$, and the influence of each independent variable on the result by a regression coefficient, ${\beta}$. Multiple regression analysis can be divided into multiple linear regression analysis, multiple logistic regression analysis, and Cox regression analysis according to the type of dependent variable (continuous variable, categorical variable (binary logit), and state variable, respectively), and the influence of variables on the result is evaluated by the regression coefficient ${\beta}$, odds ratio, and hazard ratio, respectively. Knowledge of multivariate analysis enables clinicians to analyze results accurately and to design further research efficiently.

Tributary Flood Forecasting Using Statistical Analysis Method (통계적 모형을 이용한 지천 홍수예측)

  • Sung, Ji-Youn;Heo, Jun-Haeng
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2009.05a
    • /
    • pp.1524-1527
    • /
    • 2009
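  • This study aims to improve the accuracy of forecasts by refining the statistical model applied to flood forecasting for major tributaries. Major tributaries of the Han River system, such as Jungnangcheon, Tancheon, and Wangsukcheon, are flood forecasting points with small basin areas and short travel times, so the hydrological flood forecasting models used for large-river flood forecasting are of limited applicability. To address this problem, a method using a multiple linear regression model, a statistical model, was proposed and put to use for flood forecasting on major tributaries. In this study, the independent variables were adjusted to resolve the multicollinearity problem of the multiple linear regression model already applied to tributary flood forecasting, and the parameters were re-estimated to obtain forecasts based on 10-minute observation data. As a result, the statistical model, which uses fewer independent variables than the existing model together with the re-estimated parameters, reduced the error in the predicted water level.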


Robust selection rules of k in ridge regression (능형회귀에서의 로버스트한 k의 선택 방법)

  • 임용빈
    • The Korean Journal of Applied Statistics
    • /
    • v.6 no.2
    • /
    • pp.371-381
    • /
    • 1993
  • When multicollinearity is present in the standard linear regression model, ridge regression may be used to mitigate the effects of collinearity. As a prediction-oriented criterion, the integrated mean square error criterion $J_w(k)$ was introduced by Lim, Choi & Park (1980). By noting the equivalence between the $C_k$ criterion and $J_w(k)$ under a special choice of weight function $W(x)$, we propose a more reasonable selection rule for k w.r.t. the $C_k$ criterion than that given in Myers (1986). Next, to find the $\beta(k)$ that behaves reasonably well w.r.t. competing criteria, we adopt the minimax principle in the sense of maximizing the worst relative efficiency of k among competing criteria.
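
To make the role of k concrete, the sketch below evaluates the ridge estimator $\beta(k) = (X^{\top}X + kI)^{-1}X^{\top}y$ for two centered predictors over a small grid of k values. It only illustrates how the coefficients change with k; it does not implement the $C_k$ or $J_w(k)$ selection rules discussed in the abstract. The data, the grid, and the function names are hypothetical.

```python
def center(values):
    """Subtract the mean so that no intercept term is needed."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def ridge_2d(x1, x2, y, k):
    """Ridge coefficients for two centered predictors: solve (X'X + kI) b = X'y."""
    a11 = sum(v * v for v in x1) + k
    a22 = sum(v * v for v in x2) + k
    a12 = sum(u * v for u, v in zip(x1, x2))
    c1 = sum(u * v for u, v in zip(x1, y))
    c2 = sum(u * v for u, v in zip(x2, y))
    det = a11 * a22 - a12 * a12
    # Closed-form solution of the 2x2 linear system (Cramer's rule).
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det

# Hypothetical, nearly collinear predictors and a response.
x1 = center([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = center([1.2, 1.9, 3.1, 4.2, 4.8, 6.1])
y = center([2.1, 4.3, 5.8, 8.4, 9.7, 12.2])

for k in (0.0, 0.1, 1.0, 10.0):
    b1, b2 = ridge_2d(x1, x2, y, k)
    print(f"k={k:<5} b1={b1:+.3f} b2={b2:+.3f}")
```

With k = 0 the function returns the ordinary least squares solution. As k grows, the length of the coefficient vector shrinks, which is the stabilizing effect that any rule for choosing k has to trade off against the added bias.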


A Study on the Characteristics of Algae Occurrence in Lower Watershed of Nam River Dam by Using Multiple Regression Analysis (다중회귀분석을 이용한 남강댐 하류지역의 조류발생 특성 연구)

  • Jung, Woo Suk;Kim, Young Do
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2016.05a
    • /
    • pp.126-126
    • /
    • 2016
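  • The Nam River is a major tributary of the Nakdong River and plays an important role in supplying maintenance flow and domestic, industrial, and agricultural water to the lower Nakdong River region, so managing its pollution sources and water quality is very important. Algal blooms have recently become more frequent downstream of the Nam River Dam and at the Changnyeong-Haman Weir on the Nakdong River mainstream below the Nam River confluence, and interest in and concern about algal blooms are growing. Algal blooms in lakes and reservoirs have been managed under the 'Algae Alert System', but since the construction of 16 weirs on the four major rivers, systems and policies for bloom management such as the 'Water Quality Forecasting System' have been implemented, and the importance of algae management has come to the fore. In this study, factors influencing algae were identified with reference to the existing literature, and basic water management data for the Nam River basin were collected. A correlation analysis was performed on the resulting dataset to analyze the main influencing factors for each item, and the influencing factors were ranked by correlation and used as input variables. A multiple regression analysis prediction model was then implemented, taking into account the algae occurrence characteristics obtained through data mining. Variables that exhibited multicollinearity during the regression analysis were removed from the model, and outliers and influential points were reviewed through residual analysis.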
