• Title/Summary/Keyword: Ridge regression analysis (Ridge 회귀분석)

Using Ridge Regression to Improve the Accuracy and Interpretation of the Hedonic Pricing Model : Focusing on apartments in Guro-gu, Seoul (능형회귀분석을 활용한 부동산 헤도닉 가격모형의 정확성 및 해석력 향상에 관한 연구 - 서울시 구로구 아파트를 대상으로 -)

  • Koo, Bonsang;Shin, Byungjin
    • Korean Journal of Construction Engineering and Management / v.16 no.5 / pp.77-85 / 2015
  • The hedonic pricing model is the predominant approach used today to model the effect of relevant factors on real estate prices. These factors include intrinsic elements of a property such as floor area, number of rooms, and parking spaces. The model also accounts for the impact of amenities or undesirable facilities on a property's value. In the latter case, Euclidean distances are typically used as the parameter representing proximity and its impact on prices. However, where multiple facilities exist, multicollinearity may arise between these parameters, which can result in multiple regression models with erroneous coefficients. This research uses Variance Inflation Factors (VIF) and ridge regression to identify these errors and thus create more accurate and stable models. The techniques were applied to apartments in Guro-gu, Seoul, whose prices are affected by subway stations as well as a public prison, a railway terminal, and a digital complex. The VIF identified collinearity between the variables representing the terminal and the digital complex, as well as the latitudinal coordinates. The ridge regression showed the need to remove two of these variables. The case study demonstrated that applying these techniques was critical in developing accurate and robust hedonic pricing models.
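The two diagnostics named in this abstract can be sketched on synthetic data. This is a minimal illustration, not the authors' procedure: the variable names (`d_terminal`, `d_complex`, `area`), the simulated prices, and the penalty value are all assumptions made for the example. It computes each predictor's VIF as 1 / (1 - R^2) from regressing it on the others, then compares unpenalized and ridge coefficients on standardized predictors.

```python
import numpy as np

def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2) from regressing it on the others."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

def ridge(X, y, lam):
    """Ridge coefficients on standardized predictors and a centered response."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)

# Synthetic data (hypothetical): two distance variables that are nearly identical.
rng = np.random.default_rng(0)
n = 200
d_terminal = rng.uniform(0.2, 3.0, n)             # distance to a terminal, km
d_complex = d_terminal + rng.normal(0, 0.05, n)   # almost the same distance
area = rng.uniform(40, 120, n)                    # floor area, m^2
X = np.column_stack([d_terminal, d_complex, area])
price = 3.0 * area - 20.0 * d_terminal + rng.normal(0, 10, n)

vifs = vif(X)                    # large for the two distances, near 1 for area
ols = ridge(X, price, 0.0)       # lam = 0 reduces to ordinary least squares
shrunk = ridge(X, price, 10.0)   # the penalty stabilizes the collinear pair
print("VIF:", vifs)
print("OLS:", ols)
print("Ridge:", shrunk)
```

A common rule of thumb treats a VIF above 10 as a sign of problematic collinearity; the threshold used in the paper is not stated in the abstract.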

A study on the properties of sensitivity analysis in principal component regression and latent root regression (주성분회귀와 고유값회귀에 대한 감도분석의 성질에 대한 연구)

  • Shin, Jae-Kyoung;Chang, Duk-Joon
    • Journal of the Korean Data and Information Science Society / v.20 no.2 / pp.321-328 / 2009
  • In regression analysis, the ordinary least squares estimates of the regression coefficients become poor when the correlations among predictor variables are high. This phenomenon, called multicollinearity, causes serious problems in actual data analysis. Many methods have been proposed to overcome it, including ridge regression, shrinkage estimators, and methods based on principal component analysis (PCA) such as principal component regression (PCR) and latent root regression (LRR). In the last decade, many statisticians have discussed sensitivity analysis (SA) in ordinary multiple regression and the same topic in PCR, LRR, and logistic principal component regression (LPCR). PCA plays an important role in those methods, and many statisticians have also discussed SA in PCA and related multivariate methods. We introduce the methods of PCR and LRR, introduce the methods of SA in PCR and LRR, and discuss the properties of SA in PCR and LRR.
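As a rough illustration of how PCR sidesteps multicollinearity, the sketch below regresses the response on the leading principal components of the centered predictors and maps the result back to coefficients on the original variables. The function name `pcr` and the synthetic data are assumptions for the example; the sensitivity analysis that is the paper's actual subject is not shown.

```python
import numpy as np

def pcr(X, y, k):
    """Principal component regression keeping the first k components.

    Returns (intercept, beta) expressed on the original predictors.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T
    scores = Xc @ V_k                               # component scores, orthogonal columns
    gamma = (scores.T @ (y - y_mean)) / s[:k] ** 2  # least squares on the scores
    beta = V_k @ gamma                              # back to the original variables
    return y_mean - x_mean @ beta, beta

# Synthetic data (hypothetical): the first two predictors are nearly identical.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.01 * rng.normal(size=n),
    z + 0.01 * rng.normal(size=n),
    rng.normal(size=n),
])
y = X[:, 0] + X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.1, n)

b0_full, beta_full = pcr(X, y, 3)  # all components: same fit as ordinary least squares
b0_two, beta_two = pcr(X, y, 2)    # drops the smallest-variance direction
print("all components:", beta_full)
print("two components:", beta_two)
```

Dropping the smallest-variance component removes the direction in which the two collinear predictors differ, so their coefficients are estimated jointly rather than separately.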

Robust ridge regression for nonlinear mixed effects models with applications to quantitative high throughput screening assay data (비선형 혼합효과모형에서의 로버스트 능형회귀 방법과 정량적 고속 대량 스크리닝 자료에의 응용)

  • Yoo, Jiseon;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.31 no.1 / pp.123-137 / 2018
  • Nonlinear mixed effects models are widely used to analyze repeated measurement data in various fields. Such a model consists of two stages: the first-stage individual-level model accounts for intra-individual variation, and the second-stage population model accounts for inter-individual variation. The individual-level model estimates the parameters of a nonlinear regression model in the same way as general nonlinear regression, usually by least squares. However, least squares estimation can produce extremely large parameter estimates and standard errors if the assumed nonlinear function is not explicitly revealed by the data. This paper proposes a new estimation method that addresses this problem by introducing a recently proposed ridge regression method for nonlinear regression into the first-stage individual-level model of the nonlinear mixed effects model. The performance of the proposed estimator is compared with that of the standard estimator through a simulation study. The proposed methodology is also illustrated using quantitative high throughput screening data obtained from the US National Toxicology Program.

A Derivation of a Hydrograph by Using Smoothed Dimensionless Unit Kernel Function (평활화된 무차원 단위핵함수를 이용한 단위도의 유도)

  • Seong, Kee-Won
    • Journal of Korea Water Resources Association / v.41 no.6 / pp.559-564 / 2008
  • A practical method is derived for determining the unit hydrograph and S-curve from complex storm events by using a smoothed unit kernel approach. Using a unit kernel yields a more convenient way of constructing a unit hydrograph and its S-curve than the conventional method. However, with real data the unit kernel oscillates and is unstable, so that a unit hydrograph and S-curve cannot easily be obtained. The use of non-parametric ridge regression with a Laplacian matrix is suggested for deriving an event-averaged unit kernel, which reduces the computational effort when the Nash instantaneous unit hydrograph is used as the basis of the kernel. A method for changing the unit hydrograph duration is also presented. The procedure shown in this work will be useful wherever unit hydrograph work is involved.

Hydrologic Response Estimation Using Mallows' $C_L$ Statistics (Mallows의 $C_L$ 통계량을 이용한 수문응답 추정)

  • Seong, Gi-Won;Sim, Myeong-Pil
    • Journal of Korea Water Resources Association / v.32 no.4 / pp.437-445 / 1999
  • The present paper addresses the problem of hydrologic response estimation using a non-parametric ridge regression method. The method adopted in this work is based on minimizing the $C_L$ statistic, which is an estimate of the mean square prediction error. For this method, the effects of using the identity matrix and the Laplacian matrix as the weighting matrix were both considered. In addition, we evaluated methods for estimating the error variance of the impulse response. Analysis of synthetic and real data showed that good estimates were obtained when the Laplacian matrix was used as the weighting matrix and the bias-corrected estimate was used for the error variance. The method and procedure presented in this paper should provide a robust and effective means of separating the hydrologic response.
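The idea of choosing the ridge penalty by minimizing $C_L$ can be sketched as follows. Several things here are assumptions rather than details from the paper: the "Laplacian matrix" is taken to be a second-difference penalty $D^T D$, the statistic is written in the commonly cited form $C_L = RSS/\hat{\sigma}^2 - n + 2\,\mathrm{tr}(H)$, the error variance is estimated from the unpenalized fit, and the rainfall and response series are synthetic.

```python
import numpy as np

def convolution_matrix(x, m):
    """n-by-m matrix X such that X @ h gives the first n terms of conv(x, h)."""
    n = len(x)
    X = np.zeros((n, m))
    for j in range(m):
        X[j:, j] = x[:n - j]
    return X

def second_difference(m):
    """(m-2)-by-m discrete Laplacian operator with rows [1, -2, 1]."""
    D = np.zeros((m - 2, m))
    for i in range(m - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def fit(X, y, W, lam):
    """Generalized ridge estimate, residual sum of squares, and trace of the hat matrix."""
    A = np.linalg.solve(X.T @ X + lam * W, X.T)   # (X'X + lam W)^-1 X'
    h = A @ y
    rss = float(np.sum((y - X @ h) ** 2))
    return h, rss, float(np.trace(X @ A))

# Synthetic storm (hypothetical): a smooth impulse response convolved with rainfall.
rng = np.random.default_rng(1)
n, m = 80, 20
t = np.arange(m)
h_true = t * np.exp(-t / 3.0)
h_true /= h_true.sum()
rain = rng.gamma(shape=0.5, scale=4.0, size=n)
X = convolution_matrix(rain, m)
y = X @ h_true + rng.normal(0, 0.5, n)

D = second_difference(m)
W = D.T @ D                      # weighting matrix penalizing roughness of h

# Error variance from the unpenalized fit (an assumption of this sketch).
_, rss0, _ = fit(X, y, W, 0.0)
sigma2 = rss0 / (n - m)

lams = 10.0 ** np.arange(-2, 5)
c_l = []
for lam in lams:
    _, rss, tr = fit(X, y, W, lam)
    c_l.append(rss / sigma2 - n + 2 * tr)

best = lams[int(np.argmin(c_l))]
h_hat, _, _ = fit(X, y, W, best)
print("selected penalty:", best)
```

Replacing `W` with `np.eye(m)` gives the identity-matrix variant that the abstract compares against.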

A Study on the Prediction Model of Total Construction Period according to the Type of Machine Learning Regression (머신러닝 회귀분석 유형에 따른 총 공사기간 예측 모델에 관한 연구)

  • Kang, Yun-Ho;Yun, Seok-Heon
    • Proceedings of the Korean Institute of Building Construction Conference / 2023.05a / pp.361-362 / 2023
  • In construction work, the estimated construction period often differs from the actual construction period. As a result, a project may be delayed beyond its scheduled date, leading to large losses from problems such as increased costs during construction. It is therefore important to calculate an appropriate construction period at the project planning stage. To address this problem, this study seeks a model that increases the accuracy of the construction period scheduled at the planning stage. It compared and analyzed linear regression, Lasso regression, and Ridge regression in order to select a construction period prediction model that secures an appropriate construction period at the planning stage and reduces problems during construction.

A Study on the Prediction of Strawberry Production in Machine Learning Infrastructure (머신러닝 기반 시설재배 딸기 생산량 예측 연구)

  • Oh, HanByeol;Lim, JongHyun;Yang, SeungWeon;Cho, YongYun;Shin, ChangSun
    • Smart Media Journal / v.11 no.5 / pp.9-16 / 2022
  • Agricultural sites are increasingly being automated into digital smart farms by applying technologies such as big data and the Internet of Things (IoT). These smart farms aim to increase production and improve crop quality by measuring the crop environment and by collecting and processing data. Production prediction is an important research topic in smart farm digital agriculture, a form of high-tech agriculture, and it requires analyzing environmental data using big data as well as further standardization research to manage the quality of growth information data. In this paper, environmental and production data collected from smart farm strawberry farms were analyzed. Based on regression analysis, crop production prediction models were built and compared using Ridge Regression, LightGBM, and XGBoost. Among the three models, XGBoost performed best, with $R^2$ indicating 82.5 percent explanatory power. The study confirmed a correlation between the amount of nutrient solution absorbed and the environmental data, and obtained significant results for production prediction. In the future, studying nutrient solution absorption together with information such as the crop growing environment and the composition of the nutrient solution is expected to contribute to preventing environmental pollution and reducing nutrient solution use through better nutrient solution management.

Analysis of cycle racing ranking using statistical prediction models (통계적 예측모형을 활용한 경륜 경기 순위 분석)

  • Park, Gahee;Park, Rira;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.25-39 / 2017
  • Over 5 million people participate in cycle racing betting and its revenue is more than 2 trillion won. This study predicts the ranking of cycle racing using various statistical analyses and identifies important variables which have influence on ranking. We propose competitive ranking prediction models using various classification and regression methods. Our model can predict rankings with low misclassification rates most of the time. We found that the ranking increases as the grade of a racer decreases and as overall scores increase. Inversely, we can observe that the ranking decreases when the grade of a racer increases, race number four is given, and the ranking of the last race of a racer decreases. We also found that prediction accuracy can be improved when we use centered data per race instead of raw data. However, the real profit from the future data was not high when we applied our prediction model because our model can predict only low-return events well.

A Modeling of Realtime Fuel Comsumption Prediction Using OBDII Data (OBDII 데이터 기반의 실시간 연료 소비량 예측 모델 연구)

  • Yang, Hee-Eun;Kim, Do-Hyun;Choe, Hoseop
    • KIPS Transactions on Software and Data Engineering / v.10 no.2 / pp.57-64 / 2021
  • This study presents a method for real-time fuel consumption prediction using real data collected from OBDII. With the advent of the era of self-driving cars, electronic control units (ECUs) are becoming more complex, and various studies are attempting to extract and analyze more accurate data from vehicles. However, as ECUs become more complex, it is becoming harder to obtain data from them. To solve this problem, firmware for acquiring accurate vehicle data was developed in this study, and it extracted 53,580 actual driving data sets from vehicles between January and February 2019. Using these data, the ensemble stacking technique was applied to increase the accuracy of the real-time fuel consumption prediction model. Ridge, Lasso, XGBoost, and LightGBM were used as base models and Ridge was used as the meta model; the prediction performance was an MAE of 0.011 and an RMSE of 0.017.
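The stacking scheme described here, with base models feeding a Ridge meta model, can be sketched with NumPy alone. To keep the example self-contained, the paper's Lasso, XGBoost, and LightGBM base models are swapped for two simple stand-ins (a ridge regressor and a k-nearest-neighbour averager), and the data are synthetic, so the printed errors say nothing about the paper's results. All function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Return a predictor for ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda Z: y_mean + (Z - x_mean) @ beta

def knn_fit(X, y, k=5):
    """Return a predictor that averages the k nearest training targets."""
    def predict(Z):
        d = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        idx = np.argsort(d, axis=1)[:, :k]
        return y[idx].mean(axis=1)
    return predict

def stack(X, y, base_fitters, meta_lam=1.0, folds=5, seed=0):
    """Fit base models, then a ridge meta model on their out-of-fold predictions."""
    n = len(y)
    order = np.random.default_rng(seed).permutation(n)
    oof = np.zeros((n, len(base_fitters)))
    for held_out in np.array_split(order, folds):
        train = np.setdiff1d(order, held_out)
        for j, fitter in enumerate(base_fitters):
            # Each held-out row is predicted by a model that never saw it.
            oof[held_out, j] = fitter(X[train], y[train])(X[held_out])
    meta = ridge_fit(oof, y, meta_lam)
    bases = [fitter(X, y) for fitter in base_fitters]  # refit on all training data
    return lambda Z: meta(np.column_stack([b(Z) for b in bases]))

# Synthetic stand-in for driving data (hypothetical features and target).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(400, 3))
y = 0.5 * X[:, 0] + np.sin(3 * X[:, 1]) + rng.normal(0, 0.05, 400)
X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

model = stack(X_tr, y_tr, [lambda A, b: ridge_fit(A, b, 1e-3), knn_fit])
err = model(X_te) - y_te
mae = float(np.mean(np.abs(err)))
rmse = float(np.sqrt(np.mean(err ** 2)))
print("MAE:", mae, "RMSE:", rmse)
```

Training the meta model on out-of-fold predictions rather than in-sample predictions keeps it from simply rewarding whichever base model overfits most.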

Negative Ion Generation Index according to Altitude in the Autumn of Pine Forest in Gyeongju Namsan (경주 남산 소나무림의 가을철 해발고도별 음이온 발생지수)

  • Kim, Jeong Ho;Yoon, Ji Hun;Lee, Sang Hoon;Choi, Won Jun;Yoon, Yong Han
    • Korean Journal of Environment and Ecology / v.32 no.4 / pp.413-424 / 2018
  • This study analyzed the effects of topographic structure and altitude on negative ion generation in the mountain park of Mt. Namsan in Gyeongju. Temperature was higher on the ridge ($9.82^{\circ}C$) than in the valley ($8.44^{\circ}C$); relative humidity was higher in the valley (59.01%) than on the ridge (58.64%); solar radiation was higher on the ridge ($34.40W/m^2$) than in the valley ($14.69W/m^2$); wind speed was higher on the ridge (0.63 m/s) than in the valley (0.37 m/s); and negative ion generation was higher in the valley ($636.81ea/cm^3$) than on the ridge ($580.04ea/cm^3$). In the valley, a correlation with altitude was verified for temperature, relative humidity, solar radiation, and negative ion generation: relative humidity, solar radiation, and negative ion generation showed a positive correlation, while temperature showed a negative correlation. On the ridge, a correlation with altitude was verified for temperature, relative humidity, wind speed, solar radiation, and negative ion generation: relative humidity, solar radiation, and negative ion generation showed a positive correlation, while temperature and wind speed showed a negative correlation. The regression analysis gave the following prediction equations. For temperature: y=-0.006x+9.663 (x=altitude, y=temperature) in the valley and y=-0.009x+11.595 (x=altitude, y=temperature) on the ridge. For relative humidity: y=0.027x+53.561 (x=altitude, y=relative humidity) in the valley and y=0.008x+56.646 (x=altitude, y=relative humidity) on the ridge. For negative ion generation: y=0.027x+53.561 (x=altitude, y=negative ion generation) in the valley and y=0.008x+56.646 (x=altitude, y=negative ion generation) on the ridge.