• Title/Summary/Keyword: stepwise linear regression analysis (단계적 선형회귀 분석)

Development of Multiple Linear Regression Model to Predict Agricultural Reservoir Storage based on Naive Bayes Classification and Weather Forecast Data (나이브 베이즈 분류와 기상예보자료 기반의 농업용 저수지 저수율 전망을 위한 저수율 예측 다중선형 회귀모형 개발)

  • Kim, Jin Uk;Jung, Chung Gil;Lee, Ji Wan;Kim, Seong Joon
    • Proceedings of the Korea Water Resources Association Conference / 2018.05a / pp.112-112 / 2018
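  • Local and regional droughts caused by recent climate anomalies have been occurring more frequently, and not only their frequency but also their severity and duration have increased markedly compared with the past, so the resulting damage is projected to grow. In particular, the unprecedented drought of 2014-2015 restricted the water supply from reservoirs and caused damage to many farms. The purpose of this study is to develop a multiple linear regression model of agricultural reservoir storage rate that can be applied to agricultural reservoirs nationwide using 3-month forecast data from the Korea Meteorological Administration (KMA), and to produce storage rate outlook information with it. To develop a storage rate regression model applicable nationwide, five meteorological variables (precipitation, maximum temperature, minimum temperature, mean temperature, and mean wind speed) and observed reservoir storage rates were used. Meteorological observations for 2002 to 2017 were collected from 63 KMA surface weather stations. The storage rate outlook was produced in three steps. First, multiple linear regression analysis was performed with the meteorological data and observed storage rates of 65 representative reservoirs, which the Korea Rural Community Corporation had selected from 511 water supply districts nationwide through cluster analysis and decision tree analysis. The monthly regression equations, with the collected meteorological variables and storage rate as independent variables, had coefficients of determination ($R^2$) of 0.51 to 0.95. Second, to extend the regression results of the representative reservoirs to all reservoirs in the country, naive Bayes classification was applied to classify 3,098 reservoirs nationwide into 65 clusters, and the monthly regression equations corresponding to each cluster were determined. Finally, for agricultural drought forecasting, 3-month forecast data from the KMA GS5 (Global Seasonal Forecasting System 5) were collected and applied to the regression equations determined for the nationwide reservoirs, producing 3-month storage rate outlooks for reservoirs nationwide in 2017. Compared with the observed storage rates in 2017, the storage rate outlook technique based on the nationwide reservoir clustering showed a significant correlation, and the results are expected to be usable as outlook data for agricultural reservoir water supply and agricultural drought.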
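
The two-stage idea in this abstract (assign a reservoir to a cluster with naive Bayes, then apply that cluster's monthly regression equation) can be sketched as below. This is a minimal illustration under stated assumptions: Gaussian naive Bayes on two hypothetical reservoir attributes, two clusters instead of 65, made-up coefficients, and the previous month's storage rate as the storage predictor. None of these values come from the study.

```python
import math

# Hypothetical per-cluster statistics for two reservoir attributes
# (e.g., catchment area and storage capacity); values are illustrative only.
CLUSTERS = {
    "A": {"prior": 0.5, "mean": [1.0, 10.0], "var": [0.25, 4.0]},
    "B": {"prior": 0.5, "mean": [5.0, 50.0], "var": [1.0, 25.0]},
}

# Hypothetical monthly regression coefficients per cluster, in the order:
# intercept, precipitation, max temp, min temp, mean temp, mean wind speed,
# previous month's storage rate (the last predictor is an assumption).
REGRESSION = {
    "A": [10.0, 0.05, -0.3, 0.1, 0.0, -0.5, 0.8],
    "B": [5.0, 0.08, -0.2, 0.0, 0.1, -0.3, 0.9],
}


def classify(features):
    """Return the cluster with the largest Gaussian naive Bayes log-posterior."""
    best, best_score = None, -math.inf
    for name, c in CLUSTERS.items():
        score = math.log(c["prior"])
        # Naive assumption: attributes are independent given the cluster,
        # so the log-likelihood is a sum of univariate normal log-densities.
        for x, mu, var in zip(features, c["mean"], c["var"]):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = name, score
    return best


def predict_storage(features, predictors):
    """Assign a reservoir to a cluster, then apply that cluster's linear equation."""
    coefs = REGRESSION[classify(features)]
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], predictors))


# Usage: a reservoir whose attributes resemble cluster "A".
print(classify([1.2, 11.0]))
print(predict_storage([1.2, 11.0], [100.0, 25.0, 15.0, 20.0, 2.0, 60.0]))
```

In the study the forecast step would feed GS5 3-month forecasts into `predictors`; here they are arbitrary numbers.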

Robust ridge regression for nonlinear mixed effects models with applications to quantitative high throughput screening assay data (비선형 혼합효과모형에서의 로버스트 능형회귀 방법과 정량적 고속 대량 스크리닝 자료에의 응용)

  • Yoo, Jiseon;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.31 no.1 / pp.123-137 / 2018
  • A nonlinear mixed effects model is mainly used to analyze repeated measurement data in various fields. It consists of two stages: the first-stage individual-level model accounts for intra-individual variation, and the second-stage population model accounts for inter-individual variation. The individual-level model, the first stage, estimates the parameters of a nonlinear regression model; it is identical to an ordinary nonlinear regression model, whose parameters are usually estimated by least squares. However, least squares estimation can produce extremely large parameter estimates and standard errors when the data do not clearly reveal the shape of the assumed nonlinear function. In this paper, a new estimation method is proposed to solve this problem by introducing the ridge regression method recently proposed for nonlinear regression models into the first-stage individual-level model of the nonlinear mixed effects model. The performance of the proposed estimator is compared with that of the standard estimator through a simulation study. The proposed methodology is also illustrated using quantitative high throughput screening data obtained from the US National Toxicology Program.
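
To make the identifiability problem concrete, here is a small sketch of a ridge-penalized nonlinear least-squares fit, assuming a Hill-type curve with the Hill slope fixed at 1 as the individual-level model and a plain (non-robust) ridge penalty $\lambda\|\theta\|^2$. The paper's robust ridge estimator and its tuning are not reproduced; the data, grid, and penalty value are hypothetical, and exhaustive grid search is used only to keep the example dependency-free.

```python
# Hill-type curve with the Hill slope fixed at 1 (an assumed stand-in for the
# concentration-response model; the paper's model may differ).
def model(x, t1, t2):
    return t1 * x / (t2 + x)


def objective(theta, xs, ys, lam):
    """Residual sum of squares plus a ridge penalty lam * ||theta||^2."""
    t1, t2 = theta
    sse = sum((y - model(x, t1, t2)) ** 2 for x, y in zip(xs, ys))
    return sse + lam * (t1 ** 2 + t2 ** 2)


def grid_fit(xs, ys, lam, grid):
    """Minimize the objective by exhaustive search over a fixed 2-D grid."""
    candidates = ((t1, t2) for t1 in grid for t2 in grid)
    return min(candidates, key=lambda th: objective(th, xs, ys, lam))


# Hypothetical responses covering only the initial, almost linear part of the
# curve (slope 0.5): the plateau t1 is not identified by these data.
xs = [0.1 * i for i in range(1, 11)]
ys = [0.5 * x for x in xs]
grid = [0.5 * k for k in range(1, 121)]  # 0.5, 1.0, ..., 60.0

ls_fit = grid_fit(xs, ys, lam=0.0, grid=grid)      # ordinary least squares
ridge_fit = grid_fit(xs, ys, lam=0.01, grid=grid)  # ridge-penalized
print("least squares:", ls_fit)
print("ridge:", ridge_fit)
```

Because the data only cover the nearly linear part of the curve, the least-squares estimate runs to the upper edge of the grid for `t2`, mirroring the inflated estimates described above, while the ridge estimate stays much closer to the origin.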

Settlement Prediction Accuracy Analysis of Weighted Nonlinear Regression Hyperbolic Method According to the Weighting Method (가중치 부여 방법에 따른 가중 비선형 회귀 쌍곡선법의 침하 예측 정확도 분석)

  • Kwak, Tae-Young;Woo, Sang-Inn;Hong, Seongho;Lee, Ju-Hyung;Baek, Sung-Ha
    • Journal of the Korean Geotechnical Society / v.39 no.4 / pp.45-54 / 2023
  • During the design phase, settlement is primarily predicted with theoretical methods. During construction, however, measurement-based methods, which predict future settlement from settlement data measured over time, are primarily used because of accuracy issues. Among these methods, the hyperbolic method is commonly used, but the existing hyperbolic method has accuracy issues and statistical limitations; therefore, a weighted nonlinear regression hyperbolic method has been proposed. In this study, two weighting methods were applied to the weighted nonlinear regression hyperbolic method to compare and analyze the accuracy of settlement prediction. Measured settlement plate data from two sites in Busan New Port were used. The regression analysis section was set to 30%, 50%, and 70% of the total data, and the settlement in the remaining sections was predicted. Regardless of the weighting method, the accuracy of hyperbolic settlement prediction increased markedly as the regression analysis section lengthened. The weighted nonlinear regression hyperbolic method predicted settlement more accurately than the existing linear regression hyperbolic method; in particular, it achieved higher prediction performance even with a shorter regression analysis section. Thus, the weighted nonlinear regression hyperbolic method was confirmed to predict settlement earlier and more accurately.
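
The conventional hyperbolic method and a weighted nonlinear alternative can be sketched as below, assuming time and settlement are both measured from the start of the fitting period. The linear version regresses $t/S$ on $t$; the nonlinear version fits $S(t) = t/(a + bt)$ directly by weighted Gauss-Newton, starting from the linear estimates. The weights (proportional to elapsed time) are a hypothetical choice, not either of the two weighting methods compared in the paper.

```python
def fit_linear_hyperbolic(ts, ss):
    """Conventional hyperbolic method: ordinary regression of t/S on t (t/S = a + b*t)."""
    n = len(ts)
    zs = [t / s for t, s in zip(ts, ss)]
    t_bar, z_bar = sum(ts) / n, sum(zs) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    b = sum((t - t_bar) * (z - z_bar) for t, z in zip(ts, zs)) / sxx
    return z_bar - b * t_bar, b


def fit_weighted_nonlinear(ts, ss, ws, a, b, iters=50):
    """Weighted Gauss-Newton fit of S(t) = t / (a + b*t) to the settlements themselves.

    Minimizes sum(w_i * (S_i - t_i / (a + b*t_i))**2), starting from (a, b).
    """
    for _ in range(iters):
        # Build the 2x2 weighted normal equations (J^T W J) d = J^T W r.
        h11 = h12 = h22 = g1 = g2 = 0.0
        for t, s, w in zip(ts, ss, ws):
            den = a + b * t
            r = s - t / den          # residual in settlement units
            ja = -t / den ** 2       # dS/da
            jb = -t * t / den ** 2   # dS/db
            h11 += w * ja * ja
            h12 += w * ja * jb
            h22 += w * jb * jb
            g1 += w * ja * r
            g2 += w * jb * r
        det = h11 * h22 - h12 * h12
        da = (h22 * g1 - h12 * g2) / det
        db = (h11 * g2 - h12 * g1) / det
        a, b = a + da, b + db
        if abs(da) < 1e-12 and abs(db) < 1e-12:
            break
    return a, b


# Usage with synthetic, noise-free data (a = 2, b = 0.01, so final settlement = 1/b = 100).
ts = [10.0 * k for k in range(1, 11)]
ss = [t / (2.0 + 0.01 * t) for t in ts]
weights = ts[:]  # hypothetical weighting: later readings count more
a0, b0 = fit_linear_hyperbolic(ts, ss)
a1, b1 = fit_weighted_nonlinear(ts, ss, weights, a0, b0)
print("final settlement estimate:", 1.0 / b1)
```

With noisy field data the two fits differ: the linear version minimizes errors in $t/S$, whereas the nonlinear version minimizes weighted errors in the settlement itself.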

Trip Generation Model based on Geographically Weighted Regression (공간가중회귀분석을 이용한 통행발생모형)

  • Kim, Jin-Hui;Park, Il-Seop;Jeong, Jin-Hyeok
    • Journal of Korean Society of Transportation / v.29 no.2 / pp.101-109 / 2011
  • In most urbanized cities, socio-economic attributes tend to cluster into spatial patterns of similarity because of agglomeration forces, a phenomenon known as spatial autocorrelation. The classical linear regression model, which is the most frequently adopted model in the trip generation step, cannot adequately represent this effect. Accounting for it properly requires a model that handles spatial dependence patterns. In this study, the Geographically Weighted Regression (GWR) model is adopted as an alternative method for the local analysis of relationships in multivariate data sets; GWR extends the traditional regression framework by estimating local rather than global parameters. Using travel data collected in the Daegu metropolitan area, the GWR model shows that spatial effects exist in the production and attraction of home-based and non-home-based trips. Furthermore, LISA (Local Indicators of Spatial Association) is employed to verify that local spatial autocorrelation exists.
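
GWR's local estimation step can be illustrated with a single predictor: at each location, observations are weighted by a distance kernel and a weighted least-squares line is fitted. The Gaussian kernel, the fixed bandwidth, and the households/trips data below are assumptions for illustration; the study's kernel and bandwidth selection are not stated in the abstract.

```python
import math


def gwr_local_fit(u, v, coords, xs, ys, bandwidth):
    """Local estimate of y = b0 + b1*x at location (u, v), as in GWR.

    Each observation is weighted by a Gaussian kernel of its distance to (u, v),
    and the intercept and slope come from weighted least squares.
    """
    ws = [math.exp(-0.5 * (math.hypot(cu - u, cv - v) / bandwidth) ** 2)
          for cu, cv in coords]
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical zones: west has trips = 1 + 2 * households, east has 1 + 4 * households.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
xs = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x for x in xs[:4]] + [1 + 4 * x for x in xs[4:]]
west = gwr_local_fit(0.5, 0.5, coords, xs, ys, bandwidth=1.0)
east = gwr_local_fit(10.5, 0.5, coords, xs, ys, bandwidth=1.0)
print("west:", west, "east:", east)
```

A global OLS fit to the pooled data would return a single slope of 3 for the whole area, whereas the local fits recover a slope of about 2 in the western zone and about 4 in the eastern zone.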

Prediction of golf scores on the PGA tour using statistical models (PGA 투어의 골프 스코어 예측 및 분석)

  • Lim, Jungeun;Lim, Youngin;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.41-55 / 2017
  • This study predicts the average scores of the top 150 PGA golfers in 132 PGA Tour tournaments (2013-2015) using data mining techniques and statistical analysis. It also aims to predict the Top 10 and Top 25 players in 4 different playoffs. Linear and nonlinear regression methods were used to predict average scores: stepwise regression, best subset selection, LASSO, ridge regression, and principal component regression as linear methods, and decision trees, bagging, gradient boosting, neural networks, random forests, and k-nearest neighbors (KNN) as nonlinear methods. We found that the average score increases as fairway firmness, green height, or average maximum wind speed increases, and decreases as the number of one-putts, scrambling, or longest driving distance increases. All 11 models had low prediction error for the average scores of the 2015 PGA tournaments, which were not included in the training set. Of these, bagging and random forest performed best, and these two models also had the highest accuracy in predicting the Top 10 and Top 25 players in the 4 playoffs.
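
As a small illustration of one of the nonlinear methods listed, k-nearest-neighbours regression predicts a player's average score as the mean score of the most similar players, and a Top-N list follows by sorting the predicted scores. The features, scores, and `k` below are hypothetical, and the features are assumed to be standardized beforehand so that no single variable dominates the distance.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """k-nearest-neighbours regression: mean target of the k closest training rows."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def top_n(players, predicted_scores, n):
    """Lower golf scores are better, so rank players by ascending predicted score."""
    ranked = sorted(zip(predicted_scores, players))
    return [p for _, p in ranked[:n]]


# Hypothetical standardized features per player: (one-putts, scrambling).
train_x = [(0.0, 0.0), (1.0, 0.5), (-1.0, -0.5), (2.0, 1.0), (-2.0, -1.0)]
train_y = [71.0, 70.5, 71.6, 69.9, 72.3]
query = {"P1": (1.5, 0.8), "P2": (-1.5, -0.7), "P3": (0.2, 0.1)}
preds = [knn_predict(train_x, train_y, q, k=2) for q in query.values()]
print(top_n(list(query), preds, 2))
```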

Fast robust variable selection using VIF regression in large datasets (대형 데이터에서 VIF회귀를 이용한 신속 강건 변수선택법)

  • Seo, Han Son
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.463-473 / 2018
  • Variable selection algorithms for linear regression models of large data are considered. Many algorithms have been proposed with a focus on speed and robustness. Among them, variance inflation factor (VIF) regression is fast and accurate because it uses a streamwise regression approach. However, VIF regression is susceptible to outliers because it estimates the model by least squares. To make the algorithm robust, a criterion using a weighted estimator has been proposed, as has a robust VIF regression. In this article, a fast and robust variable selection method is proposed that combines VIF regression with the detection and removal of potential outliers. A simulation study and a data analysis are conducted to compare the proposed method with other methods.
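
The variance inflation factor at the core of VIF regression is $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the other predictors. The sketch below computes only that quantity with a small pure-Python OLS; it does not implement the streamwise search of VIF regression or the outlier-removal step proposed in the article, and the data are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, cols):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


def vif(columns, j):
    """Variance inflation factor of predictor j: 1 / (1 - R_j^2)."""
    others = [c for k, c in enumerate(columns) if k != j]
    return 1.0 / (1.0 - r_squared(columns[j], others))


# Hypothetical predictors: x1 and x2 are uncorrelated, x3 is nearly a copy of x1.
x1 = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
x2 = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
x3 = [-1.0, 0.1, 1.0, -1.0, -0.1, 1.0]
print(vif([x1, x2], 0))      # uncorrelated predictors: VIF close to 1
print(vif([x1, x2, x3], 2))  # near-collinear predictor: large VIF
```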

Recommended Practice for a Reasonable Design Demand Factor and Analysis of Power Consumption Characteristics by loads in Office Buildings (사무소용 빌딩의 부하종별 국내외 수용률 적용실태 분석에 관한 연구)

  • Kim, Se-Dong
    • Proceedings of the KIEE Conference / 2005.10a / pp.113-117 / 2005
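  • Buildings with high electricity consumption, such as office buildings, require rational electrical equipment design from the design stage onward to save energy through efficient use of electric power. Focusing on office buildings, this study surveyed and analyzed the power consumption characteristics of general lighting and receptacle loads and of general power loads in Japan and Korea, and investigated the demand factor values that electrical design firms apply at the design stage. To identify the overall characteristics and central tendency of the surveyed data, summary parameters such as the mean, standard deviation, maximum, minimum, and median were analyzed, and the trends were examined with linear and nonlinear regression analysis. The mean composite demand factor/diversity factor in Korea was 46.4%, indicating that power transformers have considerable spare capacity. Based on these results, demand factor criteria by load type were proposed for the rational design of substation equipment capacity, and the data needed to determine substation capacity were compiled into a database.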
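
The quantities in this abstract follow standard definitions: demand factor = maximum demand / connected load, and diversity factor = sum of individual maximum demands / coincident maximum demand. A common textbook sizing rule then estimates transformer capacity as connected load × demand factor ÷ diversity factor ÷ power factor; the composite demand factor/diversity factor ratio reported above presumably plays the role of the middle terms. The numbers below are hypothetical.

```python
def demand_factor(max_demand_kw, connected_load_kw):
    """Demand factor = maximum demand / total connected load."""
    return max_demand_kw / connected_load_kw


def diversity_factor(individual_max_demands_kw, coincident_max_demand_kw):
    """Diversity factor = sum of individual maximum demands / coincident maximum demand."""
    return sum(individual_max_demands_kw) / coincident_max_demand_kw


def transformer_kva(connected_load_kw, demand, diversity, power_factor):
    """Required transformer capacity (kVA) from connected load and the two factors."""
    return connected_load_kw * demand / diversity / power_factor


# Hypothetical building: 1000 kW connected, demand factor 0.6, diversity 1.2, pf 0.8.
print(transformer_kva(1000.0, 0.6, 1.2, 0.8))
```

A lower measured demand factor (or higher diversity) directly shrinks the required capacity, which is why the surveyed values bear on the transformer oversizing noted above.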

Analysis of Soil Infiltration Characteristics Using a Tension Infiltrometer (Tension infiltrometer를 이용한 토양의 침투특성 분석)

  • 하규철;전철민;김재곤
    • Proceedings of the Korean Society of Soil and Groundwater Environment Conference / 2004.04a / pp.362-365 / 2004
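  • In the context of the spreading of soil contamination, the infiltration characteristics of soils overlying different geological substrates were analyzed, and infiltration rates were determined by applying four or more tension levels. The relationship between tension and infiltration rate was estimated by nonlinear regression to an exponential function; this is expected to provide more information on unsaturated hydraulic properties, and a more reliable saturated hydraulic conductivity, than values obtained from only a few tension levels. When compared with the soil analysis results, the measured infiltration rates tended to be low in samples with low clay content and were high in sand-rich soils.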
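
One way to carry out the exponential regression described above is sketched here, assuming a Gardner-type form $q(h) = A e^{-\alpha h}$ with $h$ the applied tension (taken as positive) and $q$ the steady infiltration rate. The fit is nonlinear least squares via Gauss-Newton, started from a log-linear regression; the tension levels and rates are synthetic, and converting $A$ and $\alpha$ into hydraulic conductivity (e.g., with Wooding's solution) is not shown.

```python
import math


def fit_exponential(hs, qs, iters=100):
    """Fit q = A * exp(-alpha * h) by nonlinear least squares (Gauss-Newton).

    hs: applied tensions (positive, e.g. cm of water); qs: steady infiltration rates.
    The starting point comes from a linear regression of log(q) on h.
    """
    n = len(hs)
    ls = [math.log(q) for q in qs]
    h_bar, l_bar = sum(hs) / n, sum(ls) / n
    slope = (sum((h - h_bar) * (l - l_bar) for h, l in zip(hs, ls))
             / sum((h - h_bar) ** 2 for h in hs))
    a, alpha = math.exp(l_bar - slope * h_bar), -slope
    for _ in range(iters):
        h11 = h12 = h22 = g1 = g2 = 0.0
        for h, q in zip(hs, qs):
            e = math.exp(-alpha * h)
            r = q - a * e            # residual in rate units
            ja = e                   # dq/dA
            jal = -a * h * e         # dq/dalpha
            h11 += ja * ja
            h12 += ja * jal
            h22 += jal * jal
            g1 += ja * r
            g2 += jal * r
        det = h11 * h22 - h12 * h12
        da = (h22 * g1 - h12 * g2) / det
        dal = (h11 * g2 - h12 * g1) / det
        a, alpha = a + da, alpha + dal
        if abs(da) < 1e-12 and abs(dal) < 1e-12:
            break
    return a, alpha


tensions = [0.5, 3.0, 6.0, 12.0]  # hypothetical tension levels (cm)
rates = [0.02 * math.exp(-0.1 * h) for h in tensions]  # synthetic, noise-free
a, alpha = fit_exponential(tensions, rates)
print("rate at zero tension:", a, "alpha:", alpha)
```

Here $A$ is the rate extrapolated to zero tension and $\alpha$ describes how quickly infiltration drops as tension increases; using more tension levels constrains both parameters better.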

Estimated Headwater Stream Temperature Using Environmental Factors with Seasonal Variations in a Forested Catchment (환경인자를 이용한 산지계류의 계절별 수온변화 예측)

  • Nam, Sooyoun;Jang, Su-Jin;Kim, Suk-Woo;Lee, Youn-Tae;Chun, Kun-Woo
    • Korean Journal of Environment and Ecology / v.34 no.1 / pp.55-62 / 2020
  • To estimate seasonal variations in headwater stream temperature, we analyzed precipitation, runoff, and air temperature in the experimental forest of Kangwon National University, Gangwon-do, from 2017 to 2018. The daily mean headwater stream temperature was 6.9 to 17.7℃ in spring and correlated with air temperature; in summer and fall it was 12.2 to 26.3℃ and 3.6 to 19.3℃, respectively, and correlated with both air temperature and runoff. Based on these seasonal differences, stepwise multiple linear regression was applied separately for each season. The resulting equations, where WT is headwater stream temperature, were $WT_{spring} = 0.553 \times \text{Air temperature} + 0.086 \times \text{Runoff} + 4.145$ ($R^2 = 0.505$; $p < 0.01$), $WT_{summer} = 0.756 \times \text{Air temperature} - 0.072 \times \text{Runoff} + 2.670$ ($R^2 = 0.510$; $p < 0.01$), and $WT_{fall} = 0.738 \times \text{Air temperature} + 0.028 \times \text{Precipitation} + 2.660$ ($R^2 = 0.844$; $p < 0.01$). In all seasons, the coefficient of determination ($R^2$) was higher than that of estimates based on air temperature alone, and it increased progressively from spring to winter. These results indicate that the estimation performance of stepwise multiple linear regression differs by season because different environmental factors affect headwater stream temperature in different seasons. Furthermore, temporal factors together with spatial characteristics (e.g., river versus headwater stream) are recommended for estimating headwater stream temperature.
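
The stepwise procedure behind these seasonal equations can be sketched as greedy forward selection. As a simplification, this version adds the predictor with the largest $R^2$ gain and stops when the gain falls below a threshold, whereas statistical packages usually apply F-test or p-value entry and removal rules; the data and the 0.01 threshold are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, cols):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


def forward_stepwise(columns, y, min_gain=0.01):
    """Greedy forward selection on R^2 gain (a simplification of p-value rules)."""
    selected, best_r2 = [], 0.0
    remaining = list(columns)
    while remaining:
        scores = {name: r_squared(y, [columns[s] for s in selected + [name]])
                  for name in remaining}
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:
            break  # no remaining predictor improves the fit enough
        selected.append(name)
        best_r2 = scores[name]
        remaining.remove(name)
    return selected, best_r2


# Hypothetical daily data; the response is built from air temperature and runoff only.
columns = {
    "air_temp": [2.0, 5.0, 8.0, 11.0, 14.0, 17.0, 20.0, 23.0],
    "runoff": [1.0, 0.5, 2.0, 0.8, 3.0, 1.5, 0.7, 2.5],
    "precip": [0.0, 12.0, 3.0, 0.0, 7.0, 1.0, 15.0, 2.0],
}
wt = [4.0 + 0.6 * a + 2.0 * q for a, q in zip(columns["air_temp"], columns["runoff"])]
print(forward_stepwise(columns, wt))
```

Because the synthetic response depends only on air temperature and runoff, air temperature enters first, runoff second, and precipitation is never added.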

Analysis of Horse Races: Prediction of Winning Horses in Horse Races Using Statistical Models (서울 경마 경기 우승마 예측 모형 연구)

  • Choe, Hyemin;Hwang, Nayoung;Hwang, Chankyoung;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.28 no.6 / pp.1133-1146 / 2015
  • Horse racing accounts for the largest share of the domestic legal gambling industry. However, horse races have received less statistical analysis than other sports. We propose models for predicting the winning horses of races using data mining techniques such as logistic regression, linear regression, and random forests. The race data come from the Korea Racing Authority and include race reports and information on racehorses, jockeys, and trainers. We consider two models: one based on finishing ranks and one based on race times. The results show that rank prediction is affected by racehorse information and by the numbers of wins of racehorses and jockeys. Wagers placed on the last month of races according to our prediction models produced substantial profits.
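
Logistic regression, one of the techniques listed, can be sketched with a single predictor: fit $P(\text{win}) = \sigma(b_0 + b_1 x)$ by gradient ascent on the log-likelihood, then pick the entrant with the highest predicted probability. Using a horse's past win count as the only feature, and the training data below, are assumptions for illustration; the paper's models use many racehorse, jockey, and trainer variables.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, iters=5000):
    """Fit P(win) = sigmoid(b0 + b1*x) by batch gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # gradient contribution of one race entry
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def pick_winner(horses, b0, b1):
    """Return the horse with the highest predicted win probability."""
    return max(horses, key=lambda h: sigmoid(b0 + b1 * horses[h]))


# Hypothetical training data: past wins of each entry and whether it won (1) or not (0).
past_wins = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
won = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(past_wins, won)
print(pick_winner({"A": 1.0, "B": 4.0, "C": 2.0}, b0, b1))
```

Because the labels overlap (not perfectly separable), the maximum-likelihood estimate is finite and gradient ascent converges to it.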