• Title/Summary/Keyword: probabilistic regression model (확률적 회귀모형)

Search Result 184, Processing Time 0.027 seconds

Improvement in probabilistic drought prediction method using Bayes' theorem (베이즈이론을 이용한 가뭄 확률 전망 기법 고도화)

  • Kim, Daeho;Kim, Young-Oh
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.153-153 / 2020
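  • Droughts of varying severity occur frequently in Korea, and the recent unprecedented multi-year drought has heightened awareness of drought risk. Responding appropriately to drought and reducing damage requires reliable drought prediction. This study applied ensemble prediction and Bayes' theorem to the SRI (Standardized Runoff Index), a hydrological drought index, to produce probabilistic drought forecasts, termed EDP (Ensemble Drought Prediction). EDP was generated and improved for eight dam basins in Korea as follows. First, one-month-lead streamflow forecasts (Ensemble Streamflow Prediction, ESP) from the TANK model were converted to SRI to generate the EDP probability distribution. Then, to improve EDP, Bayes' theorem was used to compensate for the inadequate soil moisture initial conditions of the underlying ESP. A regression equation between SRI and the satellite-observed SMI (Soil Moisture Index) data from APCC (APEC Climate Center) was built and defined as the likelihood function, which was used to update the prior EDP distribution into the EDP+ probability distribution. For both EDP and EDP+, the deeper the drought being forecast, the worse the predictive skill relative to the climatological forecast. Nevertheless, the accuracy of EDP+ tended to improve as the accuracy of the regression equation used as the likelihood function increased, which implies that Bayes' theorem can improve probabilistic drought forecasts. However, deterministic forecast accuracy was unrelated to probabilistic forecast accuracy, presumably because deterministic and probabilistic forecasts are fundamentally different.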

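The Bayesian update described in this abstract can be illustrated with a minimal sketch. It assumes the prior EDP is a set of equally weighted ensemble SRI values and that the likelihood is a normal density centered on a linear regression of SRI on SMI. The ensemble values, regression coefficients, residual standard deviation, and the drought threshold of SRI <= -1.0 are all hypothetical, not values from the study, and the authors' actual procedure may differ.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def update_weights(prior_sri, prior_weights, smi_obs, intercept, slope, resid_sd):
    """Posterior weight of each member: prior weight times likelihood, renormalized."""
    predicted_sri = intercept + slope * smi_obs  # regression estimate of SRI from SMI
    raw = [w * normal_pdf(s, predicted_sri, resid_sd)
           for s, w in zip(prior_sri, prior_weights)]
    total = sum(raw)
    return [r / total for r in raw]


def drought_probability(sri_values, weights, threshold):
    """Total weight of ensemble members at or below the drought threshold."""
    return sum(w for s, w in zip(sri_values, weights) if s <= threshold)


# Hypothetical prior ensemble (EDP) with equal weights.
prior_sri = [-1.8, -1.2, -0.6, -0.1, 0.3, 0.9]
prior_weights = [1.0 / len(prior_sri)] * len(prior_sri)

# Hypothetical regression SRI = 0.0 + 0.8 * SMI with residual sd 0.5, and a dry SMI observation.
post_weights = update_weights(prior_sri, prior_weights, smi_obs=-1.0,
                              intercept=0.0, slope=0.8, resid_sd=0.5)

p_prior = drought_probability(prior_sri, prior_weights, threshold=-1.0)
p_post = drought_probability(prior_sri, post_weights, threshold=-1.0)
print(f"prior P(drought) = {p_prior:.3f}, posterior P(drought) = {p_post:.3f}")
```

In this sketch a tighter regression (smaller `resid_sd`) concentrates the posterior more strongly around the regression estimate, which is one way to read the reported link between regression accuracy and EDP+ accuracy.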

Effects of Multicollinearity in Logit Model (로짓모형에 있어서 다중공선성의 영향에 관한 연구)

  • Ryu, Si-Kyun
    • Journal of Korean Society of Transportation / v.26 no.1 / pp.113-126 / 2008
  • This research explores the effects of multicollinearity on the reliability and goodness of fit of the logit model. To investigate these effects on the multinomial logit model, numerical experiments are performed. Explanatory variables (attributes of the utility functions) with correlations ranging from $\rho = 0.0$ to $\rho = 0.9$ are generated, and the rho-squared values and t-statistics, which are the indices of goodness of fit and reliability of the logit model, are traced. The numerical experiments support the following findings: 1) When a new explanatory variable is added, some rho-squared values increase while others decrease. 2) Higher correlation between generic variables worsens the goodness of fit of the logit model. 3) Multicollinearity tends to produce overestimated parameters. 4) The reliability of the estimated parameters tends to decrease when the correlations between attributes are high. These results suggest that, when developing a logit model, we should check for multicollinearity and apply appropriate treatments to reduce it.
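The data-generation step of such an experiment can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: it draws two standard normal attributes with a target correlation rho, checks the sample correlation, and reports the two-variable variance inflation factor 1 / (1 - rho^2). The VIF is a diagnostic borrowed from linear regression and is used here only as a rough indicator of how correlation inflates parameter uncertainty. The function names, sample size, and seed are hypothetical.

```python
import math
import random


def correlated_pair(n, rho, rng):
    """Draw n pairs of standard normal values with population correlation rho."""
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = math.sqrt(1.0 - rho * rho)
    x2 = [rho * a + scale * rng.gauss(0.0, 1.0) for a in x1]
    return x1, x2


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif(rho):
    """Variance inflation factor for two regressors with correlation rho."""
    return 1.0 / (1.0 - rho * rho)


rng = random.Random(42)  # fixed seed so the run is repeatable
results = {}
for rho in (0.0, 0.5, 0.9):
    x1, x2 = correlated_pair(5000, rho, rng)
    results[rho] = pearson(x1, x2)
    print(f"target rho = {rho:.1f}, sample r = {results[rho]:.3f}, VIF = {vif(rho):.2f}")
```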

Traffic Accidents Analysis on Expressway using Spatial Autoregressive Model (공간자기회귀모형을 이용한 고속도로 교통사고 분석)

  • 강경우
    • Journal of Korean Society of Transportation / v.15 no.1 / pp.5-15 / 1997
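  • Spatial statistical analysis is the branch of statistics that analyzes relationships among spatially linked variables. In general, such relationships are affected by the spatial distribution of each variable. Traditional statistical methods assume sample data drawn at random from a homogeneous data-generating process, but spatial data violate this assumption. Most transportation data, such as traffic flow and traffic accidents, are heterogeneous samples generated under spatial correlation, so traditional statistical methods that assume spatial homogeneity can lead to errors. This paper analyzes traffic accidents on expressways using a spatial autocorrelation technique that accounts for spatial relationships. The results show that the number of traffic accidents on three of the four expressways, all except the Gyeongin Expressway, has statistically significant positive spatial correlation. Accordingly, to analyze traffic accidents while accounting for spatial correlation, a spatial autoregressive analysis was carried out with the number of accidents per unit segment as the dependent variable and the per-segment traffic volume, the presence of an I.C., and the freight-vehicle ratio as explanatory variables. In the results, per-segment traffic volume and the freight-vehicle ratio were statistically significant, while for the Honam/Namhae expressways per-segment traffic volume and the presence of an I.C. were statistically significant.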

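The abstract does not name the spatial autocorrelation statistic it uses. As a hedged illustration, the sketch below computes Moran's I, a commonly used measure, for accident counts on consecutive road segments, assuming that only adjacent segments are neighbors (weight 1). The counts are hypothetical.

```python
def morans_i(values):
    """Moran's I for segments along a line, where adjacent segments have weight 1."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Each adjacent pair appears twice: once as (i, i+1) and once as (i+1, i).
    cross = 2.0 * sum(dev[i] * dev[i + 1] for i in range(n - 1))
    weight_sum = 2.0 * (n - 1)
    return (n / weight_sum) * cross / sum(d * d for d in dev)


# Hypothetical accident counts per unit segment.
clustered = [1, 1, 2, 8, 9, 9]    # high counts sit next to high counts
alternating = [1, 5, 1, 5, 1, 5]  # high counts sit next to low counts

i_clustered = morans_i(clustered)
i_alternating = morans_i(alternating)
print(f"clustered: I = {i_clustered:.3f}, alternating: I = {i_alternating:.3f}")
```

A positive value indicates that similar counts cluster along the road, which corresponds to the positive spatial correlation reported for three of the four expressways.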

Development of Ingrowth Estimation Equations for Pinus densiflora in Korea Derived from National Forest Inventory Data (국가산림자원조사 자료를 이용한 소나무의 진계생장 추정식 개발)

  • Moon, Ga Hyun;Yim, Jong Su;Shin, Man Yong
    • Journal of Korean Society of Forest Science / v.107 no.4 / pp.402-411 / 2018
  • This study was conducted to develop ingrowth estimation equations for Pinus densiflora in Gangwon Province and the central region of the Korean Peninsula, based on permanent sample plot data from the National Forest Inventory (NFI). Identical sample plots in the $5^{th}$ and $6^{th}$ NFI data were collected to identify the ingrowth over the last 5 years. The equations were developed in two stages: in the first stage, a logistic regression model was used to estimate the ingrowth probability; in the second stage, regression analysis on the sample plots where ingrowth occurred was used to estimate the ingrowth amount. The optimal model was selected from the candidate models after verification with three evaluation statistics: mean difference (MD), standard deviation of difference (SDD), and standard error of difference (SED). As a result, a logistic regression model based on the number of sample plots without ingrowth (model VI) was selected as the ingrowth probability equation, and an exponential function including the species composition (SC) variable (model VII) was optimal as the ingrowth amount equation. The estimation ability of the equations was also evaluated under various forest stand conditions, and no particular issue in fit or applicability was observed.
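The two-stage structure can be sketched as a logistic probability multiplied by a conditional amount. This is a generic two-part model under stated assumptions: the predictor `x`, all coefficients, and the choice to combine the stages as a product (the expected ingrowth) are hypothetical and are not taken from the paper's models VI and VII.

```python
import math


def ingrowth_probability(x, b0, b1):
    """Stage 1: logistic model for the probability that ingrowth occurs."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def ingrowth_amount(sc, c0, c1):
    """Stage 2: exponential model for the amount, given that ingrowth occurs."""
    return math.exp(c0 + c1 * sc)


def expected_ingrowth(x, sc, b0, b1, c0, c1):
    """Expected ingrowth: probability of occurrence times conditional amount."""
    return ingrowth_probability(x, b0, b1) * ingrowth_amount(sc, c0, c1)


# Hypothetical plot-level predictor, species composition value, and coefficients.
p = ingrowth_probability(x=1.5, b0=-1.0, b1=0.8)
amount = ingrowth_amount(sc=0.6, c0=1.0, c1=0.5)
expected = expected_ingrowth(x=1.5, sc=0.6, b0=-1.0, b1=0.8, c0=1.0, c1=0.5)
print(f"P(ingrowth) = {p:.3f}, amount if it occurs = {amount:.3f}, expected = {expected:.3f}")
```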

Determinants of job finding using student's characteristic information (학생정보를 이용한 대졸 취업에 미치는 영향력 분석)

  • Cho, Jang-Sik
    • Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.849-856 / 2011
  • In this paper, we analyze the influence of admission and enrollment variables, together with individual characteristics, on the employment of graduates of K university. First, logistic regression analysis is used to examine the main effects of the admission and enrollment variables and the students' individual characteristics on employment. Then, decision tree analysis is used to examine the interaction effects of these variables on employment. The results may help K university design effective job-finding strategies for its graduates.

Analysis of Influential Factors of Roadkill Occurrence - A Case Study of Seorak National Park - (로드킬 발생 영향요인 분석 - 설악산 국립공원 44번 국도를 대상으로 -)

  • Son, Seung-Woo;Kil, Sung-Ho;Yun, Young-Jo;Yoon, Jeong-Ho;Jeon, Hyung-Jin;Son, Young-Hoon;Kim, Min-Sun
    • Journal of the Korean Institute of Landscape Architecture / v.44 no.3 / pp.1-12 / 2016
  • This study aimed to identify the underlying causes of road-kill occurrences and analyzed the spatial characteristics of road-kill locations along Route 44 in Seorak National Park, Korea. Logistic regression analysis with backward elimination of variables was used. The Seorak National Park Service compiled GIS data on 81 road-kill occurrences from 2008 to 2013, and these data were used as the dependent variable. Based on previous studies and field surveys, vegetation age-class, distance to streams, coverage of fences and retaining walls, and distance to building sites were used as independent variables representing road-kill impact factors. Among the significant factor estimates, the coverage of fences and retaining walls (-1.0135) was the most influential, whereas vegetation age-class (0.0001) was the least influential. Accordingly, the rate of road-kill occurrence can increase as the distance to building sites and streams decreases and as vegetation age-class increases. The model predicted road-kill occurrence with 72.2% accuracy, treating these spatial characteristics as partial causes of road-kill. This study can provide an objective basis for spatial decision-making, including future road-kill mitigation policies and plans.

A development of stochastic simulation model based on vector autoregressive model (VAR) for groundwater and river water stages (벡터자기회귀(VAR) 모형을 이용한 지하수위와 하천수위의 추계학적 모의기법 개발)

  • Kwon, Yoon Jeong;Won, Chang-Hee;Choi, Byoung-Han;Kwon, Hyun-Han
    • Journal of Korea Water Resources Association / v.55 no.12 / pp.1137-1147 / 2022
  • River and groundwater stages are the main elements in the hydrologic cycle. They are spatially correlated and can be used to evaluate hydrological and agricultural drought. Stochastic simulation is often performed independently on hydrological variables that are spatiotemporally correlated. In this setting, the interdependency between the variables may not be maintained. This study proposes the Bayesian vector autoregression model (VAR) to capture the interdependency between multiple variables over time. VAR models systematically consider the lagged stages of each variable and the lagged values of the other variables. Further, an autoregressive model (AR) was built and compared with the VAR model. It was confirmed that the VAR model was more effective in reproducing the observed interdependency (or cross-correlation) between river and groundwater stages, while the AR model generally underestimated the observed interdependency.
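The contrast between VAR and independent AR simulation can be illustrated with a small sketch. It simulates a two-variable VAR(1) with fixed, hypothetical coefficients and independent standard normal noise, then compares the cross-correlation with that of two separate AR(1) series. It does not perform the Bayesian estimation used in the study, and the coefficient matrices, series length, and seed are assumptions chosen only for illustration.

```python
import math
import random


def simulate(coef, n, rng):
    """Simulate a two-variable first-order model: state_t = coef * state_{t-1} + noise."""
    x, y = 0.0, 0.0
    xs, ys = [], []
    for _ in range(n):
        # The right-hand side uses the previous x and y before either is reassigned.
        x, y = (coef[0][0] * x + coef[0][1] * y + rng.gauss(0.0, 1.0),
                coef[1][0] * x + coef[1][1] * y + rng.gauss(0.0, 1.0))
        xs.append(x)
        ys.append(y)
    return xs, ys


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical coefficients: the VAR includes cross-lag terms, the AR pair does not.
var_coef = [[0.6, 0.3], [0.3, 0.6]]
ar_coef = [[0.6, 0.0], [0.0, 0.6]]

corr_var = pearson(*simulate(var_coef, 5000, rng))
corr_ar = pearson(*simulate(ar_coef, 5000, rng))
print(f"cross-correlation: VAR = {corr_var:.3f}, independent AR = {corr_ar:.3f}")
```

With these settings the cross-lag terms alone produce a clearly positive cross-correlation, while the independent AR series stay close to zero, which mirrors the underestimation reported for the AR model.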

Development of Return flow rate Prediction Algorithm with Data Variation based on LSTM (LSTM기반의 자료 변동성을 고려한 하천수 회귀수량 예측 알고리즘 개발연구)

  • Lee, Seung Yeon;Yoo, Hyung Ju;Lee, Seung Oh
    • Journal of Korean Society of Disaster and Security / v.15 no.2 / pp.45-56 / 2022
  • Countermeasures for water shortages during the dry season and drought periods have not considered the return flow rate in detail. In this study, the outflow of an STP was predicted with LSTM, a data-driven machine learning model. As a first step, outflow, inflow, precipitation, and water elevation were used as input data, and the distribution of the variance was additionally considered to improve prediction accuracy. To account for the variability of the outflow data, the residual between the observed values and the distribution was assumed to take the form of a complex trigonometric function, and it was presented as the optimal distribution of the outflow together with the theoretical probability distribution. The error was reduced compared with the case in which the variance distribution was not considered. The outflow prediction model constructed in this study is therefore expected to serve as a basis for establishing an efficient river management system, since it enables more accurate prediction.

Orographic Precipitation Analysis with Regional Frequency Analysis and Multiple Linear Regression (지역빈도해석 및 다중회귀분석을 이용한 산악형 강수해석)

  • Yun, Hye-Seon;Um, Myoung-Jin;Cho, Won-Cheol;Heo, Jun-Haeng
    • Journal of Korea Water Resources Association / v.42 no.6 / pp.465-480 / 2009
  • In this study, single and multiple linear regression models were used to derive the relationship between precipitation and altitude, latitude, and longitude in Jejudo. The single linear regression analysis examined whether an orographic effect exists in Jejudo based on annual average precipitation, and the multiple linear regression analysis examined whether the orographic effect applies to each duration and return period of the quantiles obtained from regional frequency analysis by the index flood method. The regression results show that the relationship between altitude and precipitation becomes more strongly linear as the duration and return period increase. The multiple linear regression precipitation estimates (which used altitude, latitude, and longitude) were found to be more reasonable than estimates obtained using altitude only, altitude and latitude, or altitude and longitude. In particular, spatial distribution analysis by the kriging method using GIS also gave realistic precipitation estimates, with precipitation concentrated in the southeast region, consistent with the actual climate of Jejudo. However, the accuracy of the regression model decreased for short-duration precipitation and for precipitation at high elevations, even at long durations. Consequently, other factors that cause the orographic effect need to be considered to improve the accuracy of precipitation estimates.

Nomogram comparison conducted by logistic regression and naïve Bayesian classifier using type 2 diabetes mellitus (T2D) (제 2형 당뇨병을 이용한 로지스틱과 베이지안 노모그램 구축 및 비교)

  • Park, Jae-Cheol;Kim, Min-Ho;Lee, Jea-Young
    • The Korean Journal of Applied Statistics / v.31 no.5 / pp.573-585 / 2018
  • In this study, we fit a logistic regression model and a naïve Bayesian classifier model using 11 risk factors to predict the incidence probability of type 2 diabetes mellitus. We then show how to construct a nomogram that helps people understand the result visually. We use data from the 2013-2015 Korean National Health and Nutrition Examination Survey (KNHANES). We include 3 interaction terms in the logistic regression model to improve the quality of the analysis and facilitate the application of the left-aligned method to the Bayesian nomogram. Finally, we compare the two nomograms, examine their utility, and verify them using the ROC curve.