• Title/Summary/Keyword: Bayesian multiple regression analysis


Regional Low Flow Frequency Analysis Using Bayesian Multiple Regression (Bayesian 다중회귀분석을 이용한 저수량(Low flow) 지역빈도분석)

  • Kim, Sang-Ug;Lee, Kil-Seong;Sung, Jin-Young
    • Proceedings of the Korea Water Resources Association Conference / 2008.05a / pp.169-173 / 2008
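  • This study applied Bayesian multiple regression analysis using the ordinary least squares method to perform regional low flow frequency analysis. To explore its effect in terms of uncertainty, the estimates from Bayesian multiple regression were compared with the confidence intervals of estimates from conventional multiple regression computed using the t-distribution. At each return period, the mean estimates computed using the t-distribution and those from Bayesian multiple regression did not differ greatly. In terms of uncertainty, however, the gap between the upper and lower limits of the confidence interval was much smaller with Bayesian multiple regression than with the conventional method, which indicates that Bayesian multiple regression is superior to conventional regression at expressing uncertainty in regional low flow frequency analysis. In addition, two ungauged catchments in the Nakdong River basin were selected, and the Bayesian multiple regression model was applied to estimate low flow, including its uncertainty, at those catchments. The results demonstrate that this approach can be effective for characterizing low flow at ungauged catchments.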


Regional Low Flow Frequency Analysis Using Bayesian Multiple Regression (Bayesian 다중회귀분석을 이용한 저수량(Low flow) 지역 빈도분석)

  • Kim, Sang-Ug;Lee, Kil-Seong
    • Journal of Korea Water Resources Association / v.41 no.3 / pp.325-340 / 2008
  • This study employs Bayesian multiple regression analysis using the ordinary least squares method for regional low flow frequency analysis. The parameter estimates from the Bayesian multiple regression analysis were compared with those from conventional analysis based on the t-distribution. In these comparisons, the mean values from the t-distribution and the Bayesian analysis at each return period are not significantly different. However, the difference between the upper and lower limits is remarkably reduced with Bayesian multiple regression. Therefore, from the point of view of uncertainty analysis, Bayesian multiple regression analysis is more attractive than the conventional method based on a t-distribution, because the low flow sample size at the site of interest is typically insufficient to perform low flow frequency analysis. We also performed low flow prediction, including confidence intervals, at two ungauged catchments in the Nakdong River basin using the developed Bayesian multiple regression model. The Bayesian prediction proves effective for inferring the low flow characteristics of ungauged catchments.

A Study on Regionalization of Parameters of Continuous Rainfall-Runoff Model (연속 강우-유출모형의 매개변수 지역화에 관한 연구)

  • Jeong, Ga-In;Kim, Tae-Jeong;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference / 2015.05a / pp.182-182 / 2015
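  • In Korea, because of regional imbalance in the rainfall observation system, relatively small reservoirs have the characteristics of ungauged basins, and reliable rainfall, runoff, and evaporation data are very scarce. For gauged basins such as multipurpose dam basins, inflow data for the upstream basin are easy to obtain, but most basins lack gauging equipment, which makes reliable inflow data hard to obtain. To estimate inflow at ungauged basins, this study estimated the parameters of a rainfall-runoff model for gauged basins and, based on the correlation between the estimated parameters and watershed characteristics, applied multiple linear regression (MLR) to derive regression equations for regionalization. To this end, parameters were estimated for 17 K-water dam basins with good-quality flow data, and two of these dam basins were treated as ungauged to validate the developed model. Good simulation performance was confirmed on most statistical indices, and applying the regionalization technique developed in this study to ungauged basins is expected to enable more quantitative and efficient water resources planning. Future work will develop a regionalization technique based on a Bayesian GLM model so that parameter uncertainty can also be taken into account.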


A Study on Regionalization of Parameters for Sacramento Continuous Rainfall-Runoff Model Using Watershed Characteristics (유역특성인자를 활용한 Sacramento 장기유출모형의 매개변수 지역화 기법 연구)

  • Kim, Tae-Jeong;Jeong, Ga-In;Kim, Ki-Young;Kwon, Hyun-Han
    • Journal of Korea Water Resources Association / v.48 no.10 / pp.793-806 / 2015
  • The simulation of natural streamflow at ungauged basins is one of the fundamental challenges in the hydrology community. Runoff simulation in ungauged basins generally depends on reliable parameter estimation for a rainfall-runoff model. However, parameter estimation for such a model is difficult because hydrologic data are insufficient. This study aims to regionalize the parameters of a continuous rainfall-runoff model in conjunction with a Bayesian statistical technique, so that the uncertainty associated with the parameters is considered more precisely. First, this study employed a Bayesian Markov Chain Monte Carlo scheme to estimate the parameters of the Sacramento rainfall-runoff model. The Sacramento model was calibrated against observed daily runoff data, and the posterior density function of the parameters was derived. Second, we applied a multiple linear regression model relating the parameters to watershed characteristics, to obtain a functional relationship between them. The proposed model was validated on gauged watersheds using efficiency criteria such as the Nash-Sutcliffe efficiency, the index of agreement, and the coefficient of correlation.
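The three efficiency criteria named in this abstract can be sketched as follows. This is a minimal illustration using their standard textbook definitions (the index of agreement is taken here to be Willmott's d, which is an assumption; the abstract does not say which variant the authors used). The function names are hypothetical and are not taken from the paper.

```python
from math import sqrt


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def index_of_agreement(obs, sim):
    """Willmott's index of agreement d (assumed variant), bounded in [0, 1]."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - mean_obs) + abs(o - mean_obs)) ** 2
              for o, s in zip(obs, sim))
    return 1.0 - num / den


def pearson_r(obs, sim):
    """Pearson coefficient of correlation between observed and simulated series."""
    mo = sum(obs) / len(obs)
    ms = sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / sqrt(var_o * var_s)
```

A simulation that only reproduces the observed mean scores a Nash-Sutcliffe efficiency of 0, which is why that criterion is often read as skill relative to the mean.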

Bayesian analysis of latent factor regression model (내재된 인자회귀모형의 베이지안 분석법)

  • Kyung, Minjung
    • The Korean Journal of Applied Statistics / v.33 no.4 / pp.365-377 / 2020
  • We discuss latent factor regression, which constructs a common structure inherent among explanatory variables to address multicollinearity and uses the resulting factors as regressors in a linear model of a response variable. We use Bayesian estimation with a LASSO prior with a large penalty parameter to construct a significant factor loading matrix of intrinsic interest among infinite latent structures. The estimated factor loading matrix, together with the other estimated parameters, can be inversely transformed into linear parameters for each explanatory variable and used as a prediction model for new observations. We apply the proposed method to the Product Service Management data of HBAT and observe that, for a fixed number of factors, it constructs the same factors as general common factor analysis. The MSE of the values predicted by the Bayesian latent factor regression model is also smaller than that of the common factor regression model.

Development of Bayesian Multiple Quantile Regression model and Estimation of Future Design Rainfall with Increased Temperature (베이지안 다중분위회귀분석모형 개발 및 온도상승에 따른 미래 확률강수량 전망)

  • Uranchimeg, Sumiya;Kim, Jin-Guk;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference / 2019.05a / pp.22-22 / 2019
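  • With the recent rapid growth of climate change impacts worldwide, various risk factors are being exposed, such as extreme floods caused by increased rainfall and insufficient dam freeboard. Such unexpected extreme floods can threaten local residents, and river overflow may cause secondary and tertiary damage. Various hydraulic structures exist to prevent and reduce loss of life and property from natural disasters, and different rainfall quantities are used depending on the purpose of water resources management planning. In particular, estimates of annual maximum rainfall and design rainfall that account for climate change under global warming are now needed. According to the Clausius-Clapeyron relation, which gives vapor pressure as a function of temperature, atmospheric moisture increases by 6~7% for each $1^{\circ}C$ rise in air temperature, so the potential for extreme rainfall is projected to increase as mean temperature rises. This study estimated changes in extreme rainfall with temperature rise using a Bayesian multiple quantile regression model and projected future extreme rainfall based on CORDEX temperature data. The results show that rainfall with return periods of 100 years or more increases sharply with rising temperature, and that the maximum extreme rainfall, accounting for temperature rise through 2100, may exceed 1500 mm.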
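As a rough illustration of the Clausius-Clapeyron scaling quoted in this abstract, the sketch below compounds a fixed percentage increase in atmospheric moisture per degree of warming. Compounding per degree, the 7% default, and the function name are assumptions made for illustration. The factor describes atmospheric moisture only and is not the paper's rainfall projection, which comes from its quantile regression model.

```python
def cc_moisture_factor(delta_t_c, rate_per_degree=0.07):
    """Multiplicative change in atmospheric moisture for a warming of delta_t_c (degrees C).

    Assumes the per-degree rate (0.06 to 0.07 in the abstract) compounds
    with each additional degree of warming.
    """
    return (1.0 + rate_per_degree) ** delta_t_c


# Illustrative range for a hypothetical 3 degrees C of warming
low = cc_moisture_factor(3.0, rate_per_degree=0.06)
high = cc_moisture_factor(3.0, rate_per_degree=0.07)
print(f"moisture factor between {low:.3f} and {high:.3f}")
```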


Multiple imputation and synthetic data (다중대체와 재현자료 작성)

  • Kim, Joungyoun;Park, Min-Jeong
    • The Korean Journal of Applied Statistics / v.32 no.1 / pp.83-97 / 2019
  • As society develops, the dissemination of microdata has increased to respond to the diverse analytical needs of users. Analysis of microdata for policy making, academic purposes, and similar uses is highly desirable in terms of value creation. However, providing microdata whose usefulness is guaranteed carries a risk of exposing personal information. Several methods have been considered to protect personal information while preserving the usefulness of the data. One such method is to generate and use synthetic data. This paper aims to understand synthetic data by exploring the methodologies and precautions related to it. To this end, we first explain multiple imputation, the Bayesian predictive model, and the Bayesian bootstrap, which are the basic foundations for synthetic data. We then link these concepts to the construction of fully and partially synthetic data. To illustrate the creation of synthetic data, we review a real longitudinal synthetic data example based on sequential regression multivariate imputation.
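Of the three building blocks listed in this abstract, the Bayesian bootstrap is the easiest to show in a few lines. The sketch below follows the standard formulation (Dirichlet(1, ..., 1) weights on the observed values, drawn here as normalized Exp(1) variates) and applies it to the posterior of a mean. It is not the paper's synthetic-data procedure, and the function names are hypothetical.

```python
import random


def bayesian_bootstrap_weights(n, rng):
    """One draw of Dirichlet(1, ..., 1) weights via normalized Exp(1) variates."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def bayesian_bootstrap_means(data, n_draws=1000, seed=0):
    """Posterior draws of the mean: each draw reweights the observed values."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_draws):
        weights = bayesian_bootstrap_weights(len(data), rng)
        means.append(sum(w * x for w, x in zip(weights, data)))
    return means
```

Unlike the ordinary bootstrap, which resamples observations with integer counts, each draw here assigns smooth random weights to every observation.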

The probabilistic estimation of inundation region using a multiple logistic regression analysis (다중 Logistic 회귀분석을 통한 침수지역의 확률적 도출)

  • Jung, Minkyu;Kim, Jin-Guk;Uranchimeg, Sumiya;Kwon, Hyun-Han
    • Journal of Korea Water Resources Association / v.53 no.2 / pp.121-129 / 2020
  • The increase of impervious surface and development along the river due to urbanization not only causes an increase in the number of associated flood risk factors but also exacerbates flood damage, leading to difficulties in flood management. Flood control measures should be prioritized based on various geographical information in urban areas. In this study, a probabilistic flood hazard assessment was applied to flood-prone areas near an urban river. Flood hazard maps were alternatively considered and used to describe the expected inundation areas for a given set of predictors such as elevation, slope, runoff curve number, and distance to river. This study proposes a Bayesian logistic regression-based flood risk model that aims to provide a probabilistic risk metric such as population-at-risk (PAR). Finally, the logistic regression model demonstrates the probabilistic flood hazard maps for the entire area.

Robust multiple imputation method for missings with boundary and outliers (한계와 이상치가 있는 결측치의 로버스트 다중대체 방법)

  • Park, Yousung;Oh, Do Young;Kwon, Tae Yeon
    • The Korean Journal of Applied Statistics / v.32 no.6 / pp.889-898 / 2019
  • Missing value imputation for survey variables with item nonresponse becomes complicated if outliers and logical boundary conditions between survey items cannot be ignored. If a variable with missing values also has outliers and boundaries, values imputed by existing regression-based imputation methods are likely to be biased and to violate the boundary conditions. In this paper, we approach these difficulties by combining various robust regression models and multiple imputation methods. Through a simulation study on various scenarios of outliers and boundaries, we find and discuss the optimal combination of robust regression and multiple imputation method.

Bayesian quantile regression analysis of private education expenses for high school students in Korea (일반계 고등학생 사교육비 지출에 대한 베이지안 분위회귀모형 분석)

  • Oh, Hyun Sook
    • Journal of the Korean Data and Information Science Society / v.28 no.6 / pp.1457-1469 / 2017
  • Private education expenses are one of the key issues in Korea, and there have been many discussions about them. Academically, most previous research on private education expenses has used a multiple linear regression model based on the ordinary least squares (OLS) method. However, if the data do not satisfy the basic assumptions of the OLS method, such as normality and homoscedasticity, the parameter estimates are unreliable. In this case, a quantile regression model is preferred to an OLS model since it does not depend on the assumptions of normality and homoscedasticity. In the present study, data from a survey on private education expenses conducted by Statistics Korea in 2015 were analyzed to investigate the factors affecting private education expenses. Since the data do not satisfy the OLS assumptions, a quantile regression model was employed in a Bayesian approach using the Gibbs sampling method. The results show that the gender of the student, the parents' age, and the time and cost of participating in after-school programs are not significant. Household income is positively significant in proportion to the same size for all levels (quantiles) of private education expenses. Spending on private education in Seoul is higher than in other regions, and the regional difference grows as private education expenditure increases. Total time for private education and student achievement have a stronger positive effect on the lower quantiles than on the higher quantiles. The father's education level is positively significant for medium-high quantiles only, whereas the mother's education level is positively significant for all but the low quantiles. Participation in after-school programs is positively significant for the lower quantiles, while EBS textbook cost is positively significant for the higher quantiles.
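Quantile regression, contrasted with OLS in this abstract, replaces squared error with the check (pinball) loss. The sketch below fits only a constant, to show that minimizing this loss recovers a sample quantile without any normality assumption. It is not the paper's Gibbs sampler (Bayesian quantile regression is commonly built on an asymmetric Laplace likelihood, whose exponent is this same loss), and the function names and data are hypothetical.

```python
def check_loss(u, tau):
    """Check (pinball) loss for residual u at quantile level tau in (0, 1)."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_check_loss(q, data, tau):
    """Sum of check losses when the constant q is used to predict every value."""
    return sum(check_loss(y - q, tau) for y in data)


def best_constant(data, tau):
    """Constant minimizing the total check loss; a minimizer always lies on a data point."""
    return min(sorted(data), key=lambda q: total_check_loss(q, data, tau))


# Hypothetical skewed sample: the outlier 100.0 does not move the tau = 0.5 fit
expenses = [1.0, 2.0, 3.0, 4.0, 100.0]
print(best_constant(expenses, tau=0.5))
```

With `tau=0.5` the loss is half the absolute error, so the fit is the sample median, whereas the OLS fit of a constant would be the mean, which the outlier pulls upward.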