• Title/Abstract/Keywords: Bayesian multiple regression

41 search results (processing time: 0.024 s)

Multiple imputation and synthetic data

  • 김정연;박민정
    • 응용통계연구 / Vol. 32, No. 1 / pp.83-97 / 2019
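  • As society develops, the provision of microdata organized at the individual level has increased to meet users' diverse analysis needs. Demand is also growing for research access to complete-enumeration data, such as censuses and administrative records, in microdata form. Analyzing microdata for policy making and academic purposes is highly desirable in terms of value creation. However, releasing microdata that retains its utility inevitably carries the risk of disclosing personal information. Accordingly, various methods have been considered that preserve data utility while guaranteeing the protection of personal information. One such method that has been studied is to generate and use synthetic data. This paper introduces the methodology and caveats related to generating synthetic data in order to promote an understanding of it. To this end, we first explain concepts essential to producing synthetic data, such as multiple imputation, Bayesian predictive models, and the Bayesian bootstrap, and then examine fully synthetic and partially synthetic data. In particular, to understand synthetic data production in depth, we examine a concrete case in which longitudinal data are synthesized using sequential regression multivariate imputation.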
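
The Bayesian bootstrap named in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `bayesian_bootstrap_means` and the toy values are hypothetical, and the sketch only shows the general idea of reweighting observed records with Dirichlet(1, ..., 1) weights instead of resampling them.

```python
import random


def bayesian_bootstrap_means(values, n_draws=1000, seed=0):
    """Draw posterior samples of the mean via the Bayesian bootstrap."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Dirichlet(1, ..., 1) weights: normalise independent Exp(1) draws.
        gammas = [rng.expovariate(1.0) for _ in values]
        total = sum(gammas)
        weights = [g / total for g in gammas]
        # Each draw is a weighted mean of the observed values.
        draws.append(sum(w * v for w, v in zip(weights, values)))
    return draws


# Hypothetical toy data, not taken from the paper.
incomes = [2.1, 3.4, 1.9, 5.0, 4.2, 2.8]
posterior_means = bayesian_bootstrap_means(incomes, n_draws=2000, seed=42)
```

Each element of `posterior_means` is one plausible value of the population mean. In a synthetic-data setting, the same weights could instead be used to draw records, which is one reason the concept appears alongside multiple imputation.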

Optimal fractions in terms of a prediction-oriented measure

  • Lee, Won-Woo
    • Journal of the Korean Statistical Society / Vol. 22, No. 2 / pp.209-217 / 1993
  • The multicollinearity problem in a multiple linear regression model may have deleterious effects on predictions. Thus, it is desirable to consider the optimal fractions with respect to the unbiased estimate of the mean squared errors of the predicted values. Interestingly, the optimal fractions can also be illuminated by the Bayesian interpretation of the general James-Stein estimators.


Estimation for misclassified data with ultra-high levels

  • Kang, Moonsu
    • Journal of the Korean Data and Information Science Society / Vol. 27, No. 1 / pp.217-223 / 2016
  • Outcome misclassification is widespread in classification problems, but methods to account for it are rarely used. This paper addresses the problem of inference with misclassified multinomial logit data that have a large number of multinomial parameters. Interest in developing novel methods for inference on misclassified data has grown significantly. A simulation study shows how serious the misclassification issue becomes as the number of categories increases. Then, using group lasso regression, we show how the best model should be fitted for this kind of multinomial regression problem.

Bayesian smoothing under structural measurement error model with multiple covariates

  • Hwang, Jinseub;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society / Vol. 28, No. 3 / pp.709-720 / 2017
  • In healthcare and medical research, many important variables, such as body mass index and laboratory data, are subject to measurement error. It is also difficult to collect large samples because of the high cost and long time required to recruit target patients who satisfy the inclusion and exclusion criteria. Besides, the demand for solving complex scientific problems has increased considerably, so a semiparametric regression approach could be of substantial value. To address the issues of measurement error, small domains, and scientific complexity, we conduct multivariable Bayesian smoothing under a structural measurement error covariate in this article. Specifically, we enhance our previous model by incorporating other useful auxiliary covariates free of measurement error. For the regression spline, we use radial basis functions with fixed knots for the measurement error covariate. We develop a fully Bayesian approach to fit the model and estimate parameters using Markov chain Monte Carlo. Simulation results show that the method performs well. We illustrate the results with an application to national survey data.

Analyzing effect and importance of input predictors for urban streamflow prediction based on a Bayesian tree-based model

  • Nguyen, Duc Hai;Bae, Deg-Hyo
    • 한국수자원학회:학술대회논문집 / 한국수자원학회 2022 Conference / pp.134-134 / 2022
  • Streamflow forecasting plays a crucial role in water resource control, especially in highly urbanized areas that are very vulnerable to flooding during heavy rainfall events. In addition to providing accurate predictions, evaluating the effects and importance of the input predictors can help water managers. Recently, machine learning techniques have been applied to model complex and nonlinear hydrological processes. However, these techniques have not properly considered the importance and uncertainty of the predictor variables. To address these concerns, we applied GA-BART, which integrates a genetic algorithm (GA) with the Bayesian additive regression tree (BART) model, for hourly streamflow forecasting and analysis of input predictors. The Jungrang urban basin was selected as a case study, and a database was established from 39 heavy rainfall events between 2003 and 2020 recorded at rain gauges and monitoring stations. For the goal of this study, we used a combination of inputs that included the areal rainfall of the subbasins at the current and previous time steps and the water level and streamflow of the stations at time step for multistep-ahead streamflow predictions. Multiple datasets with different input predictors were analyzed to define the optimal set for streamflow forecasting. In addition, the GA-BART model could reasonably determine the relative importance of the input variables. The assessment might help water resource managers improve the accuracy of forecasts and early flood warnings in the basin.


A Development of Nonstationary Frequency Analysis Model using a Bayesian Multiple Non-crossing Quantile Regression Approach

  • 오랑치맥 솜야;김용탁;권영준;권현한
    • 한국연안방재학회지 / Vol. 4, No. 3 / pp.119-131 / 2017
  • Global warming under the influence of climate change and its direct impact on glaciers and sea level are known issues. However, there is a lack of research on indirect impacts of climate change, such as those on coastal structure design, which is mainly based on a frequency analysis of water level under the stationary assumption, meaning that the maximum sea level will not vary significantly over time. In general, the stationary assumption may not be valid under a changing climate. Therefore, this study aims to develop a novel approach to explore possible distributional changes in annual maximum sea levels (AMSLs) and to provide estimates of the design water level for coastal structures, using a nonstationary frequency analysis based on multiple non-crossing quantile regression within a Bayesian framework. In this study, 20 tide gauge stations with more than 30 years of hourly records are considered. First, the possible distributional changes in the AMSLs are explored, focusing on changes in the scale and location parameters of the probability distributions. Most of the AMSLs are found to show an upward-convergent or upward-divergent pattern in the distribution, and a significance test on distributional changes is then performed. We confirm that a stationary assumption under the current climate characteristics may lead to underestimation of the design sea level, which increases the failure risk of coastal structures. A detailed discussion of the role of the distributional changes for the design water level is provided.

The probabilistic estimation of inundation region using a multiple logistic regression analysis

  • 정민규;김진국;오랑치맥 솜야;권현한
    • 한국수자원학회논문집 / Vol. 53, No. 2 / pp.121-129 / 2020
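  • The increase in impervious surfaces due to urbanization and development along rivers not only increases the hazard factors exposed to risk during floods but also propagates damage, which creates difficulties for flood management. Flood mitigation measures should begin by identifying areas expected to be inundated, reflecting the diverse land-surface spatial characteristics of urban areas. In this study, a probabilistic flood risk assessment was carried out for flood-prone areas of an urban river. Elevation, slope, runoff curve number, and distance to the river, which are topographic factors related to flooding, were used as predictors, and a 100-year flood hazard map was used as training data for the model to explain the areas expected to be inundated around the river. The study area was converted into a grid and Bayesian logistic regression was performed, building a model in which the flood-related factors explain whether each grid cell is inundated. Finally, the model was used to present the inundation risk probabilistically for the entire study area.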
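
As a rough illustration of how a Bayesian logistic regression can turn grid-cell features into an inundation probability, the sketch below averages the logistic probability over posterior draws of the coefficients. Everything concrete here is an assumption for illustration: the function names, the standardized feature values, and the three coefficient draws are hypothetical and are not results from the paper.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def inundation_probability(features, coefficient_draws):
    """Average the logistic probability over posterior coefficient draws."""
    probs = []
    for intercept, *slopes in coefficient_draws:
        z = intercept + sum(b * x for b, x in zip(slopes, features))
        probs.append(sigmoid(z))
    return sum(probs) / len(probs)


# Hypothetical posterior draws: (intercept, elevation, slope,
# curve number, distance to river), for standardized predictors.
coefficient_draws = [
    (-1.0, -0.8, -0.3, 0.5, -1.2),
    (-0.9, -0.7, -0.2, 0.6, -1.0),
    (-1.1, -0.9, -0.4, 0.4, -1.3),
]

# Hypothetical grid cells: a low-lying cell near the river and
# an elevated cell far from it.
low_cell = [-1.0, 0.0, 0.5, -1.0]
high_cell = [1.0, 0.0, -0.5, 1.0]

p_low = inundation_probability(low_cell, coefficient_draws)
p_high = inundation_probability(high_cell, coefficient_draws)
```

With these made-up draws, the low-lying cell near the river receives the higher probability. In practice the draws would come from an MCMC fit, and the averaging step is what makes the resulting map probabilistic.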

Simultaneous outlier detection and variable selection via difference-based regression model and stochastic search variable selection

  • Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
    • Communications for Statistical Applications and Methods / Vol. 26, No. 2 / pp.149-161 / 2019
  • In this article, we suggest the following approach to simultaneous variable selection and outlier detection. First, we determine possible candidates for outliers using properties of an intercept estimator in a difference-based regression model, and the information on outliers is reflected in the multiple regression model by adding mean shift parameters. Second, we select the best model from the models that include the outlier candidates as predictors, using stochastic search variable selection. Finally, we evaluate our method using simulations and real data analysis, which yield promising results. In addition, we need to develop our method further to make robust estimates. We will also extend it to the nonparametric regression model for simultaneous outlier detection and variable selection.

A Study on Regionalization of Parameters for Sacramento Continuous Rainfall-Runoff Model Using Watershed Characteristics

  • 김태정;정가인;김기영;권현한
    • 한국수자원학회논문집 / Vol. 48, No. 10 / pp.793-806 / 2015
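  • Simulating runoff in ungauged basins is essential in hydrology. The key to simulating reliable runoff with a rainfall-runoff model is estimating the model's parameters. However, in Korea, insufficient hydrological data currently make parameter estimation difficult. The goal of this study is to regionalize the parameters of a rainfall-runoff model based on Bayesian statistical techniques in order to reflect uncertainty. The method is as follows. First, using a Bayesian Sacramento rainfall-runoff model, which couples the globally widely used Sacramento rainfall-runoff model with a Bayesian Markov Chain Monte Carlo technique, this study optimized 13 parameters for gauged basins and derived the posterior distribution of each parameter. Second, to obtain the regression relationship between the parameters and watershed characteristics, multiple linear regression analysis was applied to determine regionalized parameters that account for watershed characteristics. The regionalized parameters estimated by multiple regression were transferred to gauged basins to simulate runoff, and they were then verified using statistical efficiency criteria: the N-S coefficient, the index of agreement, and the correlation coefficient.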
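
The N-S coefficient used as a verification criterion in this abstract is, on the usual reading, the Nash-Sutcliffe efficiency: one minus the ratio of the squared simulation error to the variance of the observations around their mean. A minimal sketch follows; the function name and the runoff values are hypothetical and are not data from the paper.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    skill of always predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_sum


# Hypothetical daily runoff values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
perfect = nash_sutcliffe(observed, observed)
mean_only = nash_sutcliffe(observed, [2.5, 2.5, 2.5, 2.5])
```

Values below zero indicate a simulation that performs worse than the observed mean, which is why the criterion is often reported together with a correlation measure.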

Quality Variable Prediction for Dynamic Process Based on Adaptive Principal Component Regression with Selective Integration of Multiple Local Models

  • Tian, Ying;Zhu, Yuting
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 4 / pp.1193-1215 / 2021
  • Measuring the key product quality index plays an important role in improving production efficiency and ensuring the safety of the enterprise. Actual working conditions and parameters inevitably change over time, for example through drift of the working point, equipment wear, and temperature change, and these changes degrade the quality variable prediction model. To deal with this problem, selective integrated moving windows based principal component regression (SIMV-PCR) is proposed in this study. The traditional moving window algorithm uses only the latest local process information, so the global process information is insufficient. To make full use of the process information contained in past windows, a set of differing local models is selected through hypothesis testing theory. The significance levels of both the t-test and the χ²-test are used to judge whether two local models are identical. The models are then integrated by Bayesian quality estimation to improve the accuracy of quality variable prediction. The effectiveness of the proposed adaptive soft measurement method is verified by a numerical example and a practical industrial process.