• Title/Summary/Keyword: 회귀분포 (regression distribution)


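Specifying Prior Distributions for Explanatory Variable Selection Using Gibbs Sampling: The Case of the Linear Regression Model (깁스표본기법을 이용한 설명변수 선택문제에서 사전분포의 설정-선형회귀모형을 중심으로-)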

  • 박종선;남궁평;한숙영
    • Communications for Statistical Applications and Methods / v.4 no.2 / pp.333-343 / 1997
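  • In linear regression analysis, variable selection plays a very important part in finding the optimal model. George and McCulloch (1993) considered the problem of selecting variables in a linear regression model using a hierarchical Bayes model and Gibbs sampling. Building on the model of George and McCulloch, this paper considers how to determine, by an objective criterion, the prior probability that each explanatory variable is included in the model.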


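Nonparametric Test for Umbrella Alternatives Using Isotonic Regression in Randomized Block Designs (확률화 블럭 계획법에서 동위회귀를 이용한 우산형 대립가설의 비모수검정법)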

  • 김동희;김영철
    • Communications for Statistical Applications and Methods / v.4 no.1 / pp.167-175 / 1997
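  • We propose a nonparametric test for umbrella alternatives in randomized block designs using isotonic regression. The proposed test statistic takes the form of the Mack and Wolfe (1981) statistic with weights applied to the treatments. The weights, which are random variables, are obtained by isotonic regression, and the power of the proposed statistic is examined for several cases and distributions through small-sample simulation using the bootstrap.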
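
The abstract above relies on isotonic regression to obtain weights. As a rough illustration of that building block only, the sketch below fits a nondecreasing sequence by the pool-adjacent-violators algorithm. The function name `pava` and the sample treatment means are hypothetical, and the abstract does not say how the authors handle the umbrella (increasing then decreasing) shape or turn the fit into weights, so none of that is attempted here.

```python
def pava(values, weights=None):
    """Pool-adjacent-violators: weighted least-squares nondecreasing fit."""
    if weights is None:
        weights = [1.0] * len(values)
    # Each block holds [weighted mean, total weight, number of pooled points].
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Pool backwards while adjacent blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            w_new = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w_new, w_new, n1 + n2])
    # Expand each block back to one fitted value per original point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical treatment means, used only to exercise the function.
treatment_means = [1.0, 3.0, 2.0, 4.0, 3.5]
fit = pava(treatment_means)
print(fit)
```

An umbrella-shaped fit could in principle be assembled from an increasing fit up to the peak and a decreasing fit after it, but that step is left out because the source does not describe it.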


Relative Error Prediction via Penalized Regression (벌점회귀를 통한 상대오차 예측방법)

  • Jeong, Seok-Oh;Lee, Seo-Eun;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.28 no.6 / pp.1103-1111 / 2015
  • This paper presents a new prediction method based on relative error combined with penalized regression. The proposed method consists of fully data-driven procedures that are fast, simple, and easy to implement. A real data example and some simulation results are given to show that the proposed approach works in practice.

Bayesian ordinal probit semiparametric regression models: KNHANES 2016 data analysis of the relationship between smoking behavior and coffee intake (베이지안 순서형 프로빗 준모수 회귀 모형 : 국민건강영양조사 2016 자료를 통한 흡연양태와 커피섭취 간의 관계 분석)

  • Lee, Dasom;Lee, Eunji;Jo, Seogil;Choi, Taeryeon
    • The Korean Journal of Applied Statistics / v.33 no.1 / pp.25-46 / 2020
  • This paper presents ordinal probit semiparametric regression models using the Bayesian Spectral Analysis Regression (BSAR) method. Ordinal probit regression models ordinal responses, usually with more than two categories, by linking the probability of falling into each category to a combination of available covariates through a probit link (the inverse of the normal cumulative distribution function). The Bayesian probit model facilitates posterior sampling by introducing a normally distributed latent variable; the responses are then categorized by cut-off points according to the values of the latent variable. In this paper, we extend the latent variable approach to a semiparametric model for Bayesian ordinal probit regression with nonparametric functions, using the BSAR method, which is based on a spectral representation of Gaussian processes. The latent variable is decomposed into a parametric component and a nonparametric component, with or without a shape constraint, for modeling ordinal responses and predicting outcomes more flexibly. We illustrate the proposed methods with simulation studies comparing them with existing methods, and with a real data analysis of the Korean National Health and Nutrition Examination Survey (KNHANES) 2016 that investigates the nonparametric relationship between smoking behavior and coffee intake.
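
The latent-variable mechanism described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the BSAR model: the predictor value `eta` simply stands in for the sum of the parametric and nonparametric components, the latent variance is fixed at 1, and the cut-off points are made-up numbers. The function names are hypothetical.

```python
import random
from bisect import bisect_right
from statistics import NormalDist


def category_probs(eta, cutpoints):
    """P(y = k) when the latent z ~ N(eta, 1) and cutpoints are increasing."""
    std_normal = NormalDist()
    cdf = [0.0] + [std_normal.cdf(c - eta) for c in cutpoints] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(cdf) - 1)]


def sample_category(eta, cutpoints, rng):
    """Draw the latent variable, then report which interval it falls into."""
    z = rng.gauss(eta, 1.0)
    return bisect_right(cutpoints, z)


# Assumed values for illustration: three cut-off points give four categories.
cutpoints = [-0.5, 0.5, 1.5]
eta = 0.3
probs = category_probs(eta, cutpoints)

rng = random.Random(0)
draws = [sample_category(eta, cutpoints, rng) for _ in range(20000)]
freq = [draws.count(k) / len(draws) for k in range(len(cutpoints) + 1)]
print(probs)
print(freq)
```

The simulated category frequencies should sit close to the probabilities computed from the normal CDF, which is the sense in which the probit link and the latent-variable description are the same model.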

Exploring interaction using 3-D residual plots in logistic regression model (3차원 잔차산점도를 이용한 로지스틱회귀모형에서 교호작용의 탐색)

  • Kahng, Myung-Wook
    • Journal of the Korean Data and Information Science Society / v.25 no.1 / pp.177-185 / 2014
  • Under bivariate normal distribution assumptions, interaction and quadratic terms are needed in the logistic regression model with two predictors. However, depending on the correlation coefficient and the variances of the two conditional distributions, the interaction and quadratic terms may not be necessary. Although the need for these terms can be assessed by comparing the two scatter plots, this comparison is less useful for interaction terms. We explore the structure and usefulness of the 3-D residual plot as a tool for dealing with interaction in logistic regression models. If the predictors have an interaction effect, a 3-D residual plot can show it. This is illustrated with simulated and real data.

A study on log-density with log-odds graph for variable selection in logistic regression (로지스틱회귀모형의 변수선택에서 로그-오즈 그래프를 통한 로그-밀도비 연구)

  • Kahng, Myung-Wook;Shin, Eun-Young
    • Journal of the Korean Data and Information Science Society / v.23 no.1 / pp.99-111 / 2012
  • The log-density ratio of the conditional densities of the predictors given the response variable provides useful information for variable selection in the logistic regression model. In this paper, we consider the predictors that are needed and how they should be included in the model. If the conditional distributions are skewed, the distributions can be considered as gamma distributions. Under this assumption, linear and log terms are generally included in the model. The log-odds graph is a very useful graphical tool in this study. A graphical study is presented which shows that if the conditional distributions of x|y for the two groups overlap significantly, we need both the linear and quadratic terms. On the contrary, if they are well separated, only the linear or log term is needed in the model.

Identification of Uncertainty in Fitting Rating Curve with Bayesian Regression (베이지안 회귀분석을 이용한 수위-유량 관계곡선의 불확실성 분석)

  • Kim, Sang-Ug;Lee, Kil-Seong
    • Journal of Korea Water Resources Association / v.41 no.9 / pp.943-958 / 2008
  • This study employs Bayesian regression analysis for fitting discharge rating curves. The parameter estimates from the Bayesian regression analysis were compared with those from the ordinary least squares method using the t-distribution. In these comparisons, the mean values from the t-distribution and the Bayesian regression are not significantly different. However, the difference between the upper and lower limits is markedly reduced with the Bayesian regression. Therefore, from the point of view of uncertainty analysis, the Bayesian regression is more attractive than the conventional method based on a t-distribution, because the data size at the site of interest is typically insufficient to estimate the parameters of the rating curve. The merits and demerits of the two estimation methods are analyzed through statistical simulation that accounts for heteroscedasticity. The Bayesian regression is also validated using real stage-discharge data observed at 5 gauges in the Anyangcheon basin. Because the true parameters at the 5 gauges are unknown, the quantitative accuracy of the Bayesian regression cannot be assessed. However, the results suggest that the uncertainty in the rating curves at the 5 gauges can be reduced by Bayesian regression.

Theoretical Derivation of IDF curve Using Probability Distribution Function of Rainfall Data (강우자료의 확률분포함수를 이용한 강우강도식의 이론적 유도)

  • Kim, Kew-Tae;Kim, Soo-Young;Kim, Tae-Soon;Heo, Jun-Haeng
    • Proceedings of the Korea Water Resources Association Conference / 2007.05a / pp.1503-1506 / 2007
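  • The rainfall intensity formula (IDF curve), which is used mainly for the design of hydraulic structures, is usually obtained by computing probable rainfall depths by frequency or duration from annual maximum rainfall data and then fitting these values by regression in a linear or nonlinear form. However, a rainfall intensity formula estimated by regression in this way can hardly be said to reproduce the probabilistic characteristics of the original rainfall data. This study therefore estimated the parameters of the rainfall intensity formula for the target region by applying a method that derives the formula directly from an appropriate probability distribution for the annual maximum rainfall data. To estimate the parameters from the selected distribution, an objective function that minimizes the root mean square error was constructed, and suitable parameters were obtained with a genetic algorithm. The applicability of the resulting formula was evaluated by comparing its results with those of the existing rainfall intensity formula and with those of at-site frequency analysis.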


Measurement Error Model with Skewed Normal Distribution (왜도정규분포 기반의 측정오차모형)

  • Heo, Tae-Young;Choi, Jungsoon;Park, Man Sik
    • The Korean Journal of Applied Statistics / v.26 no.6 / pp.953-958 / 2013
  • This study suggests a measurement error model based on the skewed normal distribution instead of the normal distribution to identify the properties of the slope parameter in a simple linear regression model. We prove that the slope parameter in a simple linear regression model is underestimated.
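
The underestimation of the slope mentioned above can be seen in a small simulation. Note the substitution: the sketch below uses classical normal measurement error, not the skewed normal model of the paper, because that is the textbook case where the attenuation factor has the known form var(x) / (var(x) + var(u)). All parameter values and names are assumptions chosen for illustration.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x for simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)
n = 50000
beta = 2.0                    # assumed true slope
sigma_x, sigma_u = 1.0, 1.0   # assumed sd of true predictor and of its error

x_true = [rng.gauss(0.0, sigma_x) for _ in range(n)]
y = [1.0 + beta * x + rng.gauss(0.0, 0.5) for x in x_true]
# Observed predictor: the true value plus normal measurement error.
w = [x + rng.gauss(0.0, sigma_u) for x in x_true]

slope_true = ols_slope(x_true, y)   # regression on the error-free predictor
slope_naive = ols_slope(w, y)       # regression on the mismeasured predictor
reliability = sigma_x ** 2 / (sigma_x ** 2 + sigma_u ** 2)
print(slope_true, slope_naive, beta * reliability)
```

With equal variances for the predictor and its error, the naive slope lands near half of the true slope, which is the attenuation the reliability ratio predicts in the normal case.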