• Title/Summary/Keyword: 확률적 회귀모형 (stochastic regression model)

Bayesian ordinal probit semiparametric regression models: KNHANES 2016 data analysis of the relationship between smoking behavior and coffee intake (베이지안 순서형 프로빗 준모수 회귀 모형 : 국민건강영양조사 2016 자료를 통한 흡연양태와 커피섭취 간의 관계 분석)

  • Lee, Dasom;Lee, Eunji;Jo, Seogil;Choi, Taeryeon
    • The Korean Journal of Applied Statistics / v.33 no.1 / pp.25-46 / 2020
  • This paper presents ordinal probit semiparametric regression models based on the Bayesian Spectral Analysis Regression (BSAR) method. Ordinal probit regression models ordinal responses, usually with more than two categories, by linking the probability of falling into each category to a combination of available covariates through a probit link (the inverse of the standard normal cumulative distribution function). The Bayesian probit model facilitates posterior sampling by introducing a normally distributed latent variable; the responses are then categorized by cut-off points according to the values of the latent variable. In this paper, we extend the latent variable approach to a semiparametric model for Bayesian ordinal probit regression with nonparametric functions, using the BSAR method, which is based on a spectral representation of Gaussian processes. The latent variable is decomposed into a parametric component and a nonparametric component, with or without a shape constraint, to model ordinal responses and predict outcomes more flexibly. We illustrate the proposed methods with simulation studies that compare them with existing methods and with a real data analysis of the 2016 Korean National Health and Nutrition Examination Survey (KNHANES) that investigates the nonparametric relationship between smoking behavior and coffee intake.
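
The latent variable construction described in this abstract can be illustrated with a minimal sketch. It is not the authors' BSAR implementation: the function names, the linear predictor `eta`, and the example cut-off points are assumptions made for illustration, and the latent error is taken to be standard normal.

```python
import math

import numpy as np


def normal_cdf(x):
    """Standard normal CDF, written with math.erf to avoid extra dependencies."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(eta, cutpoints):
    """P(y = k) = Phi(c_k - eta) - Phi(c_{k-1} - eta) for sorted cut-off points.

    The outermost bounds are -inf and +inf, so K - 1 cut-off points
    give K ordered categories labelled 0..K-1.
    """
    bounds = [-math.inf, *cutpoints, math.inf]
    cdf = [normal_cdf(b - eta) for b in bounds]
    return [cdf[k + 1] - cdf[k] for k in range(len(bounds) - 1)]


def simulate_responses(eta, cutpoints, rng):
    """Draw latent z = eta + N(0, 1) noise and bin it by the cut-off points."""
    eta = np.asarray(eta, dtype=float)
    z = eta + rng.standard_normal(eta.shape)
    # side="right" assigns category k when c_{k-1} <= z < c_k.
    return np.searchsorted(np.asarray(cutpoints), z, side="right")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cutpoints = [-1.0, 1.0]  # hypothetical cut-off points, three categories
    print(category_probabilities(0.0, cutpoints))
    print(simulate_responses(np.zeros(10), cutpoints, rng))
```

In a semiparametric model of the kind the abstract describes, `eta` would be the sum of a parametric term and a nonparametric function of a covariate; here it is simply passed in as a number or array.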

Estimating Probability of Mode Choice at Regional Level by Considering Spatial Association of Departure Place (출발지 공간 연관성을 고려한 지역별 수단선택확률 추정 연구)

  • Eom, Jin-Ki;Park, Man-Sik;Heo, Tae-Young
    • Journal of the Korean Society for Railway / v.12 no.5 / pp.656-662 / 2009
  • In general, travelers' mode choice behavior is analyzed by developing utility functions that reflect individuals' mode preferences according to their demographic and travel characteristics. In this paper, we propose a methodology that takes the spatial effects of individuals' departure locations into account in the mode choice model. The statistical models considered here are a spatial logistic regression model and a conditional autoregressive model that includes a spatial association parameter. We employed the Bayesian approach in order to obtain more reliable parameter estimates. The proposed methodology allows us to estimate mode shares by departure place even though the survey does not cover all areas.

Patent Keyword Analysis using Gamma Regression Model and Visualization

  • Jun, Sunghae
    • Journal of the Korea Society of Computer and Information / v.27 no.8 / pp.143-149 / 2022
  • Since patent documents contain detailed results of research and development technologies, many studies on patent analysis methods for effective technology analysis have been conducted. In particular, research on quantitative patent analysis using statistics and machine learning algorithms has been actively conducted recently. The most commonly used patent data in quantitative patent analysis are technology keywords. Most existing methods for analyzing keyword data are models based on the Gaussian probability distribution, whose random variable ranges over the real line from negative infinity to positive infinity. In this paper, we propose a model using the gamma probability distribution to analyze the frequency data of patent keywords, which can theoretically take values from zero to positive infinity. In addition, to determine the regression equation of the gamma-based regression model, a two-mode network is constructed to visualize the technological association between keywords. Practical patent data are collected and analyzed to compare the performance of the proposed method with that of existing Gaussian-based analysis models.

Density estimation of summer extreme temperature over South Korea using mixtures of conditional autoregressive species sampling model (혼합 조건부 종추출모형을 이용한 여름철 한국지역 극한기온의 위치별 밀도함수 추정)

  • Jo, Seongil;Lee, Jaeyong
    • Journal of the Korean Data and Information Science Society / v.27 no.5 / pp.1155-1168 / 2016
  • This paper considers the problem of estimating probability densities of climate values. In particular, we focus on estimating probability densities of summer extreme temperature over South Korea. It is known that the probability density of climate values at one location is similar to those at nearby locations and does not follow well-known parametric distributions. To accommodate these properties, we use a mixture of conditional autoregressive species sampling models, which is a nonparametric Bayesian model with spatial dependency. We apply the model to a dataset consisting of summer maximum and minimum temperatures over South Korea. The dataset was obtained from the University of East Anglia.

An estimation method based on autocovariance in the simple linear regression model (단순 선형회귀 모형에서 자기공분산에 근거한 최적 추정 방법)

  • Park, Cheol-Yong
    • Journal of the Korean Data and Information Science Society / v.20 no.2 / pp.251-260 / 2009
  • In this study, we propose a new estimation method based on autocovariance for selecting optimal estimators of the regression coefficients in the simple linear regression model. Although this method does not seem intuitively attractive, the resulting estimators are unbiased for the corresponding regression coefficients. When the explanatory variable takes equally spaced values between 0 and 1, under mild conditions that are satisfied when the errors follow an autoregressive moving average model, we show that these estimators have asymptotically the same distributions as the least squares estimators. Additionally, under the same conditions, we provide a self-contained proof that these estimators converge in probability to the corresponding regression coefficients.

Stochastic Volatility Model vs. GARCH Model : A Comparative Study (확률적 변동성 모형과 자기회귀이분산 모형의 비교분석)

  • 이용흔;김삼용;황선영
    • The Korean Journal of Applied Statistics / v.16 no.2 / pp.217-224 / 2003
  • Volatility in financial data is usually measured by the conditional variance. The two main approaches to gauging conditional variance are the stochastic volatility (SV) model and the autoregressive-type approach (GARCH). This article conducts a comparative study of SV and GARCH using Korea Composite Stock Price Index (KOSPI) data. The SV model is found to be slightly better than GARCH(1,1) in analyzing the KOSPI data.
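
The abstract does not state the exact model specifications, so the following is only a sketch of the textbook forms of the two approaches, offered to make the contrast concrete. The parameter names (`omega`, `alpha`, `beta`, `mu`, `phi`, `sigma_eta`) and function names are assumptions. In GARCH(1,1) the conditional variance is a deterministic function of past returns, whereas in the SV model the log-variance follows its own random AR(1) process.

```python
import numpy as np


def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance: s2[t] = omega + alpha * r[t-1]**2 + beta * s2[t-1].

    Assumes alpha + beta < 1 so that the unconditional variance used to
    initialise the recursion exists.
    """
    returns = np.asarray(returns, dtype=float)
    sigma2 = np.empty_like(returns)
    sigma2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2


def simulate_sv(n, mu, phi, sigma_eta, rng):
    """Simulate returns r[t] = exp(h[t] / 2) * eps[t] with AR(1) log-variance h.

    h[t] = mu + phi * (h[t-1] - mu) + sigma_eta * eta[t], with |phi| < 1.
    """
    h = np.empty(n)
    # Start from the stationary distribution of the AR(1) process.
    h[0] = mu + sigma_eta / np.sqrt(1.0 - phi**2) * rng.standard_normal()
    for t in range(1, n):
        h[t] = mu + phi * (h[t - 1] - mu) + sigma_eta * rng.standard_normal()
    returns = np.exp(h / 2.0) * rng.standard_normal(n)
    return returns, h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r, h = simulate_sv(500, mu=-1.0, phi=0.95, sigma_eta=0.2, rng=rng)
    s2 = garch11_variance(r, omega=0.05, alpha=0.1, beta=0.8)
    print(s2[:5])
```

Fitting either model to index returns requires an estimation step (for example maximum likelihood for GARCH, and simulation-based methods for SV) that this sketch does not cover.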

Comparison of nomogram construction methods using chronic obstructive pulmonary disease (만성 폐쇄성 폐질환을 이용한 노모그램 구축과 비교)

  • Seo, Ju-Hyun;Lee, Jea-Young
    • The Korean Journal of Applied Statistics / v.31 no.3 / pp.329-342 / 2018
  • A nomogram is a statistical tool that visualizes the risk factors of a disease and helps untrained people understand them. This study used risk factors of chronic obstructive pulmonary disease (COPD) to compare nomograms based on a logistic regression model and a naïve Bayesian classifier model. Data from the 6th Korean National Health and Nutrition Examination Survey (2013-2015) were analyzed. First, we selected six risk factors for COPD. We then constructed nomograms using the logistic regression model and the naïve Bayesian classifier model, and compared the two to find out which method is more appropriate. The receiver operating characteristic curve and the calibration plot were used to verify each nomogram.

Analysis of Probability Density Function of Deposition Spot in Open Channel Flow (하천에서 유사의 침전 위치에 대한 확률밀도함수 분석)

  • Oh, Jungsun;Choi, Sung-Uk
    • Proceedings of the Korea Water Resources Association Conference / 2016.05a / pp.50-50 / 2016
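  • Predicting the transport of sediment and pollutants in rivers involves two elements: a notion of quantity, expressed as particle concentration, and a notion of space, expressed as particle position. Particles denser than water, such as sediment, repeatedly settle and resuspend within the flow, and the location at which they finally deposit on the bed is very important for many aspects of river management, including bed change and habitat. Predicting where a sediment particle deposits involves many uncertain factors, such as turbulence and topography, so even particles of the same size do not arrive at a single exact point. Research is therefore needed on methods for estimating the deposition location that account for these uncertainties. This study represents and analyzes the deposition location as a probability density function. To do so, location data were obtained with a particle-based tracking model, validated against experimental data, and then expressed as a probability density function. The results confirm that the probability density function of the particle deposition location follows a log-normal distribution, and that the parameters of this probability density function can be generalized as physically based regression equations.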

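
As a rough illustration of the final step described in this abstract, representing deposition locations by a log-normal density, the sketch below fits the two log-normal parameters by maximum likelihood. The function names are assumptions, and the positions used in the demonstration are synthetic draws; they are not data from the study or output of its particle tracking model.

```python
import numpy as np


def fit_lognormal(samples):
    """Maximum likelihood estimates (mu, sigma) of a log-normal distribution.

    mu and sigma are the mean and standard deviation of log(samples);
    all samples must be strictly positive.
    """
    logs = np.log(np.asarray(samples, dtype=float))
    return logs.mean(), logs.std()  # ddof=0 gives the MLE of sigma


def lognormal_pdf(x, mu, sigma):
    """Log-normal probability density evaluated at positive x."""
    x = np.asarray(x, dtype=float)
    coeff = 1.0 / (x * sigma * np.sqrt(2.0 * np.pi))
    return coeff * np.exp(-((np.log(x) - mu) ** 2) / (2.0 * sigma**2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for simulated deposition distances (arbitrary units).
    positions = rng.lognormal(mean=1.0, sigma=0.5, size=5000)
    mu_hat, sigma_hat = fit_lognormal(positions)
    print(mu_hat, sigma_hat)
    print(lognormal_pdf([1.0, 2.0, 4.0], mu_hat, sigma_hat))
```

The abstract's further step, expressing the fitted parameters as regression functions of physical quantities, would take `mu_hat` and `sigma_hat` from many such fits as the response variables.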

Relationship between Interstate Highway Accidents and Heterogeneous Geometrics by Random Parameter Negative Binomial Model - A case of Interstate Highway in Washington State, USA (확률적 모수를 고려한 음이항모형에 의한 교통사고와 기하구조와의 관계 - 미국 워싱턴 주(州) 고속도로를 중심으로)

  • Park, Minho
    • KSCE Journal of Civil and Environmental Engineering Research / v.33 no.6 / pp.2437-2445 / 2013
  • The objective of this study is to find the relationship between interstate highway accident frequencies and geometrics using a random parameter negative binomial model. Although in reality the same design criteria cannot be applied to all segments or corridors of a road, previous research estimated fixed coefficient values without considering each segment's characteristics. The drawback of the traditional negative binomial model is that it cannot explain variation over time or the distinct characteristics of specific segments. This leads to underestimation of the standard errors, which inflates the t-values and ultimately affects the model estimation. Therefore, this study examines the relationship between accident frequencies and heterogeneous geometrics using nine years of data from seven interstate highways in Washington State. Sixteen types of geometrics are used to derive the model, which is compared with the traditional negative binomial model to determine which is more suitable. In addition, the effects of the heterogeneous variables on accidents are estimated by calculating marginal effects and elasticities. Hopefully, this study will help to establish future policy on geometrics.

Estimation of performance for random binary search trees (확률적 이진 검색 트리 성능 추정)

  • 김숙영
    • Journal of the Korea Computer Industry Society / v.2 no.2 / pp.203-210 / 2001
  • To estimate relational models and test the theoretical hypotheses of binary tree search algorithms, we built binary search trees from random permutations of n distinct numbers, where n (the number of nodes) ranged from three to seven. Probabilities of building binary search trees with each possible height and balance factor were estimated. Regression models with the number of nodes, height, and average number of comparisons as variables were estimated, and the O(lg(n)) theorem was accepted experimentally by a lack-of-fit test procedure. An analysis of variance model was applied to compare the average number of comparisons among three groups defined by the height and balance factor of the trees, in order to test theoretical hypotheses about binary search tree performance statistically.

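
The height probabilities mentioned in this abstract can be sketched for small n by enumerating every insertion order rather than sampling. This is an illustration under stated assumptions, not the paper's procedure: height is counted here in edges on the longest root-to-leaf path, the function names are hypothetical, and balance factors are not computed.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def bst_height(keys):
    """Insert distinct keys in order into an unbalanced BST; return its height.

    Height is the number of edges on the longest root-to-leaf path.
    Child links are kept in two dictionaries keyed by the parent key.
    """
    left, right = {}, {}
    root = keys[0]
    height = 0
    for key in keys[1:]:
        node, depth = root, 0
        while True:
            children = left if key < node else right
            depth += 1
            if node not in children:
                children[node] = key  # attach the new key as a leaf
                break
            node = children[node]
        height = max(height, depth)
    return height


def height_distribution(n):
    """Exact probability of each height over all n! equally likely orders."""
    counts = Counter(bst_height(p) for p in permutations(range(1, n + 1)))
    total = sum(counts.values())
    return {h: Fraction(c, total) for h, c in sorted(counts.items())}


if __name__ == "__main__":
    for n in range(3, 8):  # the abstract's range of three to seven nodes
        print(n, height_distribution(n))
```

Enumeration is feasible here because 7! is only 5040 orders; for larger n, random sampling of permutations would be the practical alternative.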