• Title/Summary/Keyword: Bayesian logistic regression

Comparison of nomograms designed to predict hypertension with a complex sample (고혈압 예측을 위한 노모그램 구축 및 비교)

  • Kim, Min Ho;Shin, Min Seok;Lee, Jea Young
    • The Korean Journal of Applied Statistics / v.33 no.5 / pp.555-567 / 2020
  • Hypertension has a steadily increasing incidence rate and is a risk factor for secondary diseases such as cardiovascular disease, so predicting its incidence is important. In this study, we constructed nomograms that predict the incidence rate of hypertension, using data from the Korean National Health and Nutrition Examination Survey (KNHANES) for 2013-2016. Because the data come from a complex sampling design, a Rao-Scott chi-squared test was used to identify 10 risk factors for hypertension. Smoking and exercise were not statistically significant in the logistic regression; therefore, eight variables were selected as risk factors for hypertension. Logistic and Bayesian nomograms constructed from the selected risk factors were proposed and compared. The constructed nomograms were then verified using a receiver operating characteristic (ROC) curve and a calibration plot.
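
The abstract above mentions verification with an ROC curve. As a rough illustration only, the area under that curve (AUC) can be computed as the fraction of (case, non-case) pairs that a risk score orders correctly. The function name `auc` and the toy scores below are hypothetical and do not come from the paper or from KNHANES data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count how often a positive case receives a higher score than a negative one.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and observed outcomes (1 = hypertension).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = auc(toy_scores, toy_labels)
print(toy_auc)
```

The pairwise form is quadratic in the sample size, which is fine for a small illustration; a rank-based formula would be the usual choice for survey-sized data.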

Effective Computation for Odds Ratio Estimation in Nonparametric Logistic Regression

  • Kim, Young-Ju
    • Communications for Statistical Applications and Methods / v.16 no.4 / pp.713-722 / 2009
  • Odds ratios and their confidence intervals for case-control data have traditionally been estimated with generalized linear models, which assume that the logarithm of the odds ratio is linearly related to the risk factors. We adapt the lower-dimensional approximation of Gu and Kim (2002) to speed up nonparametric estimation of the odds ratio, allowing flexibility in the estimating function and in its Bayesian confidence interval under the Bayes model for the lower-dimensional approximations. Simulation studies showed that taking larger samples with the lower-dimensional approximations helps to improve the smoothing spline estimates of the odds ratio in this setting. The proposed method can be used to analyze case-control data in medical studies.

A Study of Performance Comparison of MOOC Dropout Prediction utilizing Machine Learning (기계학습 방법을 이용한 MOOC 학습자의 중도 포기 예측 성능 비교 연구)

  • Hur, Yun-A;Lim, Heui-Seok
    • Proceedings of the Korea Information Processing Society Conference / 2016.10a / pp.323-326 / 2016
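  • A MOOC (Massive Open Online Course) is a web-based online course open to large numbers of learners. Classes in a MOOC proceed interactively through a community of instructors and learners. However, because the lectures are free and no grades are given, learners have little motivation: many enroll, but markedly few complete the course. To help address this problem, this paper selected variables related to dropout from the MOOC data provided by KDD Cup 2015 and measured prediction accuracy with six machine learning algorithms: Decision Tree, KNN, Logistic Regression, Naive Bayesian, SVM, and Neural Network. Naive Bayesian showed the highest accuracy, at 89.3%. This study is expected to enable accurate prediction of dropout and, in turn, to help learners complete courses through targeted motivation.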

On a Bayes Criterion for the Goodness-of-Link Test for Binary Response Regression Models : Probit Link versus Logit Link

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society / v.26 no.2 / pp.261-276 / 1997
  • In the context of binary response regression, the problem of constructing a Bayesian goodness-of-link test of the logit link versus the probit link is considered. Based upon the well-known facts that the cdf of a logistic variate $\approx$ the cdf of $t_{8}/0.634$ and that, as $\nu \to \infty$, the cdf of $t_{\nu}$ approaches that of $N(0,1)$, a Bayes factor is derived as a test criterion. A synthesis of Gibbs sampling and a marginal likelihood estimation scheme is also proposed to compute the Bayes factor. Performance of the test is investigated via a Monte Carlo study. The new test is also illustrated with an empirical data example.
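
The approximation quoted in this abstract can be checked numerically. The sketch below is not taken from the paper: it compares the logistic cdf with the cdf of $t_{8}$ evaluated at $0.634x$, using only the Python standard library. The helper names (`t_cdf`, `max_gap`), the Simpson-rule integration, and the evaluation grid are illustrative assumptions.

```python
import math


def logistic_cdf(x):
    """Standard logistic cdf."""
    return 1.0 / (1.0 + math.exp(-x))


def t_pdf(t, nu):
    """Density of Student's t with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1.0 + t * t / nu) ** (-(nu + 1) / 2)


def t_cdf(t, nu, steps=400):
    """Student's t cdf by composite Simpson integration from 0 to |t|."""
    a = abs(t)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


SCALE = 0.634  # logistic variate is approximately t_8 / 0.634
grid = [i / 10 for i in range(-80, 81)]
# P(t_8 / 0.634 <= x) = P(t_8 <= 0.634 * x)
max_gap = max(abs(logistic_cdf(x) - t_cdf(SCALE * x, 8)) for x in grid)
print(f"largest cdf difference on the grid: {max_gap:.4f}")
```

A small `max_gap` is consistent with the stated approximation; the two distributions also have nearly equal variances under this scaling ($\pi^2/3$ for the logistic versus $(8/6)/0.634^2$ for the scaled $t_{8}$).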

Imputation for Binary or Ordered Categorical Traits Based on the Bayesian Threshold Model (베이지안 분계점 모형에 의한 순서 범주형 변수의 대체)

  • Lee Seung-Chun
    • The Korean Journal of Applied Statistics / v.18 no.3 / pp.597-606 / 2005
  • Nonresponse in sample surveys causes a problem when analyzing datasets in public-use files, where the user has only complete-data methods available and limited information about the reasons for nonresponse. Imputation has recently become a standard approach for handling nonresponse, and various imputation methods have been devised. However, most imputation methods are concerned with continuous traits, while many interesting features in sample surveys are measured on binary or ordered categorical scales. In this note, an imputation method for ignorable nonresponse in binary or ordered categorical traits is considered.

Estimating Probability of Mode Choice at Regional Level by Considering Spatial Association of Departure Place (출발지 공간 연관성을 고려한 지역별 수단선택확률 추정 연구)

  • Eom, Jin-Ki;Park, Man-Sik;Heo, Tae-Young
    • Journal of the Korean Society for Railway / v.12 no.5 / pp.656-662 / 2009
  • In general, travelers' mode choice behavior is analyzed by developing utility functions that reflect individuals' mode preferences according to their demographic and travel characteristics. In this paper, we propose a methodology that takes the spatial effects of individuals' departure locations into account in the mode choice model. The statistical models considered here are a spatial logistic regression model and a conditional autoregressive model that takes a spatial association parameter into account. We employed the Bayesian approach in order to obtain more reliable parameter estimates. The proposed methodology allows us to estimate mode shares by departure place even though the survey does not cover all areas.

Bayesian logit models with auxiliary mixture sampling for analyzing diabetes diagnosis data (보조 혼합 샘플링을 이용한 베이지안 로지스틱 회귀모형 : 당뇨병 자료에 적용 및 분류에서의 성능 비교)

  • Rhee, Eun Hee;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.131-146 / 2022
  • Logit models are commonly used for predicting and classifying categorical response variables. Most Bayesian approaches to logit models are implemented with the Metropolis-Hastings algorithm. However, that algorithm suffers from slow convergence and from the difficulty of choosing an adequate proposal distribution. Therefore, we use the auxiliary mixture sampler proposed by Frühwirth-Schnatter and Frühwirth (2007) to estimate logit models. This method introduces two sequences of auxiliary latent variables so that the logit model satisfies normality and linearity. As a result, the logit model can be easily implemented by Gibbs sampling. We applied the proposed method to diabetes data from the Community Health Survey (2020) of the Korea Disease Control and Prevention Agency and compared its performance with that of the Metropolis-Hastings algorithm. In addition, we showed that the logit model using auxiliary mixture sampling achieves classification performance comparable to that of machine learning models.
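
For context on the baseline this abstract compares against, the following is a minimal random-walk Metropolis-Hastings sketch for a two-parameter Bayesian logit model. It is not the auxiliary mixture sampler of the paper and does not use the diabetes data: the simulated data, the independent N(0, 10²) priors, the step size, and all function names are assumptions made for illustration.

```python
import math
import random


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def log_posterior(beta, xs, ys, prior_sd=10.0):
    """Bernoulli-logit log-likelihood plus independent normal log-priors."""
    b0, b1 = beta
    loglik = sum(y * (b0 + b1 * x) - log1pexp(b0 + b1 * x) for x, y in zip(xs, ys))
    logprior = -(b0 * b0 + b1 * b1) / (2 * prior_sd ** 2)
    return loglik + logprior


def metropolis_hastings(xs, ys, n_iter=5000, step=0.3, seed=1):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    beta = (0.0, 0.0)
    current = log_posterior(beta, xs, ys)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = (beta[0] + rng.gauss(0, step), beta[1] + rng.gauss(0, step))
        candidate = log_posterior(proposal, xs, ys)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
            accepted += 1
        draws.append(beta)
    return draws, accepted / n_iter


# Simulated data (assumed true intercept 0.5 and slope 1.5).
data_rng = random.Random(0)
xs = [data_rng.uniform(-2, 2) for _ in range(200)]
ys = [1 if data_rng.random() < 1 / (1 + math.exp(-(0.5 + 1.5 * x))) else 0 for x in xs]

draws, accept_rate = metropolis_hastings(xs, ys)
kept = draws[1000:]  # discard burn-in
post_mean = tuple(sum(d[j] for d in kept) / len(kept) for j in (0, 1))
print(post_mean, accept_rate)
```

The `step` value has to be tuned by hand here, which is the proposal-adequacy difficulty the abstract refers to; a Gibbs sampler avoids that tuning.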

A Comparative Study on the Accuracy of Important Statistical Prediction Techniques for Marketing Data (마케팅 데이터를 대상으로 중요 통계 예측 기법의 정확성에 대한 비교 연구)

  • Cho, Min-Ho
    • The Journal of the Korea institute of electronic communication sciences / v.14 no.4 / pp.775-780 / 2019
  • Techniques for predicting the future can be categorized into statistics-based and deep-learning-based techniques. Among them, statistics-based techniques are widely used because they are simple and highly accurate. However, practitioners have difficulty using many analytical techniques correctly. In this study, we compared prediction accuracy by applying multinomial logistic regression, decision tree, random forest, support vector machine, and Bayesian inference to marketing-related data. The same marketing data were used for every technique, and the analysis was conducted in R. The prediction results of the various techniques, which reflect the data characteristics of the marketing field, will be a good reference for practitioners.

Extraction of Hazardous Freeway Sections Using GPS-Based Probe Vehicle Speed Data (GPS 프로브 차량 속도자료를 이용한 고속도로 사고 위험구간 추출기법)

  • Park, Jae-Hong;Oh, Cheol;Kim, Tae-Hyung;Joo, Shin-Hye
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.9 no.3 / pp.73-84 / 2010
  • This study presents a novel method to identify hazardous freeway segments using global positioning system (GPS) based probe vehicle data. A variety of candidate contributing factors associated with a higher potential for accident occurrence were extracted from the probe vehicle dataset. The research problem was defined as a classification problem, and a well-known classifier, the Bayesian neural network, was adopted to solve it. A binary logistic regression technique was also used to select salient input variables. Test results showed that the proposed method is promising for extracting hazardous freeway sections. The outcome of this study can be used to evaluate the safety of freeway sections and to derive countermeasures that prevent accidents.

Investigating Opinion Mining Performance by Combining Feature Selection Methods with Word Embedding and BOW (Bag-of-Words) (속성선택방법과 워드임베딩 및 BOW (Bag-of-Words)를 결합한 오피니언 마이닝 성과에 관한 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.2 / pp.163-170 / 2019
  • Over the past decade, the growth of the Web has explosively increased the volume of data. Feature selection is an important step in extracting valuable data from a large amount of data. This study proposes a novel opinion mining model that combines feature selection (FS) methods with Word embedding to vector (Word2vec) and BOW (Bag-of-words). The FS methods adopted for this study are CFS (Correlation based FS) and IG (Information Gain). To select an optimal FS method, a number of classifiers were evaluated: LR (logistic regression), NN (neural network), NBN (naive Bayesian network), RF (random forest), RS (random subspace), and ST (stacking). Empirical results with electronics and kitchen datasets showed that the LR and ST classifiers combined with IG applied to BOW features yield the best performance in opinion mining. Results with laptop and restaurant datasets revealed that the RF classifier using IG applied to Word2vec features gives the best performance in opinion mining.