• Title/Abstract/Keywords: Bayesian linear regression

72 search results (processing time: 0.022 s)

Development of Medical Cost Prediction Model Based on the Machine Learning Algorithm

  • Han Bi KIM;Dong Hoon HAN
    • Journal of Korea Artificial Intelligence Association / Vol. 1, No. 1 / pp.11-16 / 2023
  • Accurate hospital case modeling and prediction are crucial for efficient healthcare. In this study, we demonstrate the implementation of regression analysis methods in machine learning systems utilizing mathematical statistics and machine learning techniques. The developed machine learning model includes Bayesian linear, artificial neural network, decision tree, decision forest, and linear regression analysis models. Through the application of these algorithms, corresponding regression models were constructed and analyzed. The results suggest the potential of leveraging machine learning systems for medical research. The experiment aimed to create an Azure Machine Learning Studio tool for the speedy evaluation of multiple regression models. The tool facilitates the comparison of 5 types of regression models in a unified experiment and presents assessment results with performance metrics. Evaluation of regression machine learning models highlighted the advantages of boosted decision tree regression and decision forest regression in hospital case prediction. These findings could lay the groundwork for the deliberate development of new directions in medical data processing and decision making. Furthermore, potential avenues for future research may include exploring methods such as clustering, classification, and anomaly detection in healthcare systems.

Bayesian Modeling of Mortality Rates for Colon Cancer

  • Kim Hyun-Joong
    • Communications for Statistical Applications and Methods / Vol. 13, No. 1 / pp.177-190 / 2006
  • The aim of this study is to propose a Bayesian model for fitting the mortality rate of colon cancer. For the analysis of the mortality rate of a disease, factors such as the age classes of the population and the spatial characteristics of the location are very important. The model proposed in this study allows the age class to be a random effect in addition to its conventional role as the covariate of a linear regression, while the spatial factor is also a random effect. The model is fitted using the Metropolis-Hastings algorithm. Posterior expected predictive deviances, standardized residuals, and residual plots are used for comparison of models. It is found that the proposed model has smaller residuals and better predictive accuracy. Lastly, we describe patterns in disease maps for colon cancer.

Closed-form fragility analysis of the steel moment resisting frames

  • Kia, M.;Banazadeh, M.
    • Steel and Composite Structures / Vol. 21, No. 1 / pp.93-107 / 2016
  • Seismic fragility analysis is a probabilistic decision-making framework that is widely used for evaluating the vulnerability of a building under earthquake loading. It requires a probabilistic model, which is commonly developed from statistics that require collecting data in large quantities. Preparation of such a database is often costly and time-consuming. Therefore, this paper develops a generic seismic drift demand model for regular multi-story steel moment resisting frames in an attempt to present a novel application of probabilistic decision-making analysis for practical purposes. To this end, a demand model which is a linear function of the intensity measure in logarithmic space is developed to predict the overall maximum inter-story drift. Next, the model is coupled with a set of regression-based equations which are capable of directly estimating unknown statistical characteristics of the model parameters. To explicitly address uncertainties arising from randomness and lack of knowledge, Bayesian regression inference is employed when these relations are developed. The developed demand model is then employed in a Seismic Fragility Analysis (SFA) for two designed buildings. The accuracy of the results is also assessed by comparison with the results directly obtained from Incremental Dynamic Analysis.
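A demand model that is linear in logarithmic space, fitted by Bayesian regression, can be illustrated with a minimal sketch. This is not the authors' formulation: it assumes the form ln(drift) = a + b·ln(IM) + ε, a known noise standard deviation, and independent zero-mean Gaussian priors on a and b. The function name, the prior settings, and the data are hypothetical.

```python
import math

def bayes_loglinear_fit(im, drift, noise_sd=0.3, prior_sd=10.0):
    """Posterior mean and covariance of (a, b) in ln(drift) = a + b*ln(im) + eps.

    Assumes eps ~ N(0, noise_sd^2) with known noise_sd and independent
    N(0, prior_sd^2) priors on a and b (illustrative choices only).
    """
    x = [math.log(v) for v in im]
    y = [math.log(v) for v in drift]
    n = len(x)
    s2, t2 = noise_sd ** 2, prior_sd ** 2
    # Posterior precision A = X^T X / s2 + I / t2 for design rows [1, x_i].
    a11 = n / s2 + 1.0 / t2
    a12 = sum(x) / s2
    a22 = sum(v * v for v in x) / s2 + 1.0 / t2
    # Right-hand side X^T y / s2.
    r1 = sum(y) / s2
    r2 = sum(xi * yi for xi, yi in zip(x, y)) / s2
    # Invert the 2x2 precision matrix to get the posterior covariance.
    det = a11 * a22 - a12 * a12
    cov = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]
    mean = (cov[0][0] * r1 + cov[0][1] * r2,
            cov[1][0] * r1 + cov[1][1] * r2)
    return mean, cov

# Hypothetical, noise-free data generated with a = -4 and b = 1.
im = [0.1, 0.2, 0.4, 0.8, 1.6]
drift = [math.exp(-4.0) * v for v in im]
(a_hat, b_hat), cov = bayes_loglinear_fit(im, drift)
```

With a weak prior the posterior mean stays close to the least-squares fit, and the covariance matrix quantifies the remaining uncertainty in the intercept and slope.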

Rapid seismic vulnerability assessment by new regression-based demand and collapse models for steel moment frames

  • Kia, M.;Banazadeh, M.;Bayat, M.
    • Earthquakes and Structures / Vol. 14, No. 3 / pp.203-214 / 2018
  • Predictive demand and collapse fragility functions are two essential components of probabilistic seismic demand analysis that are commonly developed from statistics requiring enormous, costly, and time-consuming data gathering. Although this approach might be justified for research purposes, it is not appealing for practical applications because of its computational cost. Thus, in this paper, Bayesian regression-based demand and collapse models are proposed to eliminate the need for time-consuming analyses. The demand model, developed in the form of a linear equation, predicts the overall maximum inter-story drift of low- to mid-rise regular steel moment resisting frames (SMRFs), while the collapse model, mathematically expressed by a lognormal cumulative distribution function, provides the collapse occurrence probability for a given spectral acceleration at the fundamental period of the structure. Next, as an application, the proposed demand and collapse functions are implemented in a seismic fragility analysis to develop fragility and, consequently, seismic demand curves of three example buildings. The accuracy of the proposed models, considering the reduction in computation, is compared with that obtained directly from Incremental Dynamic Analysis, which is a computer-intensive procedure.
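A collapse model expressed as a lognormal cumulative distribution function can be sketched as follows. The parameterization by a median spectral acceleration and a logarithmic standard deviation is a common convention and an assumption here, not a detail taken from the paper. The function name and the numerical values are hypothetical.

```python
import math

def collapse_probability(sa, median_sa, beta):
    """P(collapse | Sa = sa) for a lognormal fragility curve.

    median_sa: spectral acceleration with 50% collapse probability (assumed).
    beta: standard deviation of ln(Sa) at collapse (assumed).
    """
    z = (math.log(sa) - math.log(median_sa)) / beta
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical fragility parameters: median 1.5 g, dispersion 0.4.
p_low = collapse_probability(0.5, median_sa=1.5, beta=0.4)
p_median = collapse_probability(1.5, median_sa=1.5, beta=0.4)
p_high = collapse_probability(3.0, median_sa=1.5, beta=0.4)
```

The probability equals 0.5 at the median and increases monotonically with spectral acceleration.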

Prediction of compressive strength of GGBS based concrete using RVM

  • Prasanna, P.K.;Ramachandra Murthy, A.;Srinivasu, K.
    • Structural Engineering and Mechanics / Vol. 68, No. 6 / pp.691-700 / 2018
  • Ground granulated blast furnace slag (GGBS) is a by-product obtained from the iron and steel industries, useful in the design and development of high-quality cement paste/mortar and concrete. This paper investigates the applicability of a relevance vector machine (RVM) based regression model to predict the compressive strength of various GGBS-based concrete mixes. Compressive strength data for various GGBS-based concrete mixes were obtained by considering the effect of water-binder ratio and steel fibres. RVM is a machine learning technique which employs Bayesian inference to obtain parsimonious solutions for regression and classification. The RVM is an extension of the support vector machine which couples probabilistic classification and regression. RVM is established based on a Bayesian formulation of a linear model with an appropriate prior that results in a sparse representation. The compressive strength model was developed using MATLAB software for training and prediction. About 70% of the data was used for development of the RVM model and 30% for validation. The predicted compressive strength for GGBS-based concrete mixes is found to be in very good agreement with the corresponding experimental observations.

Statistical analysis of KNHANES data with measurement error models

  • Hwang, Jinseub
    • Journal of the Korean Data and Information Science Society / Vol. 26, No. 3 / pp.773-779 / 2015
  • We present a statistical analysis of the fifth-wave data of the Korea National Health and Nutrition Examination Survey based on linear regression models with measurement errors. The data are obtained from a national population-based complex survey. To demonstrate the applicability of measurement error models, the results of the general linear regression model and the measurement error model are compared using two model selection criteria, the Akaike information criterion and the Bayesian information criterion. For our study, we use the simulation extrapolation algorithm for the measurement error model and the jackknife method for the estimation of standard errors.
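The two model selection criteria named above can be computed as in the sketch below. It assumes an ordinary Gaussian linear model summarized by its residual sum of squares. It does not reproduce the survey weighting, simulation extrapolation, or jackknife steps of the study. The function name and the numbers are hypothetical.

```python
import math

def gaussian_aic_bic(rss, n, k):
    """AIC and BIC for a Gaussian model with residual sum of squares `rss`.

    n: number of observations; k: number of estimated parameters
    (the caller decides whether the error variance is counted in k).
    """
    # Maximized Gaussian log-likelihood with variance estimate rss / n.
    log_lik = -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)
    aic = 2.0 * k - 2.0 * log_lik
    bic = k * math.log(n) - 2.0 * log_lik
    return aic, bic

# Hypothetical comparison of two candidate models on the same 100 observations.
aic_a, bic_a = gaussian_aic_bic(rss=50.0, n=100, k=3)
aic_b, bic_b = gaussian_aic_bic(rss=48.0, n=100, k=5)
```

For both criteria the lower value is preferred. BIC penalizes each extra parameter by ln(n) rather than 2, so it favors smaller models once n exceeds about 7.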

Assessment of variability and uncertainty in bias correction parameters for radar rainfall estimates based on topographical characteristics

  • 김태정;반우식;권현한
    • Journal of Korea Water Resources Association (한국수자원학회논문집) / Vol. 52, No. 9 / pp.589-601 / 2019
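  • Applied research using radar rainfall has recently been active in the field of hydrometeorology. However, because radar rainfall is estimated from an empirical radar reflectivity-rainfall intensity relationship, a quantitative error relative to the rainfall that actually reaches the ground is inevitable. This study therefore couples Bayesian inference with a generalized linear model to estimate bias correction parameters for radar rainfall while accounting for uncertainty. Bias correction of radar rainfall with the generalized linear model yielded better statistical efficiency criteria than the mean bias correction method that is widely used at present. In addition, the variability of the bias correction parameters with topographical characteristics was analyzed, and regionalization formulas for the bias correction parameters as functions of elevation and separation distance were proposed. The bias correction parameter estimates and regionalization results developed in this study are expected to be highly useful for a variety of radar-related research.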

Variable Selection in Linear Random Effects Models for Normal Data

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society / Vol. 27, No. 4 / pp.407-420 / 1998
  • This paper is concerned with selecting covariates to be included in building linear random effects models designed to analyze clustered response normal data. It is based on a Bayesian approach, intended to propose and develop a procedure that uses probabilistic considerations for selecting promising subsets of covariates. The approach reformulates the linear random effects model in a hierarchical normal and point mass mixture model by introducing a set of latent variables that will be used to identify subset choices. The hierarchical model is flexible enough to easily accommodate sign constraints in the number of regression coefficients. Using the Gibbs sampler, the appropriate posterior probability of each subset of covariates is obtained. Thus, in this procedure, the most promising subset of covariates can be identified as that with the highest posterior probability. The procedure is illustrated through a simulation study.

A Study on Regionalization of Parameters for Sacramento Continuous Rainfall-Runoff Model Using Watershed Characteristics

  • 김태정;정가인;김기영;권현한
    • Journal of Korea Water Resources Association (한국수자원학회논문집) / Vol. 48, No. 10 / pp.793-806 / 2015
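  • Simulating runoff in ungauged watersheds is essential in hydrology. The key to simulating reliable runoff with a rainfall-runoff model is estimating the parameters of the model. In Korea, however, insufficient hydrological data currently make parameter estimation difficult. The goal of this study is to regionalize the parameters of a rainfall-runoff model using a Bayesian statistical technique that accounts for uncertainty. The method is as follows. First, a Bayesian Sacramento rainfall-runoff model, which couples the globally used Sacramento rainfall-runoff model with the Bayesian Markov Chain Monte Carlo technique, was used to optimize 13 parameters for gauged watersheds and to derive the posterior distribution of each parameter. Second, multiple linear regression was applied to obtain the regression relationship between the parameters and the watershed characteristics, and regionalized parameters reflecting the watershed characteristics were determined. The regionalized parameters estimated by multiple regression were transferred to gauged watersheds to simulate runoff, and the regionalized parameters were then validated using statistical efficiency criteria: the N-S coefficient, the index of agreement, and the correlation coefficient.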
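The first validation criterion mentioned in the abstract above can be sketched as follows, assuming that the N-S coefficient refers to the Nash-Sutcliffe efficiency commonly used in hydrology. The function name and the runoff values are hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the simulation against the observations.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Squared deviation of the observations from their own mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance

# Hypothetical daily runoff series (m^3/s).
observed = [12.0, 30.0, 55.0, 41.0, 20.0]
simulated = [14.0, 27.0, 50.0, 44.0, 22.0]
nse = nash_sutcliffe(observed, simulated)
```

A value of 1 indicates a perfect match, and a value of 0 means the simulation is no better than always predicting the observed mean.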

Effective Computation for Odds Ratio Estimation in Nonparametric Logistic Regression

  • Kim, Young-Ju
    • Communications for Statistical Applications and Methods / Vol. 16, No. 4 / pp.713-722 / 2009
  • The estimation of the odds ratio and corresponding confidence intervals for case-control data has been done with traditional generalized linear models, which assume that the logarithm of the odds ratio is linearly related to risk factors. We adapt a lower-dimensional approximation of Gu and Kim (2002) to provide faster computation in the nonparametric method for the estimation of the odds ratio by allowing flexibility of the estimating function and its Bayesian confidence interval under the Bayes model for the lower-dimensional approximations. Simulation studies showed that taking larger samples with the lower-dimensional approximations helps to improve the smoothing spline estimates of the odds ratio in this setting. The proposed method can be used to analyze case-control data in medical studies.