• Title/Summary/Keyword: Bayesian Linear Regression

Bayesian quantile regression analysis of private education expenses for high school students in Korea (일반계 고등학생 사교육비 지출에 대한 베이지안 분위회귀모형 분석)

  • Oh, Hyun Sook
    • Journal of the Korean Data and Information Science Society / v.28 no.6 / pp.1457-1469 / 2017
  • Private education expenses are one of the key issues in Korea, and there have been many discussions about them. Academically, most previous studies of private education expenses have used a multiple linear regression model based on the ordinary least squares (OLS) method. However, if the data do not satisfy the basic assumptions of the OLS method, such as normality and homoscedasticity, the parameter estimates are unreliable. In this case, a quantile regression model is preferred to an OLS model since it does not depend on the assumptions of normality and homoscedasticity. In the present study, data from a survey on private education expenses conducted by Statistics Korea in 2015 were analyzed to investigate the factors affecting private education expenses. Since the data do not satisfy the OLS assumptions, a quantile regression model was fitted with a Bayesian approach using Gibbs sampling. The results show that the gender of the student, the parent's age, and the time and cost of participating in after-school programs are not significant. Household income has a significant positive effect of similar size at all levels (quantiles) of private education expenses. Spending on private education is higher in Seoul than in other regions, and the regional difference grows as private education expenditure increases. Total time for private education and student achievement have a larger positive effect at the lower quantiles than at the higher quantiles. The father's education level is positively significant for medium-high quantiles only, whereas the mother's education level is positively significant for all but the low quantiles. Participation in after-school programs is positively significant for the lower quantiles, whereas EBS textbook cost is positively significant for the higher quantiles.
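
As a rough illustration of why quantile regression copes with heteroscedastic data, the sketch below fits quantile lines by minimizing the check (pinball) loss through a linear program. It is not the Gibbs sampler used in the paper: the synthetic data, the function names `check_loss` and `fit_quantile`, and the use of `scipy.optimize.linprog` are assumptions made for illustration only.

```python
import numpy as np
from scipy.optimize import linprog


def check_loss(residuals, tau):
    """Pinball loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    residuals = np.asarray(residuals, dtype=float)
    return float(np.sum(residuals * (tau - (residuals < 0))))


def fit_quantile(X, y, tau):
    """Frequentist quantile regression solved as a linear program.

    Variables are beta (free) plus the positive and negative parts of the
    residuals, u_plus and u_minus, with X @ beta + u_plus - u_minus = y.
    """
    n, p = X.shape
    cost = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    result = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return result.x[:p]


# Synthetic data (not from the survey): the noise spread grows with x,
# so the homoscedasticity assumption of OLS is violated on purpose.
rng = np.random.default_rng(0)
n = 300
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, 0.2 + 0.3 * x)
X = np.column_stack([np.ones(n), x])

# The fitted slope differs by quantile, which a single OLS slope cannot show.
slopes = {tau: fit_quantile(X, y, tau)[1] for tau in (0.1, 0.5, 0.9)}
print(slopes)
```

A Bayesian version would typically replace the linear program with an asymmetric Laplace likelihood and sample the coefficients, but the loss being targeted is the same.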

Association of heavy metal complex exposure and neurobehavioral function of children

  • Minkeun Kim;Chulyong Park;Joon Sakong;Shinhee Ye;So young Son;Kiook Baek
    • Annals of Occupational and Environmental Medicine / v.35 / pp.23.1-23.14 / 2023
  • Background: Exposure to heavy metals is a public health concern worldwide. Previous studies on the association between heavy metal exposure and neurobehavioral functions in children have focused on single exposures and clinical manifestations. However, the present study evaluated the effects of heavy metal complex exposure on subclinical neurobehavioral function using the Korean Computerized Neurobehavior Test (KCNT). Methods: Urinary mercury, lead, and cadmium analyses as well as the symbol digit substitution (SDS) and choice reaction time (CRT) tests of the KCNT were conducted in children aged between 10 and 12 years. Reaction time and urinary heavy metal levels were analyzed using partial correlation, linear regression, Bayesian kernel machine regression (BKMR), weighted quantile sum (WQS) regression, and quantile G-computation analysis. Results: After excluding cases of poor cooperation and inappropriate urine samples, 203 SDS tests and 198 CRT tests were analyzed. Partial correlation analysis revealed no association between neurobehavioral function and exposure to individual heavy metals. Multiple linear regression showed a significant positive association of urinary lead and mercury with CRT. BKMR, WQS regression, and quantile G-computation analysis showed a statistically significant positive association between complex urinary heavy metal concentrations, especially lead and mercury, and reaction time. Conclusions: Assuming complex exposures, urinary heavy metal concentrations showed a statistically significant positive association with CRT. These results suggest that heavy metal complex exposure during childhood should be evaluated and managed strictly.

Estimation of genetic parameters and trends for production traits of dairy cattle in Thailand using a multiple-trait multiple-lactation test day model

  • Buaban, Sayan;Puangdee, Somsook;Duangjinda, Monchai;Boonkum, Wuttigrai
    • Asian-Australasian Journal of Animal Sciences / v.33 no.9 / pp.1387-1399 / 2020
  • Objective: The objective of this study was to estimate the genetic parameters and trends for milk, fat, and protein yields in the first three lactations of Thai dairy cattle using a 3-trait, 3-lactation random regression test-day model. Methods: Data included 168,996, 63,388, and 27,145 test-day records from the first, second, and third lactations, respectively. Records were from 19,068 cows calving from 1993 to 2013 in 124 herds. (Co)variance components were estimated by Bayesian methods. Gibbs sampling was used to obtain posterior distributions. The model included herd-year-month of testing, breed group-season of calving-month in tested milk group, and linear and quadratic age at calving as fixed effects, and random regression coefficients for additive genetic and permanent environmental effects, which were defined as modified constant, linear, quadratic, cubic, and quartic Legendre coefficients. Results: Average daily heritabilities ranged from 0.36 to 0.48 for milk, 0.33 to 0.44 for fat, and 0.37 to 0.48 for protein yields; they were higher in the third lactation for all traits. Heritabilities of test-day milk and protein yields for selected days in milk were higher in the middle than at the beginning or end of lactation, whereas those for test-day fat yields were high at the beginning and end of lactation. Genetic correlations (305-d yield) among production yields within lactations (0.44 to 0.69) were higher than those across lactations (0.36 to 0.68). The largest genetic correlation was observed between the first and second lactation. The genetic trends of 305-d milk, fat, and protein yields were 230 to 250, 25 to 29, and 30 to 35 kg per year, respectively. Conclusion: A random regression model seems to be a flexible and reliable procedure for the genetic evaluation of production yields. It can be used to perform breeding value estimation for national genetic evaluation in the Thai dairy cattle population.
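
To make the Legendre covariates of a random regression test-day model concrete, here is a minimal sketch that builds the constant through quartic terms for a few days in milk (DIM). The 5 to 305 day range, the function name `legendre_covariates`, and the use of standard (unnormalized) Legendre polynomials are assumptions; the paper's "modified" coefficients may be scaled differently.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_covariates(dim, dim_min=5, dim_max=305, order=4):
    """Return an (n, order + 1) matrix of Legendre polynomials P_0..P_order.

    Days in milk are first mapped linearly onto [-1, 1], the interval on
    which Legendre polynomials are orthogonal.
    """
    dim = np.asarray(dim, dtype=float)
    t = 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0
    return legendre.legvander(t, order)


# Start, middle, and end of the assumed lactation window.
Z = legendre_covariates([5, 155, 305])
print(Z)
# Row for DIM 155 (t = 0): [1, 0, -0.5, 0, 0.375]
```

Each animal's additive genetic and permanent environmental curves are then linear combinations of these columns, with the combination weights being the random regression coefficients.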

A Study on Regionalization of Parameters of Continuous Rainfall-Runoff Model (연속 강우-유출모형의 매개변수 지역화에 관한 연구)

  • Jeong, Ga-In;Kim, Tae-Jeong;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference / 2015.05a / pp.182-182 / 2015
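  • In Korea, because the rainfall observation network is regionally unbalanced, relatively small reservoirs behave as ungauged basins, and reliable rainfall, runoff, and evaporation data are very scarce. For gauged basins such as multipurpose dam basins, inflow data for the upstream basin are easy to obtain, but most basins lack measuring equipment, which makes it very difficult to obtain reliable inflow data. In this study, to estimate inflow for ungauged basins, the parameters of a rainfall-runoff model were estimated for gauged basins, and regression equations for regionalization were derived by applying multiple linear regression (MLR) based on the correlation between the estimated parameters and basin characteristic factors. To this end, parameters were estimated for 17 K-water dam basins with good-quality flow data, and two of these dam basins were treated as ungauged to validate the developed model. Good simulation performance was confirmed on most statistical indices, and applying the regionalization technique developed in this study to ungauged basins is expected to enable more quantitative and efficient water resources planning. As future work, we intend to develop a regionalization technique based on a Bayesian GLM so that parameter uncertainty can also be taken into account.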
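
The regionalization workflow described above (regress a calibrated model parameter on basin characteristics, then check the regression on basins treated as ungauged) can be sketched as follows. All numbers are synthetic, and the two basin characteristics, the coefficients, and the held-out indices are hypothetical; only the 17-basin, 2-held-out layout follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)
n_basins, n_traits = 17, 2

# Synthetic basin characteristics (for example scaled area and mean slope).
traits = rng.uniform(0.0, 1.0, size=(n_basins, n_traits))
X = np.column_stack([np.ones(n_basins), traits])  # intercept + characteristics

# Synthetic "calibrated" rainfall-runoff parameter for each gauged basin.
true_coef = np.array([0.3, 1.2, -0.7])  # made-up values for the demo
param = X @ true_coef + rng.normal(0, 0.05, n_basins)

# Treat two basins as ungauged: fit on the other 15, predict these two.
test_idx = np.array([3, 11])
train_mask = np.ones(n_basins, dtype=bool)
train_mask[test_idx] = False

coef, *_ = np.linalg.lstsq(X[train_mask], param[train_mask], rcond=None)
pred = X[test_idx] @ coef
rmse = float(np.sqrt(np.mean((pred - param[test_idx]) ** 2)))
print("regionalization coefficients:", coef)
print("hold-out RMSE:", rmse)
```

The Bayesian GLM mentioned as future work would replace the point estimate `coef` with a posterior distribution, so that each predicted parameter carries an uncertainty interval.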

Power consumption prediction model based on artificial neural networks for seawater source heat pump system in recirculating aquaculture system fish farm (순환여과식 양식장 해수 열원 히트펌프 시스템의 전력 소비량 예측을 위한 인공 신경망 모델)

  • Hyeon-Seok JEONG;Jong-Hyeok RYU;Seok-Kwon JEONG
    • Journal of the Korean Society of Fisheries and Ocean Technology / v.60 no.1 / pp.87-99 / 2024
  • This study applies an artificial neural network (ANN) model to predict the power consumption of a seawater source heat pump in a recirculating aquaculture system. An integrated dynamic simulation model was constructed using the TRNSYS program to obtain input and output data for the ANN model, which predicts the power consumption of the recirculating aquaculture system with a heat pump system. Data obtained from the TRNSYS program were analyzed using linear regression and converted through normalization into the data required by the ANN model. To optimize the ANN-based power consumption prediction model, the hyperparameters of the ANN were determined using Bayesian optimization. ANN simulation results showed that ANN models with optimized hyperparameters exhibited acceptably high predictive accuracy conforming to ASHRAE standards.

Comparison of genome-wide association and genomic prediction methods for milk production traits in Korean Holstein cattle

  • Lee, SeokHyun;Dang, ChangGwon;Choy, YunHo;Do, ChangHee;Cho, Kwanghyun;Kim, Jongjoo;Kim, Yousam;Lee, Jungjae
    • Asian-Australasian Journal of Animal Sciences / v.32 no.7 / pp.913-921 / 2019
  • Objective: The objectives of this study were to compare the informative regions identified by two genome-wide association study (GWAS) approaches and to determine the accuracy and bias of the direct genomic value (DGV) for milk production traits in Korean Holstein cattle, using two genomic prediction approaches: single-step genomic best linear unbiased prediction (ss-GBLUP) and Bayes-B. Methods: Records on production traits such as adjusted 305-day milk (MY305), fat (FY305), and protein (PY305) yields were collected from 265,271 first parity cows. After quality control, 50,765 single-nucleotide polymorphic genotypes were available for analysis. In GWAS for ss-GBLUP (ssGWAS) and Bayes-B (BayesGWAS), the proportion of genetic variance for each 1-Mb genomic window was calculated and used to identify informative genomic regions. Accuracy of the DGV was estimated by five-fold cross-validation with random clustering. As a measure of accuracy for DGV, we also assessed the correlation between DGV and deregressed estimated breeding value (DEBV). The bias of DGV for each method was obtained by determining regression coefficients. Results: A total of nine and five significant windows (1 Mb) were identified for MY305 using ssGWAS and BayesGWAS, respectively. Using ssGWAS and BayesGWAS, we also detected multiple significant regions for FY305 (12 and 7) and PY305 (14 and 2), respectively. Both single-step DGV and Bayes DGV showed moderate accuracy ranges for the MY305 (0.32 to 0.34), FY305 (0.37 to 0.39), and PY305 (0.35 to 0.36) traits. The mean biases of DGVs determined using the single-step and Bayesian methods were $1.50 \pm 0.21$ and $1.18 \pm 0.26$ for MY305, $1.75 \pm 0.33$ and $1.14 \pm 0.20$ for FY305, and $1.59 \pm 0.20$ and $1.14 \pm 0.15$ for PY305, respectively. Conclusion: From the bias perspective, we believe that genomic selection based on Bayesian approaches would be more suitable than ss-GBLUP in Korean Holstein populations.
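
The two evaluation measures named in this abstract can be sketched in a few lines. The abstract does not state which variable is regressed on which; the sketch assumes the common convention of regressing DEBV on DGV, under which a slope of 1 indicates no bias. The function name `accuracy_and_bias` and the synthetic values are hypothetical.

```python
import numpy as np


def accuracy_and_bias(dgv, debv):
    """Return (correlation of DGV with DEBV, slope of DEBV regressed on DGV)."""
    dgv = np.asarray(dgv, dtype=float)
    debv = np.asarray(debv, dtype=float)
    accuracy = float(np.corrcoef(dgv, debv)[0, 1])
    # Simple-regression slope: cov(DGV, DEBV) / var(DGV).
    slope = float(np.cov(dgv, debv)[0, 1] / np.var(dgv, ddof=1))
    return accuracy, slope


# Synthetic validation fold (not real cattle data): the predictions are
# deliberately too compressed, so the slope comes out above 1.
rng = np.random.default_rng(2)
dgv = rng.normal(0.0, 1.0, 500)
debv = 1.5 * dgv + rng.normal(0.0, 3.0, 500)

acc, bias = accuracy_and_bias(dgv, debv)
print(f"accuracy = {acc:.2f}, regression coefficient = {bias:.2f}")
```

In a five-fold cross-validation, this function would be applied to each held-out fold and the results averaged, which is how a mean and a spread such as those reported above could arise.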

Mapping Landslide Susceptibility Based on Spatial Prediction Modeling Approach and Quality Assessment (공간예측모형에 기반한 산사태 취약성 지도 작성과 품질 평가)

  • Al, Mamun;Park, Hyun-Su;JANG, Dong-Ho
    • Journal of The Geomorphological Association of Korea / v.26 no.3 / pp.53-67 / 2019
  • The purpose of this study is to assess the quality of landslide susceptibility maps for a landslide-prone area (Jinbu-myeon, Gangwon-do, South Korea) produced by a spatial prediction modeling approach and to compare the results obtained. To this end, a landslide inventory map was prepared mainly from historical information and aerial photograph analysis (Daum Map, 2008), together with some field observation. Altogether, 550 landslides were counted in the study area. Among them, 182 landslides were debris flows, and each group of landslides was mapped separately in the inventory. The landslide inventory was then randomly split using Excel: 50% of the landslides were used for model analysis and the remaining 50% for validation. A total of 12 contributing factors (slope, aspect, curvature, topographic wetness index (TWI), elevation, forest type, forest timber diameter, forest crown density, geology, land use, soil depth, and soil drainage) were used in the analysis. To determine the correlation between landslide causative factors and landslide occurrence, pixels were divided into several classes and the frequency ratio for each class was extracted. Six landslide susceptibility maps were then constructed using the Bayesian Predictive Discriminant (BPD), Empirical Likelihood Ratio (ELR), and Linear Regression Method (LRM) models for the different categories of data. Finally, in the cross-validation process, each landslide susceptibility map was evaluated with a receiver operating characteristic (ROC) curve, the area under the curve (AUC) was calculated, and a success rate curve was extracted. The results showed that the Bayesian, likelihood, and linear models had accuracies of 85.52%, 85.23%, and 83.49%, respectively, for the total data. For the debris flow category, the results were slightly better than for the total data, with accuracies of 86.33%, 85.53%, and 84.17%. This means that all three models were reasonable methods for landslide susceptibility analysis. The models proved to produce reliable predictions for regional spatial planning or land-use planning.
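
Two of the quantities used in this abstract, the frequency ratio of a factor class and the AUC of a susceptibility score, can be sketched as below. The toy pixels, class labels, and function names are hypothetical; the frequency ratio is taken here in its usual form (share of landslide pixels in a class divided by the share of all pixels in that class), which the abstract does not spell out.

```python
import numpy as np


def frequency_ratio(classes, landslide):
    """Frequency ratio per class: (% of landslides in class) / (% of area in class)."""
    classes = np.asarray(classes)
    landslide = np.asarray(landslide, dtype=bool)
    ratios = {}
    for c in np.unique(classes):
        in_class = classes == c
        share_of_landslides = landslide[in_class].sum() / landslide.sum()
        share_of_area = in_class.sum() / classes.size
        ratios[c.item()] = float(share_of_landslides / share_of_area)
    return ratios


def auc(scores, labels):
    """AUC as the probability that a landslide pixel outscores a stable pixel.

    Ties count as one half. This pairwise form is fine for small samples.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


# Toy example: 10 pixels in three slope classes, 4 of them landslides.
slope_class = ["low"] * 5 + ["mid"] * 3 + ["high"] * 2
is_landslide = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
fr = frequency_ratio(slope_class, is_landslide)
print(fr)  # low: 0.5, mid: about 0.83, high: 2.5

# Use each pixel's class frequency ratio as a crude susceptibility score.
scores = [fr[c] for c in slope_class]
print(auc(scores, is_landslide))
```

A ratio above 1 marks a class where landslides are over-represented relative to its area, which is why the steep class dominates the toy score.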

Bayesian Network Analysis for the Dynamic Prediction of Financial Performance Using Corporate Social Responsibility Activities (베이지안 네트워크를 이용한 기업의 사회적 책임활동과 재무성과)

  • Sun, Eun-Jung
    • Management & Information Systems Review / v.34 no.5 / pp.71-92 / 2015
  • This study analyzes the impact of Corporate Social Responsibility (CSR) activities on financial performance using a Bayesian Network. The research tries to overcome the uniform assumption of a linear relationship between financial performance and CSR activities made in the multiple regression analysis widely used in previous studies. It is necessary to infer the causal relationships among the CSR activities that affect financial performance. Identifying these relationships would enable firms to improve their financial performance by informing decision makers about the different CSR activities that influence it. This research proposes a General Bayesian Network (GBN) and presents the Markov Blanket induced from the GBN. Using the results of research conducted by the Korean Economic Justice Institute (KEJI) under the Citizen's Coalition for Economic Justice (CCEJ), which investigated approximately 200 companies in Korea based on the Korean Economic Justice Institute Index (KEJI index) from 2005 to 2011, it is empirically demonstrated that all the proposals presented in this study are statistically significant. The Bayesian Network effectively infers the properties affecting financial performance through probabilistic causal relationships. Moreover, I found causal relationships among the CSR activity variables: Environment protection is related to Customer protection, Employee satisfaction, and firm size; Soundness is related to Total CSR Evaluation Score and Debt-Assets Ratio. Through the what-if analysis, I identify the sensitive factors among the explanatory variables.

The Risk Assessment and Prediction for the Mixed Deterioration in Cable Bridges Using a Stochastic Bayesian Modeling (확률론적 베이지언 모델링에 의한 케이블 교량의 복합열화 리스크 평가 및 예측시스템)

  • Cho, Tae Jun;Lee, Jeong Bae;Kim, Seong Soo
    • Journal of the Korea Institute for Structural Maintenance and Inspection / v.16 no.5 / pp.29-39 / 2012
  • The main objective is to predict the future degradation and maintenance budget for a suspension bridge system. Bayesian inference is applied to find the posterior probability density function of the source parameters (damage indices and serviceability), given ten years of maintenance data. The posterior distribution of the parameters is sampled using a Markov chain Monte Carlo method. The simulated risk predictions for decreased serviceability conditions are posterior distributions based on the prior distribution and the likelihood of data updated from annual maintenance tasks. Compared with a conventional linear prediction model, the proposed quadratic model provides greatly improved convergence and closeness to measured data in terms of serviceability, risk factors, and maintenance budget for bridge components. This allows the future performance and financial management of complex infrastructure to be forecast with the proposed quadratic stochastic regression model.

Application of deep learning with bivariate models for genomic prediction of sow lifetime productivity-related traits

  • Joon-Ki Hong;Yong-Min Kim;Eun-Seok Cho;Jae-Bong Lee;Young-Sin Kim;Hee-Bok Park
    • Animal Bioscience / v.37 no.4 / pp.622-630 / 2024
  • Objective: Pig breeders cannot obtain phenotypic information at the time of selection for sow lifetime productivity (SLP). They would benefit from obtaining genetic information on candidate sows. Genomic data interpreted using deep learning (DL) techniques could contribute to the genetic improvement of SLP and maximize farm profitability because DL models capture nonlinear genetic effects such as dominance and epistasis more efficiently than conventional genomic prediction methods based on linear models. This study aimed to investigate the usefulness of DL for the genomic prediction of two SLP-related traits: lifetime number of litters (LNL) and lifetime pig production (LPP). Methods: Two bivariate DL models, a convolutional neural network (CNN) and a local convolutional neural network (LCNN), were compared with conventional bivariate linear models (i.e., genomic best linear unbiased prediction, Bayesian ridge regression, Bayes A, and Bayes B). Phenotype and pedigree data were collected from 40,011 sows that had husbandry records. Among these, 3,652 pigs were genotyped using the PorcineSNP60K BeadChip. Results: The best predictive correlation for LNL was obtained with CNN (0.28), followed by LCNN (0.26) and conventional linear models (approximately 0.21). For LPP, the best predictive correlation was also obtained with CNN (0.29), followed by LCNN (0.27) and conventional linear models (approximately 0.25). A similar trend was observed in the mean squared error of prediction for the SLP traits. Conclusion: This study provides an example in which a CNN can outperform linear model-based genomic prediction approaches when nonlinear interaction components are important, as LNL and LPP exhibited strong epistatic interaction components. Additionally, our results suggest that applying bivariate DL models could also improve prediction accuracy by utilizing the genetic correlation between LNL and LPP.