• Title/Summary/Keyword: Random regression model


Statistical notes for clinical researchers: simple linear regression 2 - evaluation of regression line

  • Kim, Hae-Young
    • Restorative Dentistry and Endodontics / v.43 no.3 / pp.34.1-34.5 / 2018
  • In the previous section, we established a simple linear regression line by finding the slope and intercept with the least squares method: ${\hat{Y}}=30.79+0.71X$. Finding the regression line was a mathematical procedure. Next, we need to evaluate the usefulness, or effectiveness, of the regression line, that is, whether the regression model helps explain the variability of the dependent variable. Statistical inference about the regression line is also required to draw conclusions at the population level, because in practice we work with a sample, which is only a small part of the population. The basic assumption of the sampling method is simple random sampling.
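The evaluation step can be sketched in Python. This is a minimal illustration on hypothetical data (not the article's data set): it fits the least squares line, splits the total sum of squares (SST) into the part explained by the line (SSR) and the residual part (SSE), and reports $R^2$ = SSR/SST together with the F statistic for testing whether the slope is zero.

```python
def fit_and_evaluate(x, y):
    """Least squares fit of y = b0 + b1*x plus the ANOVA quantities used to evaluate it."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # intercept
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variability
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained by the line
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # left unexplained
    r_squared = ssr / sst
    f_stat = (ssr / 1) / (sse / (n - 2))  # F test of H0: slope = 0, df = (1, n - 2)
    return {"b0": b0, "b1": b1, "SST": sst, "SSR": ssr, "SSE": sse,
            "R2": r_squared, "F": f_stat}
```

For x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], this gives a slope of 0.6, an intercept of 2.2, SST = 6 = 3.6 + 2.4, $R^2$ = 0.6 and F = 4.5 on (1, 3) degrees of freedom.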

Small Area Estimation Techniques Based on Logistic Model to Estimate Unemployment Rate

  • Kim, Young-Won;Choi, Hyung-a
    • Communications for Statistical Applications and Methods / v.11 no.3 / pp.583-595 / 2004
  • For the Korean Economically Active Population Survey (EAPS), we consider a composite estimator based on a logistic regression model to estimate the unemployment rate for small areas (Si/Gun). We also propose a small area estimation technique based on a hierarchical generalized linear model that includes random effects reflecting the characteristics of the small areas. The proposed estimation techniques are applied to real domestic data from the Korean EAPS for Choongbuk. The MSEs of these estimators are estimated by the jackknife method, and the efficiency of the small area estimators is evaluated by the relative root mean squared error (RRMSE). As a result, the composite estimator based on the logistic model is much more efficient than the others, and it turns out that the composite estimator can produce reliable estimates under the current EAPS system.
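The three evaluation ingredients named above (a composite estimator, jackknife MSE estimation, and RRMSE) can be sketched briefly. This is a minimal, hypothetical illustration: the composite weight `w` is taken as a fixed input because the abstract does not give the paper's weighting scheme, and treating the delete-one jackknife variance as the MSE (ignoring any bias term) is a simplifying assumption.

```python
import math

def composite(direct, synthetic, w):
    """Composite estimate: w * direct + (1 - w) * synthetic, with 0 <= w <= 1 (hypothetical fixed w)."""
    return w * direct + (1 - w) * synthetic

def jackknife_mse(units, estimator):
    """Delete-one jackknife variance of estimator(units), used here as the MSE estimate.

    `units` is a list of sampled units; `estimator` maps such a list to a number.
    """
    n = len(units)
    leave_one_out = [estimator(units[:i] + units[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    return (n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out)

def rrmse(estimate, mse):
    """Relative root mean squared error: sqrt(MSE) / estimate."""
    return math.sqrt(mse) / estimate
```

For a sample mean of 0/1 unemployment indicators, the delete-one jackknife variance reduces exactly to the familiar $s^2/n$; for a model-based area estimate, `estimator` would refit the model on each reduced sample.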

Longitudinal Analysis of Body Weight and Feed Intake in Selection Lines for Residual Feed Intake in Pigs

  • Cai, W.;Wu, H.;Dekkers, J.C.M.
    • Asian-Australasian Journal of Animal Sciences / v.24 no.1 / pp.17-27 / 2011
  • A selection experiment for reduced residual feed intake (RFI) in Yorkshire pigs consisted of a line selected for lower RFI (LRFI) and a random control line (CTRL). Longitudinal measurements of daily feed intake (DFI) and body weight (BW) from generation 5 of this experiment were used. The objectives of this study were to evaluate the use of random regression (RR) and nonlinear mixed models to predict DFI and BW for individual pigs, accounting for the substantial missing information that characterizes these data, and to evaluate the effect of selection for RFI on BW and DFI curves. Forty RR models, with different-order polynomials of age as fixed and random effects and with homogeneous or heterogeneous residual variance by month of age, were fitted for both DFI and BW. Based on predicted residual sum of squares (PRESS) and residual diagnostics, the quadratic polynomial RR model was identified as best, with heterogeneous residual variance for DFI and homogeneous residual variance for BW. Compared to simple quadratic and linear regression models fitted to individual pigs, these RR models decreased PRESS by 1% and 2% for DFI and by 42% and 36% for BW on boars and gilts, respectively. Given the same number of random effects as the polynomial RR models, i.e., two for BW and one for DFI, the nonlinear Gompertz model predicted better than the polynomial RR models, but not as well as higher-order polynomial RR models. After five generations of selection for reduced RFI, the LRFI line had lower population curves for DFI and BW than the CTRL line, especially towards the end of the growth period.
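PRESS, the model-selection criterion used above, sums the squared leave-one-out prediction errors. For an ordinary least squares fit, each deleted residual can be obtained without refitting as $e_i/(1-h_{ii})$, where $h_{ii}$ is the leverage of observation $i$. The sketch below is a simplification for a single-predictor linear regression only; the study's RR models are mixed models with polynomial fixed and random effects, which this does not fit. The brute-force function is included only as a check.

```python
def press_simple_linear(x, y):
    """PRESS for y = b0 + b1*x fitted by OLS, via the leverage shortcut e_i / (1 - h_ii)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    press = 0.0
    for xi, yi in zip(x, y):
        h_ii = 1 / n + (xi - x_bar) ** 2 / sxx   # leverage of observation i
        e_i = yi - (b0 + b1 * xi)                 # ordinary residual
        press += (e_i / (1 - h_ii)) ** 2          # squared deleted residual
    return press

def press_brute_force(x, y):
    """Same quantity by explicitly refitting without each observation."""
    total = 0.0
    for i in range(len(x)):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        m = len(xs)
        xb, yb = sum(xs) / m, sum(ys) / m
        b1 = sum((a - xb) * (b - yb) for a, b in zip(xs, ys)) / sum((a - xb) ** 2 for a in xs)
        b0 = yb - b1 * xb
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total
```

Candidate models are compared on PRESS, smaller being better; the leverage shortcut avoids n separate refits.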

A Study on Predictive Modeling of I-131 Radioactivity Based on Machine Learning (머신러닝 기반 고용량 I-131의 용량 예측 모델에 관한 연구)

  • Yeon-Wook You;Chung-Wun Lee;Jung-Soo Kim
    • Journal of radiological science and technology / v.46 no.2 / pp.131-139 / 2023
  • High-dose I-131 used for the treatment of thyroid cancer causes localized exposure among the radiology technologists who handle it. Because there is a delay between the calibration date and the time the dose of I-131 is administered to a patient, the radioactivity of the administered dose must be measured directly with a dose calibrator. In this study, we applied machine learning modeling to external dose rates measured from shielded I-131 in order to predict its radioactivity. External dose rates were measured at distances of 1 m, 0.3 m, and 0.1 m from a shielded container holding the I-131, giving a total of 868 sets of measurements. For modeling, we used the hold-out method to partition the data in a 7:3 ratio (609 for the training set and 259 for the test set). The machine learning algorithms chosen were linear regression, decision tree, random forest and XGBoost. To evaluate the models, we calculated the root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE) to assess accuracy, and $R^2$ to assess explanatory power. The evaluation results were as follows: linear regression (RMSE 268.15, MSE 71901.87, MAE 231.68, $R^2$ 0.92), decision tree (RMSE 108.89, MSE 11856.92, MAE 19.24, $R^2$ 0.99), random forest (RMSE 8.89, MSE 79.10, MAE 6.55, $R^2$ 0.99), and XGBoost (RMSE 10.21, MSE 104.22, MAE 7.68, $R^2$ 0.99). The random forest model achieved the highest predictive ability. Further improvement of the model's performance is expected to help lower exposure among radiology technologists.
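The hold-out split and the four metrics are straightforward to compute. A minimal sketch, assuming a seeded shuffle and illustrative function names (not the authors' code). Since RMSE is the square root of MSE, each model's RMSE and MSE above agree up to rounding (for example, 8.89² ≈ 79.0).

```python
import math
import random

def holdout_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split them into training and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def regression_metrics(y_true, y_pred):
    """MSE, RMSE and MAE for accuracy; R^2 for explanatory power."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": 1 - sse / sst}
```

With 868 rows this sketch yields 608 training and 260 test rows; the study's 609:259 split is approximately, not exactly, 7:3.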

NONPARAMETRIC ESTIMATION OF THE VARIANCE FUNCTION WITH A CHANGE POINT

  • Kang Kee-Hoon;Huh Jib
    • Journal of the Korean Statistical Society / v.35 no.1 / pp.1-23 / 2006
  • In this paper we consider estimation of a discontinuous variance function in a nonparametric heteroscedastic random-design regression model. We first propose estimators of the change point in the variance function and then construct an estimator of the entire variance function. We examine the rates of convergence of these estimators and give results on their asymptotic behavior. Numerical work shows that using the proposed change point analysis in variance function estimation is quite effective.
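The idea of locating a jump in the variance function can be illustrated with a much simpler difference-based scan. This is not the authors' estimator: it is a minimal sketch assuming design points sorted by x, a smooth mean function and a single jump. Squared successive differences $(y_{i+1}-y_i)^2/2$ roughly track the local variance because a smooth mean nearly cancels, and the split that maximizes the contrast between the window averages on either side estimates the change point. The `window` parameter is a hypothetical tuning choice.

```python
def variance_change_point(x, y, window=50):
    """Estimate the location of a single jump in the variance function.

    Assumes (x, y) are sorted by x and the mean function is smooth, so squared
    first differences (y[i+1] - y[i])**2 / 2 roughly track the local variance.
    Returns the design point x[i] at which the contrast between the average of
    these proxies just after i and just before i is largest.
    """
    d = [(y[i + 1] - y[i]) ** 2 / 2 for i in range(len(y) - 1)]
    best_i, best_gap = None, -1.0
    for i in range(window, len(d) - window + 1):
        left = sum(d[i - window:i]) / window    # proxies before the candidate split
        right = sum(d[i:i + window]) / window   # proxies from the candidate split on
        gap = abs(right - left)
        if gap > best_gap:
            best_i, best_gap = i, gap
    return None if best_i is None else x[best_i]
```

A larger window averages out more noise in the squared differences but blurs the location of the jump; the paper's estimators address this trade-off with proper asymptotic theory.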

A Crash Prediction Model for Expressways Using Genetic Programming (유전자 프로그래밍을 이용한 고속도로 사고예측모형)

  • Kwak, Ho-Chan;Kim, Dong-Kyu;Kho, Seung-Young;Lee, Chungwon
    • Journal of Korean Society of Transportation / v.32 no.4 / pp.369-379 / 2014
  • Statistical regression models have been used to construct crash prediction models, despite their limiting assumptions about the data distribution and functional form. In response to these limitations, a few studies have proposed crash prediction models based on non-parametric methods such as neural networks. However, these models have a major limitation: they work as black boxes and therefore cannot be used directly to identify the relationships between crash frequency and crash factors. A genetic programming model can find a solution to a problem without any prespecified assumptions and removes the black box effect. Hence, this paper investigates the application of the genetic programming technique to develop a crash prediction model. The data, collected from the Gyeongbu expressway over three years (2010-2012), were separated into straight and curved sections. The random forest technique was applied to select the important variables that affect crash occurrence, and the genetic programming model was developed based on the variables selected by the random forest. To test the goodness of fit of the genetic programming model, the RMSE of each model was compared with that of the negative binomial regression model. The test results indicate that the goodness of fit of the genetic programming models is superior to that of the negative binomial models.
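To make the "no black box" point concrete, here is a deliberately small genetic programming sketch: candidate models are explicit formula trees over the input variables, fitness is RMSE (the comparison criterion used above), and the population evolves by mutation and truncation selection. It is a hypothetical illustration, not the authors' implementation: it omits crossover, uses only +, - and * with random constants, assumes rows of floats, and all names and settings are assumptions.

```python
import math
import random

# Binary operators available to the evolved formulas (kept small for readability).
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def random_expr(rng, n_vars, depth=3):
    """Random expression tree: ("x", index), ("c", constant) or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.6:
            return ("x", rng.randrange(n_vars))
        return ("c", rng.uniform(-2.0, 2.0))
    return (rng.choice(sorted(OPS)), random_expr(rng, n_vars, depth - 1),
            random_expr(rng, n_vars, depth - 1))

def evaluate(expr, row):
    if expr[0] == "x":
        return row[expr[1]]
    if expr[0] == "c":
        return expr[1]
    return OPS[expr[0]](evaluate(expr[1], row), evaluate(expr[2], row))

def rmse(expr, rows, y):
    """Fitness: root mean squared error (infinite if the formula overflows)."""
    sse = 0.0
    for row, target in zip(rows, y):
        e = evaluate(expr, row) - target
        sse += e * e
    return math.sqrt(sse / len(y)) if math.isfinite(sse) else math.inf

def mutate(expr, rng, n_vars):
    """Replace one randomly chosen subtree with a new random subtree."""
    if expr[0] in ("x", "c") or rng.random() < 0.3:
        return random_expr(rng, n_vars, depth=2)
    op, left, right = expr
    if rng.random() < 0.5:
        return (op, mutate(left, rng, n_vars), right)
    return (op, left, mutate(right, rng, n_vars))

def evolve(rows, y, pop_size=40, generations=30, seed=0):
    """Mutation-only GP with truncation selection; the best half always survives."""
    rng = random.Random(seed)
    n_vars = len(rows[0])
    pop = [random_expr(rng, n_vars) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda e: rmse(e, rows, y))
        survivors = pop[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng, n_vars)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return min(pop, key=lambda e: rmse(e, rows, y))
```

Because the result is a nested tuple such as `("+", ("x", 0), ("c", 1.0))`, it can be read directly as a formula relating the input factors to the prediction, which is the interpretability advantage the abstract contrasts with neural networks.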

Inclusion of bioclimatic variables in genetic evaluations of dairy cattle

  • Negri, Renata;Aguilar, Ignacio;Feltes, Giovani Luis;Machado, Juliana Dementshuk;Neto, Jose Braccini;Costa-Maia, Fabiana Martins;Cobuci, Jaime Araujo
    • Animal Bioscience / v.34 no.2 / pp.163-171 / 2021
  • Objective: Considering the importance of dairy farming and the negative effects of heat stress, more heat-tolerant genotypes need to be identified. The objective of this study was to investigate the effect of heat stress, measured via the temperature-humidity index (THI) and diurnal temperature variation (DTV), in genetic evaluations for daily milk yield of Holstein dairy cattle using random regression models. Methods: The data comprised 94,549 test-day records of 11,294 first-parity Holstein cows from Brazil, collected from 1997 to 2013, and bioclimatic data (THI and DTV) from 18 weather stations. Least squares linear regression models were used to determine the THI and DTV thresholds for milk yield losses caused by heat stress. In addition to the standard model (SM, without bioclimatic variables), THI and DTV were combined in various ways and tested for different days, for a total of 41 models. Results: The THI and DTV thresholds for milk yield losses were THI = 74 (-0.106 kg/d/THI) and DTV = 13 (-0.045 kg/d/DTV). The model that included THI and DTV as fixed effects, using the two-day average, provided a better fit (based on -2logL, the Akaike information criterion, and the Bayesian information criterion). The estimated breeding values (EBVs) and their reliabilities improved when using this model. Conclusion: Sires are re-ranked when heat stress indicators are included in the model. Genetic evaluation using the two-day mean of THI and DTV as fixed effects improved EBVs and EBV reliability.
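One common least squares approach to such thresholds is segmented ("broken-stick") regression: yield is flat up to a threshold and declines linearly beyond it, and the threshold is the breakpoint that minimizes the residual sum of squares. A minimal sketch under that assumption; the study's exact model specification may differ, and the candidate grid is a hypothetical input.

```python
def fit_threshold(t, y, candidates):
    """Grid-search a breakpoint tau for y = a + b * max(0, t - tau) by least squares.

    Returns (tau, a, b, sse) for the candidate with the smallest residual sum of
    squares; b is the change in y per unit of the index above the threshold.
    """
    best = None
    n = len(t)
    for tau in candidates:
        z = [max(0.0, ti - tau) for ti in t]
        z_bar, y_bar = sum(z) / n, sum(y) / n
        szz = sum((zi - z_bar) ** 2 for zi in z)
        if szz == 0:  # no observations above this threshold: slope not identifiable
            continue
        b = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / szz
        a = y_bar - b * z_bar
        sse = sum((yi - a - b * zi) ** 2 for zi, yi in zip(z, y))
        if best is None or sse < best[3]:
            best = (tau, a, b, sse)
    return best
```

On noise-free synthetic data generated with a breakpoint at 74 and a slope of -0.106 per index unit (the THI values reported above), the grid search recovers both; real test-day records would of course leave a positive residual sum of squares.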

Geographically weighted kernel logistic regression for small area proportion estimation

  • Shim, Jooyong;Hwang, Changha
    • Journal of the Korean Data and Information Science Society / v.27 no.2 / pp.531-538 / 2016
  • In this paper we deal with small area estimation when the response variables take binary values. Mixed effects models, which treat spatial effects as random effects, have been extensively studied for small area estimation. However, when the spatial information of each area is given explicitly as coordinates, it is common to use geographically weighted logistic regression, which incorporates the spatial information by assuming that the regression parameters vary spatially across areas. In this paper, we relax the linearity assumption and propose a geographically weighted kernel logistic regression for estimating small area proportions, using the basic principle of kernel machines. Numerical studies were carried out to compare the performance of the proposed method with other methods in estimating small area proportions.
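Geographically weighted logistic regression fits a separate logistic model at each target location, weighting every observation by a kernel of its distance from that location. A minimal sketch with one covariate, a Gaussian kernel and a hypothetical bandwidth `h`, fitted by Newton-Raphson; this is the linear GWLR baseline the abstract describes, not the proposed kernel-machine version.

```python
import math

def gw_logistic_fit(coords, x, y, target, h, iters=25):
    """Weighted logistic regression of y (0/1) on x at location `target`.

    Each observation gets a Gaussian kernel weight exp(-0.5 * (d / h)**2), where d
    is its Euclidean distance to `target`. Returns (beta0, beta1) via Newton-Raphson.
    """
    w = [math.exp(-0.5 * (math.dist(c, target) / h) ** 2) for c in coords]
    b0 = b1 = 0.0
    for _ in range(iters):
        # Weighted score (g0, g1) and negative Hessian entries (h00, h01, h11)
        g0 = g1 = h00 = h01 = h11 = 0.0
        for wi, xi, yi in zip(w, x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)
            v = wi * p * (1 - p)
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1
```

A small-area proportion at `target` could then be estimated by averaging the fitted probabilities over that area's units; a very large `h` makes all weights close to 1 and recovers ordinary logistic regression.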

A Study on the prediction of BMI(Benthic Macroinvertebrate Index) using Machine Learning Based CFS(Correlation-based Feature Selection) and Random Forest Model (머신러닝 기반 CFS(Correlation-based Feature Selection)기법과 Random Forest모델을 활용한 BMI(Benthic Macroinvertebrate Index) 예측에 관한 연구)

  • Go, Woo-Seok;Yoon, Chun Gyeong;Rhee, Han-Pil;Hwang, Soon-Jin;Lee, Sang-Woo
    • Journal of Korean Society on Water Environment / v.35 no.5 / pp.425-431 / 2019
  • Recently, attention has turned to high-quality water resources and to water welfare as ways to improve quality of life. This study predicts the benthic macroinvertebrate index (BMI), an indicator of aquatic ecological health, using the machine learning-based CFS (Correlation-based Feature Selection) method and a random forest model, and compares the measured and predicted BMI values. A total of 1,312 records collected over 10 years from a tributary of the Han River were extracted and used. Pearson correlation analysis of these data showed a lack of correlation between any single factor and BMI, so the CFS method for multiple regression analysis was introduced. This study derived 10 factors (water temperature, DO, electrical conductivity, turbidity, BOD, $NH_3-N$, T-N, $PO_4-P$, T-P, and average flow rate) that are considered to be related to BMI, and the random forest model was built on these ten factors. To assess the validity of the model, $R^2$, %Difference, NSE (Nash-Sutcliffe Efficiency) and RMSE (Root Mean Square Error) were used. The reported values were 0.9438, -0.997, and 0.992, and the accuracy rate was about 71.6%. These results can suggest a future direction for water resource management and a pre-review function for predicting aquatic ecological conditions.
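Two of the techniques named above have compact formulas. CFS scores a subset S of k features by Hall's merit, $\text{Merit}_S = k\bar{r}_{cf} / \sqrt{k + k(k-1)\bar{r}_{ff}}$, where $\bar{r}_{cf}$ is the mean feature-target correlation and $\bar{r}_{ff}$ the mean feature-feature intercorrelation; NSE is $1 - \sum(O_i - P_i)^2 / \sum(O_i - \bar{O})^2$. A minimal sketch, assuming absolute Pearson correlations and a greedy forward search (the abstract does not state the study's search strategy); feature names are hypothetical dictionary keys.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

def cfs_merit(features, target, subset):
    """Hall's CFS merit of a subset (list of keys into `features`) using |Pearson r|."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[f], features[g])) for f, g in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(features, target):
    """Greedy forward search: add the feature that most raises merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        merit, f = max((cfs_merit(features, target, selected + [f]), f) for f in remaining)
        if merit <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = merit
    return selected, best

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_o = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1 - num / sum((o - mean_o) ** 2 for o in observed)
```

The merit rewards features that correlate with the target but penalizes redundancy among them, which is why CFS can help when, as reported above, no single factor correlates strongly with BMI.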

A Study on The Optimization Method of The Initial Weights in Single Layer Perceptron

  • Cho, Yong-Jun;Lee, Yong-Goo
    • Journal of the Korean Data and Information Science Society / v.15 no.2 / pp.331-337 / 2004
  • In the analysis of massive volumes of data, a neural network model is a useful tool. To implement a neural network model, it is important to select the initial values. Since the initial values in a neural network are generally random, the convergence performance and prediction rate of the model are not stable. To overcome this drawback, one possible method uses samples randomly selected from the whole data set: the coefficients estimated by logistic regression on these samples are used as the initial values.
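The initialization idea can be sketched as follows. A single-layer perceptron with one sigmoid output unit has the same functional form as logistic regression, so coefficients fitted on a random subsample can be copied in as starting weights before training on the full data. A minimal sketch, assuming one input feature, plain full-batch gradient descent on cross-entropy (used here for both the logistic fit and the network training), and hypothetical names and settings (`logistic_init`, learning rate, epochs, sample size) that are not the paper's.

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def cross_entropy(w, b, xs, ys):
    """Mean cross-entropy of p = sigmoid(w*x + b); eps guards against log(0)."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return -total / len(xs)

def gradient_descent(xs, ys, w, b, lr=0.5, epochs=200):
    """Full-batch gradient descent on mean cross-entropy, starting from (w, b)."""
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of cross-entropy w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def logistic_init(xs, ys, sample_size, seed=0):
    """Fit logistic regression on a random subsample; return (w, b) as initial weights."""
    idx = random.Random(seed).sample(range(len(xs)), sample_size)
    return gradient_descent([xs[i] for i in idx], [ys[i] for i in idx], 0.0, 0.0, epochs=500)
```

Usage: `w0, b0 = logistic_init(xs, ys, sample_size=50)` followed by `w, b = gradient_descent(xs, ys, w0, b0)`. The starting point then depends only on the subsample drawn, not on an arbitrary random weight draw.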
