• Title/Summary/Keyword: Linear Random Effects Models

The effect of missing levels of nesting in multilevel analysis

  • Park, Seho;Chung, Yujin
    • Genomics & Informatics / v.20 no.3 / pp.34.1-34.11 / 2022
  • Multilevel analysis is an appropriate and powerful tool for analyzing hierarchically structured data and is widely applied in fields ranging from public health to genomics. In practice, however, information on multiple nesting levels may be lost, either because the data fail to capture all levels of the hierarchy or because the top or intermediate levels are ignored in the analysis. In this study, we consider a multilevel linear mixed effect model (LMM) with single imputation that can involve all data hierarchy levels in the presence of missing top or intermediate-level clusters. We evaluate the performance of a multilevel LMM with single imputation and compare it with that of other models that ignore the data hierarchy or the missing intermediate-level clusters. To this end, we applied a multilevel LMM with single imputation and the other models to hierarchically structured cohort data with some intermediate levels missing and to simulated data with various cluster sizes and missing rates of intermediate-level clusters. A thorough simulation study demonstrated that, in terms of mean squared error and coverage probability, an LMM with single imputation estimates the fixed coefficients and variance components of a multilevel model more accurately than other models that ignore the data hierarchy or missing clusters. In particular, when models ignoring the data hierarchy or missing clusters were applied, the variance components of the random effects were overestimated. We observed similar results from the analysis of hierarchically structured cohort data.
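
For orientation, a three-level random-intercept LMM of the kind this abstract discusses can be written as below. This is a generic textbook form given as a sketch, not the authors' exact specification; the indices (observation $i$ in intermediate cluster $j$ in top-level cluster $k$) and the symbols $u_k$, $v_{jk}$, $\sigma_u^2$, $\sigma_v^2$, $\sigma_e^2$ are assumptions introduced here for illustration.

```latex
y_{ijk} = \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + u_{k} + v_{jk} + \varepsilon_{ijk},
\qquad
u_{k} \sim N(0, \sigma_u^2), \quad
v_{jk} \sim N(0, \sigma_v^2), \quad
\varepsilon_{ijk} \sim N(0, \sigma_e^2)
```

In this notation, a model that ignores the intermediate level drops the $v_{jk}$ term, so its variance has to be absorbed by the remaining components, which is one way to read the overestimation reported in the abstract.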

Determinants of student course evaluation using hierarchical linear model (위계적 선형모형을 이용한 강의평가 결정요인 분석)

  • Cho, Jang Sik
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1285-1296 / 2013
  • This paper analyzes the determinants of student course evaluations using subject characteristic and student characteristic variables. We use a 2-level hierarchical linear model because the data on subject and student characteristics have a multilevel structure. We consider four models: (1) a null model, (2) a random coefficient model, (3) a means-as-outcomes model, and (4) an intercepts-and-slopes-as-outcomes model. The results are as follows. First, the null model showed that subject characteristics had a much larger effect on course evaluation than student characteristics. Second, the conditional model specifying subject-level and student-level predictors revealed that class size, grade, tenure, mean GPA of the class, and native class for level-1, and sex, department category, admission method, and mean GPA of the student for level-2, had statistically significant effects on course evaluation. The explained variance was 13% at the subject level and 13% at the student level.
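
The null model and the "explained variance" figures in this abstract are usually summarized with two simple ratios: the intraclass correlation and the proportional reduction in a variance component. A minimal sketch follows; the function names and the variance values are hypothetical and chosen only for illustration, not taken from the paper.

```python
def intraclass_correlation(between_var: float, within_var: float) -> float:
    """Share of total variance that lies between groups in a null model."""
    return between_var / (between_var + within_var)


def proportion_variance_explained(var_null: float, var_conditional: float) -> float:
    """Proportional reduction in a variance component after adding predictors."""
    return (var_null - var_conditional) / var_null


# Hypothetical variance components (illustrative only).
icc = intraclass_correlation(between_var=0.30, within_var=0.70)
explained = proportion_variance_explained(var_null=1.00, var_conditional=0.87)
print(f"ICC = {icc:.2f}, explained variance = {explained:.2f}")
```

The same reduction formula is applied separately to each level's variance component, which is how a per-level figure such as "13% at the subject level" is typically obtained.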

The wage determinants of the vocational high school graduates using mixed effects model (혼합모형을 이용한 특성화고 졸업생의 임금결정요인 분석)

  • Ryu, Jangsoo;Cho, Jangsik
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.935-946 / 2016
  • In this paper, we analyze the wage determinants of vocational high school graduates using both individual-level and work region-level variables. We formulate the models so that wage determination has a multilevel structure, in the sense that an individual's wage is influenced by individual-level (level-1) variables and work region-level (level-2) variables. To incorporate the dependency between individual wages into the model, we use a hierarchical linear model (HLM). The major results are as follows. First, the HLM is shown to be better than OLS regression models, which do not take level-1 and level-2 variables into account simultaneously. Second, the random effects of the sex, Meister dummy, and engineering dummy variables are statistically significant. Third, the fixed effects of the level-2 variables, business hours and mean wage of regular jobs, on individual-level wages are statistically significant. Finally, parental education level, parental income, number of licenses, and high school grade are statistically significant predictors of higher individual-level wages.

Accuracy Evaluation of Machine Learning Model for Concrete Aging Prediction due to Thermal Effect and Carbonation (콘크리트 탄산화 및 열효과에 의한 경년열화 예측을 위한 기계학습 모델의 정확성 검토)

  • Kim, Hyun-Su
    • Journal of Korean Association for Spatial Structures / v.23 no.4 / pp.81-88 / 2023
  • Numerous factors contribute to the deterioration of reinforced concrete structures. Elevated temperatures significantly alter the composition of the concrete ingredients, consequently diminishing the concrete's strength properties. With the escalation of global CO2 levels, the carbonation of concrete structures has emerged as a critical challenge, substantially affecting concrete durability research. Assessing and predicting concrete degradation due to thermal effects and carbonation are crucial yet intricate tasks. To address this, multiple prediction models for concrete carbonation and compressive strength under thermal impact have been developed. This study employs seven machine learning algorithms (multiple linear regression, decision trees, random forest, support vector machines, k-nearest neighbors, artificial neural networks, and extreme gradient boosting) to formulate predictive models for concrete carbonation and thermal impact. Two distinct datasets, derived from reported experimental studies, were utilized for training these predictive models. Performance evaluation relied on root mean square error, mean square error, mean absolute error, and the coefficient of determination. Hyperparameters were optimized through k-fold cross-validation and grid search. The results show that neural networks and extreme gradient boosting outperform the remaining five machine learning approaches in predicting concrete carbonation and thermal effects.
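
The four evaluation metrics named in this abstract have standard definitions, sketched below in plain Python. The function name `regression_metrics` and the sample values are hypothetical; the paper's own datasets and tooling are not reproduced here.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for paired observed/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Illustrative values only (not data from the study).
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(metrics)
```

RMSE and MSE rank models identically, so reporting both mainly helps readers who prefer errors in the units of the target variable.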

Analysis of SEER Glassy Cell Carcinoma Data: Underuse of Radiotherapy and Predictors of Cause Specific Survival

  • Cheung, Rex
    • Asian Pacific Journal of Cancer Prevention / v.17 no.1 / pp.353-356 / 2016
  • Background: This study used receiver operating characteristic (ROC) curves to analyze Surveillance, Epidemiology and End Results (SEER) data for glassy cell carcinoma in order to identify predictive models and potential disparities in outcome. Materials and Methods: This study analyzed socio-economic, staging and treatment factors. For risk modeling, each factor was fitted by a generalized linear model to predict cause specific survival. Areas under the ROC curves were computed. Similar strata were combined to construct the most parsimonious models. A random sampling algorithm was used to estimate modeling errors. The risk of glassy cell carcinoma death was computed for the predictors for comparison. Results: There were 79 patients included in this study. The mean follow up time (S.D.) was 37 (32.8) months. Female patients outnumbered males 4:1. The mean (S.D.) age was 54.4 (19.8) years. SEER stage was the most predictive factor of outcome (ROC area of 0.69). The risks of cause specific death were, respectively, 9.4% for localized, 16.7% for regional, 35% for the un-staged/others category, and 60% for distant disease. After optimization, the separation between the regional and un-staged/others categories was removed, giving a higher ROC area of 0.72. Several socio-economic factors had small but measurable effects on outcome. Radiotherapy had not been used in 90% of patients with regional disease. Conclusions: Optimized SEER stage was predictive and useful in treatment selection. Underuse of radiotherapy may have contributed to poor outcome.
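
The "ROC area" reported in this abstract can be read as the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor, with ties counted as one half. A minimal sketch of that pairwise computation follows; the function name `roc_auc` and the labels and scores are hypothetical and not drawn from SEER.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative outcome labels (1 = event) and risk scores.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.2, 0.4, 0.4, 0.6]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

With a categorical predictor such as stage, many scores are tied, so the tie handling above matters; merging two strata changes which pairs are tied and can therefore change the area.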

Bayesian analysis of finite mixture model with cluster-specific random effects (군집 특정 변량효과를 포함한 유한 혼합 모형의 베이지안 분석)

  • Lee, Hyejin;Kyung, Minjung
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.57-68 / 2017
  • Clustering algorithms attempt to find a partition of a finite set of objects into a potentially predetermined number of nonempty subsets. Gibbs sampling of a normal mixture of linear mixed regressions with a Dirichlet prior distribution calculates posterior probabilities when the number of clusters is known. Our approach provides simultaneous partitioning and parameter estimation with the computation of classification probabilities. A Monte Carlo study of curve estimation showed that the model was useful for function estimation. Examples are given to show how these models perform on real data.

Modelling Stem Diameter Variability in Pinus caribaea (Morelet) Plantations in South West Nigeria

  • Adesoye, Peter Oluremi
    • Journal of Forest and Environmental Science / v.32 no.3 / pp.280-290 / 2016
  • Stem diameter variability is an essential inventory result that provides useful information for forest management decisions. Little has been done to explore the modelling potential of the standard deviation (SDD) and coefficient of variation (CVD) of diameter at breast height (dbh). This study, therefore, aimed to develop and test models for predicting SDD and CVD in stands of Pinus caribaea Morelet (pine) in south west Nigeria. Sixty temporary sample plots of size $20\,\mathrm{m} \times 20\,\mathrm{m}$, in stands aged between 15 and 37 years, were sampled, covering the entire range of pine in south west Nigeria. The dbh (cm), total and merchantable heights (m), number of stems and age of trees were measured within each plot. Basal area ($m^2$), site index (m), relative spacing and percentile positions of dbh at $24^{th}$, $63^{rd}$, $76^{th}$ and $93^{rd}$ (i.e. $P_{24}$, $P_{63}$, $P_{76}$ and $P_{93}$) were computed from measured variables for each plot. A linear mixed model (LMM) was used to test the effects of locations (fixed) and plots (random). Six candidate models (3 for SDD and 3 for CVD) were developed using three categories of explanatory variables: (i) only stand size measures, (ii) distribution measures, and (iii) a combination of (i) and (ii). The best model was chosen based on smaller relative standard error (RSE), prediction residual sum of squares (PRESS), and corrected Akaike Information Criterion ($AIC_c$), and larger coefficient of determination ($R^2$). The results of the LMM indicated that location and plot effects were not significant. The CVD and SDD models having only measures of percentiles (i.e. $P_{24}$ and $P_{93}$) as predictors produced better predictions than the others. However, the CVD model produced the overall best predictions because of its lower RSE and its stability in measuring variability across different stages of stand development. The results demonstrate the potential of CVD in modelling stem diameter variability in relation to percentile variables.
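
Two of the quantities in this abstract have compact standard formulas: the coefficient of variation of dbh within a plot, and the small-sample corrected AIC used for model selection. The sketch below assumes the sample (n - 1) standard deviation and the usual correction term; the function names and input values are hypothetical, and the paper may define its quantities slightly differently.

```python
import statistics


def coefficient_of_variation(dbh_cm):
    """CVD in percent: sample standard deviation of dbh divided by its mean."""
    return 100.0 * statistics.stdev(dbh_cm) / statistics.mean(dbh_cm)


def aicc(log_likelihood, k, n):
    """Corrected AIC for a model with k parameters fitted to n observations."""
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Illustrative plot-level diameters (cm) and model fit values.
cvd = coefficient_of_variation([20.0, 22.0, 24.0, 26.0, 28.0])
score = aicc(log_likelihood=-50.0, k=3, n=60)
print(round(cvd, 2), round(score, 2))
```

Because CVD divides by the mean diameter, it is unit-free, which is consistent with the abstract's remark about its stability across stands at different stages of development.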

Modeling and Analysis of Accelerated Degradation Testing Data for a Solid State Drive (SSD) (Solid State Drive(SSD)에 대한 가속열화시험 데이터 모델링 및 분석)

  • Mun, Byeong Min;Choi, Young Jin;Ji, You Min;Lee, Yong Jung;Lee, Keun Woo;Na, Han Joo;Yang, Joong Seob;Bae, Suk Joo
    • Journal of Applied Reliability / v.18 no.1 / pp.33-39 / 2018
  • Purpose: Accelerated degradation tests can be effective in assessing product reliability when degradation leading to failure can be observed. This article proposes an accelerated degradation test model for highly reliable solid state drives (SSDs). Methods: We propose fitting a nonlinear mixed-effects (NLME) model to degradation data for SSDs. A Monte Carlo simulation is used to estimate the lifetime distribution from accelerated degradation testing data. This simulation is performed by generating random samples from the assumed NLME model. Conclusion: We apply the proposed method to degradation data collected from SSDs. The derived power model is shown to fit the degradation data much better than other existing models. Finally, the Monte Carlo simulation based on the NLME model provides reasonable results in lifetime estimation.

Heritability and Repeatability of Superovulatory Responses in Holstein Population in Hokkaido, Japan

  • Asada, Y.;Terawaki, Y.
    • Asian-Australasian Journal of Animal Sciences / v.15 no.7 / pp.944-948 / 2002
  • The aim of this study was to estimate the heritability and repeatability of the number of embryos and transferable embryos collected per flush in the Holstein population in Hokkaido, Japan. Data consisted of 306 MOET (Multiple Ovulation and Embryo Transfer) treatments on 224 Holstein cows from 1997 to 2000. Variance components for these traits were estimated using the REML procedure. The model included as fixed effects only non-genetic factors that were significant at the 0.05 level, identified using generalized linear models, maximum likelihood methods, and a stepwise regression procedure. The random effects were sire and residual for heritabilities, and donor and residual for repeatabilities. The factor identified as important in determining the results was the donor's estrous condition after superovulation. Heritabilities for the number of embryos and transferable embryos collected per flush were 0.14 and 0.09, respectively. The corresponding repeatabilities were 0.43 and 0.32. These results show that it is difficult to improve these traits genetically; thus, environmental and physical factors affecting the donor must be improved. These results also show that it is necessary to take the donor's estrous condition after superovulation and the repeatabilities for the number of embryos and transferable embryos collected per flush into account when the genetic gains and inbreeding rates for MOET breeding schemes are predicted by computer simulation.
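
For readers unfamiliar with how such estimates follow from variance components, the conventional formulas are sketched below. They assume a sire model in which the sire variance $\sigma_s^2$ captures one quarter of the additive genetic variance, and a repeatability model with donor variance $\sigma_d^2$; the symbols are introduced here for illustration, and the abstract does not state the exact expressions the authors used.

```latex
h^2 = \frac{4\,\sigma_s^2}{\sigma_s^2 + \sigma_e^2},
\qquad
r = \frac{\sigma_d^2}{\sigma_d^2 + \sigma_e^2}
```

Here $\sigma_e^2$ denotes the residual variance of the respective model, so the two residual terms are not in general the same quantity.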

Generalized Linear Mixed Model for Multivariate Multilevel Binomial Data (다변량 다수준 이항자료에 대한 일반화선형혼합모형)

  • Lim, Hwa-Kyung;Song, Seuck-Heun;Song, Ju-Won;Cheon, Soo-Young
    • The Korean Journal of Applied Statistics / v.21 no.6 / pp.923-932 / 2008
  • We are likely to face complex multivariate data characterized by a non-trivial correlation structure. For instance, omitted covariates may simultaneously affect more than one count in clustered data; hence, modeling the correlation structure is important for the efficiency of the estimator and for computing correct standard errors, i.e., for valid inference. A standard way to introduce dependence among counts is to assume that they share some common unobservable variables. Under this assumption, we fitted correlated random effects models within a multilevel framework. Estimation was carried out with a semiparametric approach, using a finite mixture EM algorithm without parametric assumptions on the distribution of the random coefficients.