• Title/Summary/Keyword: Generalized log linear model

Binary regression model using skewed generalized t distributions (기운 일반화 t 분포를 이용한 이진 데이터 회귀 분석)

  • Kim, Mijeong
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.775-791 / 2017
  • We frequently encounter binary data in real life. Logistic, probit, cauchit, and complementary log-log models are often used for binary data analysis. To analyze binary data, Liu (2004) proposed the Robit model, in which the inverse of the cdf of the Student's t distribution is used as a link function. Kim et al. (2008) also proposed a generalized t-link model to make the binary regression model more flexible. More flexible skewed distributions allow more flexible link functions in generalized linear models. In this sense, we propose a binary regression model using the skewed generalized t distributions introduced in Theodossiou (1998). We implement the proposed models in R using the glm function included in base R and the R sgt package. We also analyze the Pima Indian data using the proposed model in R.
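
The four standard links named in the abstract above can be compared through their inverse links, that is, the cdf that maps a linear predictor to a success probability. The following is a minimal Python sketch using only the standard library; it is not the authors' R code, and the function names are hypothetical. The cauchit cdf is the Student's t cdf with 1 degree of freedom, so it gives a rough idea of how a heavy-tailed t-type link behaves.

```python
import math

# Inverse link functions: each maps a linear predictor eta = x'beta
# to a success probability P(Y = 1 | x). Names are illustrative only.

def logistic_cdf(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def probit_cdf(eta):
    # Standard normal cdf written with the error function.
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def cauchit_cdf(eta):
    # Standard Cauchy cdf, i.e. Student's t with 1 degree of freedom.
    return 0.5 + math.atan(eta) / math.pi

def cloglog_cdf(eta):
    # Inverse of the complementary log-log link (asymmetric around 0).
    return 1.0 - math.exp(-math.exp(eta))

INVERSE_LINKS = {
    "logit": logistic_cdf,
    "probit": probit_cdf,
    "cauchit": cauchit_cdf,
    "cloglog": cloglog_cdf,
}

def success_probabilities(eta):
    """Return P(Y = 1) under each link for one linear predictor value."""
    return {name: cdf(eta) for name, cdf in INVERSE_LINKS.items()}

if __name__ == "__main__":
    for eta in (-3.0, 0.0, 3.0):
        print(eta, success_probabilities(eta))
```

At a linear predictor far from zero, the cauchit link assigns more probability to the unlikely outcome than the logit or probit links, which is the kind of tail flexibility that t-based links are meant to control.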

Mutual Information and Redundancy for Categorical Data

  • Hong, Chong-Sun;Kim, Beom-Jun
    • Communications for Statistical Applications and Methods / v.13 no.2 / pp.297-307 / 2006
  • Most methods for describing the relationship among random variables require specific probability distributions and some assumptions about the random variables. The mutual information, which is based on entropy and measures the dependency among random variables, does not need any specific assumptions. The redundancy, an analogous version of the mutual information, has also been proposed. In this paper, the redundancy and mutual information are extended to multi-dimensional categorical data. It is found that the redundancy for categorical data can be expressed as a function of the generalized likelihood ratio statistic under several kinds of independence log-linear models, so that the redundancy can also be used to analyze contingency tables. Whereas the generalized likelihood ratio statistic for testing the goodness-of-fit of log-linear models is sensitive to the sample size, the redundancy for categorical data depends not on the sample size but only on the cell probabilities.
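
The link between mutual information and the likelihood ratio statistic can be illustrated in the simplest case, a two-way table under the independence model, where G² = 2N · I(X;Y) with I measured in nats. The Python sketch below assumes that two-way setting only; the paper's multi-dimensional redundancy measure is not reproduced here, and the function names and the example table are hypothetical.

```python
import math

def _margins(table):
    """Return (grand total, row totals, column totals) of a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return sum(row_totals), row_totals, col_totals

def mutual_information(table):
    """Plug-in mutual information (in nats) from a two-way table of counts."""
    n, row_totals, col_totals = _margins(table)
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute zero
                p = count / n
                mi += p * math.log(count * n / (row_totals[i] * col_totals[j]))
    return mi

def g_squared_independence(table):
    """Likelihood ratio statistic G^2 for the two-way independence model."""
    n, row_totals, col_totals = _margins(table)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:
                expected = row_totals[i] * col_totals[j] / n
                g2 += 2.0 * count * math.log(count / expected)
    return g2

if __name__ == "__main__":
    table = [[10, 20], [30, 40]]                      # hypothetical counts
    bigger = [[10 * c for c in row] for row in table]  # same proportions, 10x N
    print(mutual_information(table), g_squared_independence(table))
    print(mutual_information(bigger), g_squared_independence(bigger))
```

Multiplying every cell by ten leaves the mutual information unchanged while G² grows tenfold, which mirrors the abstract's point that the test statistic is sensitive to sample size whereas an information-based measure depends only on the cell proportions.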

Empirical Comparisons of Disparity Measures for Three Dimensional Log-Linear Models

  • Park, Y.S.;Hong, C.S.;Jeong, D.B.
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.543-557 / 2006
  • This paper is concerned with the applicability of the chi-square approximation to six disparity statistics: the Pearson chi-square, the generalized likelihood ratio, the power divergence, the blended weight chi-square, the blended weight Hellinger distance, and the negative exponential disparity statistic. Three-dimensional contingency tables of small and moderate sample sizes are generated and fitted to all possible hierarchical log-linear models: the completely independent model, the conditionally independent model, the partial association models, and the model with one variable independent of the other two. For models with direct solutions for the expected cell counts, point estimates and confidence intervals of the 90 and 95 percentage points of the six statistics are explored. For the model without direct solutions, the empirical significance levels and the empirical powers of the six statistics for testing the significance of the three-factor interaction are computed and compared.
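
Three of the statistics listed above belong to one family: the Cressie-Read power divergence with index λ reduces to the Pearson chi-square at λ = 1 and to the likelihood ratio statistic as λ → 0. The Python sketch below illustrates that family only; the blended weight and negative exponential disparities are not covered, the counts are hypothetical, and the code assumes positive observed and expected counts with equal totals.

```python
import math

def power_divergence(observed, expected, lam):
    """Cressie-Read power divergence statistic with index lam.

    Assumes positive counts and sum(observed) == sum(expected).
    lam = 1 gives the Pearson chi-square; lam = 0 gives G^2 (as a limit).
    """
    if abs(lam + 1.0) < 1e-12:
        raise ValueError("lam = -1 needs a separate limiting form")
    if abs(lam) < 1e-12:
        # Limit lam -> 0: likelihood ratio statistic G^2.
        return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
    scale = 2.0 / (lam * (lam + 1.0))
    return scale * sum(o * ((o / e) ** lam - 1.0) for o, e in zip(observed, expected))

def pearson_chi_square(observed, expected):
    """Classical Pearson statistic, used here to check the lam = 1 case."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

if __name__ == "__main__":
    obs = [10, 20, 30]   # hypothetical observed counts
    exp = [20, 20, 20]   # hypothetical fitted counts with the same total
    for lam in (0.0, 2.0 / 3.0, 1.0):
        print(lam, power_divergence(obs, exp, lam))
```

In a simulation study of the kind described, each statistic would be computed on many generated tables and its empirical quantiles compared with those of the reference chi-square distribution.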

Analysis of Online Behavior and Prediction of Learning Performance in Blended Learning Environments

  • JO, Il-Hyun;PARK, Yeonjeong;KIM, Jeonghyun;SONG, Jongwoo
    • Educational Technology International / v.15 no.2 / pp.71-88 / 2014
  • A variety of studies to predict students' performance have been conducted since educational data such as web-log files traced from a Learning Management System (LMS) are increasingly used to analyze students' learning behaviors. However, it is still challenging to predict students' learning achievement in blended learning environments where online and offline learning are combined. In higher education, diverse cases of blended learning can be formed, from simple use of an LMS for administrative purposes to full use of LMS functions for online distance learning classes. As a result, a single generalized model to predict students' academic success does not fit the diverse cases of blended learning. This study compares two blended learning classes, each with its own prediction model. The first blended class, which involved online discussion-based learning, yielded a linear regression model that explained 70% of the variance in total score through six variables: total log-in time, log-in frequency, log-in regularity, visits to boards, visits to repositories, and the number of postings. However, the second case, a lecture-based class providing online lecture notes in Moodle on a regular basis, showed weaker results from the same linear regression model, mainly due to non-linearity of the variables. To investigate the non-linear relations between online activities and total score, RF (Random Forest) was utilized. The results indicate that there are different sets of important variables for the two distinct types of blended learning cases. The results suggest that prediction models and data-mining techniques should take into account the diverse pedagogical characteristics of blended learning classes.

Study of the high pressure hose assemblies by accelerated life test (고압호스 조립체의 가속수명시험에 관한 연구)

  • Lee, Gi Chun;Lee, Yong Bum
    • Journal of Advanced Marine Engineering and Technology / v.37 no.8 / pp.886-892 / 2013
  • Hydraulic hose assemblies are used as piping components in construction machinery, automobiles, aircraft, industrial machinery, machine tools, and machinery for ships. The reliability of hose assemblies is important because the whole hydraulic system, which delivers fluid power ($P{\times}Q$) and relies on the hoses for flexibility in the piping, cannot operate if a hose assembly fails. The accelerated life test data were analyzed with the Weibull distribution to estimate the shape parameter (${\beta}$). This study tried to reduce the test time by varying the impulse pressure range and the flexing diameter. The GLL (generalized log-linear) model was adopted as the accelerated life test model for the test results, and the acceleration indexes were identified as 6.64 for pressure and 4.46 for flexing radius. The shape parameter was found to be 6.19 and the scale parameter (${\eta}$) $1.035{\times}10^8$ at the use condition of 35 MPa pressure and R100 mm flexing diameter.
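
In a GLL model the Weibull scale parameter is written as η(x) = exp(α₀ + α₁x₁ + α₂x₂). The Python sketch below assumes log-transformed stresses (which turns each term into a power law) and assumes that higher pressure and a smaller flexing radius both shorten life; the abstract does not state the transformation or the sign convention, so both are assumptions. The indexes 6.64 and 4.46 and the use condition (35 MPa, R100 mm) come from the abstract; the function names and the test conditions in the demo are hypothetical.

```python
import math

def gll_scale(alpha0, alphas, covariates):
    """GLL life-stress relationship: eta = exp(alpha0 + sum(alpha_j * x_j))."""
    return math.exp(alpha0 + sum(a * x for a, x in zip(alphas, covariates)))

def acceleration_factor(p_test, r_test, p_use=35.0, r_use=100.0,
                        n_pressure=6.64, n_radius=4.46):
    """Ratio eta(use) / eta(test) under the assumed power-law form.

    Covariates are log(pressure) and log(radius). The signs below encode the
    ASSUMPTION that life falls with pressure and rises with flexing radius.
    The intercept cancels in the ratio, so it is set to 0.
    """
    alphas = (-n_pressure, n_radius)
    x_use = (math.log(p_use), math.log(r_use))
    x_test = (math.log(p_test), math.log(r_test))
    return gll_scale(0.0, alphas, x_use) / gll_scale(0.0, alphas, x_test)

if __name__ == "__main__":
    # Hypothetical test conditions, not taken from the paper.
    for p_test, r_test in ((35.0, 100.0), (45.0, 100.0), (45.0, 80.0)):
        print(p_test, r_test, acceleration_factor(p_test, r_test))
```

Under these assumptions the factor equals (P_test / P_use)^6.64 × (R_use / R_test)^4.46, so it is 1 at the use condition and grows quickly as the test pressure rises or the flexing radius shrinks.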

Estimation and variable selection in censored regression model with smoothly clipped absolute deviation penalty

  • Shim, Jooyong;Bae, Jongsig;Seok, Kyungha
    • Journal of the Korean Data and Information Science Society / v.27 no.6 / pp.1653-1660 / 2016
  • The smoothly clipped absolute deviation (SCAD) penalty is known to satisfy desirable properties for penalty functions such as unbiasedness, sparsity, and continuity. In this paper, we deal with regression function estimation and variable selection based on a SCAD penalized censored regression model. We use the local linear approximation and the iteratively reweighted least squares algorithm to optimize the SCAD penalized log-likelihood function. The proposed method provides an efficient way to perform variable selection and regression function estimation. A generalized cross validation function is presented for model selection. Applications of the proposed method are illustrated through simulated and real examples.
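
The SCAD penalty and its derivative can be written down directly from the standard definition of Fan and Li (2001). The Python sketch below shows only the penalty itself, not the censored regression fit described above; the default a = 3.7 is the value commonly suggested in the literature, and the function names are hypothetical. The derivative is the per-coefficient weight that a local linear approximation would plug into each reweighted step.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(theta) for a single coefficient (requires a > 2)."""
    t = abs(theta)
    if t <= lam:
        return lam * t                       # L1-like near zero (sparsity)
    if t <= a * lam:
        # Quadratic transition between the L1 part and the flat part.
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0       # constant for large coefficients

def scad_derivative(theta, lam, a=3.7):
    """Derivative of the penalty with respect to |theta|.

    It equals lam near zero and 0 for |theta| > a * lam, so large
    coefficients are left unpenalized (near-unbiasedness).
    """
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)

if __name__ == "__main__":
    lam = 0.5
    for theta in (0.1, 0.5, 1.0, 1.85, 3.0):
        print(theta, scad_penalty(theta, lam), scad_derivative(theta, lam))
```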

Claims Reserving via Kernel Machine

  • Kim, Mal-Suk;Park, He-Jung;Hwang, Chang-Ha;Shim, Joo-Yong
    • Journal of the Korean Data and Information Science Society / v.19 no.4 / pp.1419-1427 / 2008
  • This paper presents kernel Poisson regression, which can be applied to claims reserving, where the row effect is assumed to be a nonlinear function of the row index. The paper concentrates on the chain-ladder technique, within the framework of the chain-ladder linear model. It is shown that the proposed method can provide better reserve estimates than the Poisson model. A cross validation function is introduced to choose optimal hyper-parameters in the procedure. Experimental results are then presented which indicate the performance of the proposed model.
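
For orientation, the classical chain-ladder technique that the paper builds on can be sketched in a few lines of Python. This is the basic deterministic method only, not the kernel Poisson regression proposed in the paper; the cumulative run-off triangle is hypothetical and the function names are illustrative.

```python
def development_factors(triangle):
    """Volume-weighted chain-ladder factors from a cumulative run-off triangle.

    triangle[i] holds the observed cumulative claims of origin year i;
    later origin years have fewer observed development periods.
    """
    n_periods = len(triangle[0])
    factors = []
    for j in range(n_periods - 1):
        rows = [row for row in triangle if len(row) > j + 1]
        numerator = sum(row[j + 1] for row in rows)
        denominator = sum(row[j] for row in rows)
        factors.append(numerator / denominator)
    return factors

def project_ultimates(triangle, factors):
    """Roll each origin year's latest value forward to the final period."""
    ultimates = []
    for row in triangle:
        value = row[-1]
        for j in range(len(row) - 1, len(factors)):
            value *= factors[j]
        ultimates.append(value)
    return ultimates

def total_reserve(triangle):
    """Reserve = projected ultimate claims minus latest observed claims."""
    factors = development_factors(triangle)
    ultimates = project_ultimates(triangle, factors)
    return sum(u - row[-1] for u, row in zip(ultimates, triangle))

if __name__ == "__main__":
    # Hypothetical cumulative claims for three origin years.
    triangle = [[100.0, 150.0, 175.0], [110.0, 165.0], [120.0]]
    print(development_factors(triangle))
    print(total_reserve(triangle))
```

The same point estimates arise from a Poisson model with multiplicative row and column effects, which is the setting in which a nonlinear row effect can replace the fixed one.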

New Response Surface Approach to Optimize Medium Composition for Production of Bacteriocin by Lactobacillus acidophilus ATCC 4356

  • RHEEM, SUNGSUE;SEJONG OH;KYOUNG SIK HAN;JEE YOUNG IMM;SAEHUN KIM
    • Journal of Microbiology and Biotechnology / v.12 no.3 / pp.449-456 / 2002
  • The objective of this study was to optimize the medium composition (initial pH, tryptone, glucose, yeast extract, and mineral mixture) for production of bacteriocin by Lactobacillus acidophilus ATCC 4356, using response surface methodology. A response surface approach including new statistical and plotting methods was employed for design and analysis of the experiment. An interiorly augmented central composite design was used as the experimental design. A normal-distribution log-link generalized linear model based on a subset fourth-order polynomial ($R^2$=0.94, Mean Error Deviance=0.0065) was used as the analysis model. This model was statistically superior to the full second-order polynomial-based generalized linear model ($R^2$=0.80, Mean Error Deviance=0.0140). Nonlinear programming determined the optimum composition of the medium as initial pH 6.35, tryptone $1.21\%$, glucose $0.9\%$, yeast extract $0.65\%$, and mineral mixture $1.17\%$. A validation experiment confirmed that the optimized medium was comparable to the MRS medium in bacteriocin production, with the advantage of economy and practicality.

Design Criterion for Estimating Mean and Variance Functions

  • Lim, Yong B.
    • International Journal of Quality Innovation / v.1 no.1 / pp.32-37 / 2000
  • In an industrial process, the proper objective is to find the optimal operating conditions with minimum process variability around the target. Vining and Myers (1990) suggest using separate models for the mean response and the process variance ... linear predictor ${\tau}_i={\log}\;{\sigma}^2_i$ is unknown and should be estimated. Noting that the variance of $\hat{{\tau}_i}$ is heterogeneous, another appropriate D-optimality criterion $D_3$ based on the method of generalized least squares is proposed in this paper.

Estimation of the number of discontinuity points based on likelihood (가능도함수를 이용한 불연속점 수의 추정)

  • Huh, Jib
    • Journal of the Korean Data and Information Science Society / v.21 no.1 / pp.51-59 / 2010
  • For the case in which the regression function of a generalized linear model has a discontinuity point, Huh (2009) estimated the location and jump size using the log-likelihood weighted by one-sided kernel functions. In this paper, we consider estimation of the unknown number of discontinuity points in the regression function. The proposed algorithm is based on testing for the existence of a discontinuity point using the asymptotic distribution of the estimated jump size described in Huh (2009). The finite sample performance is illustrated by a simulated example.