• Title/Abstract/Keywords: Generalized log linear model


Binary regression model using skewed generalized t distributions

  • 김미정
    • Korean Journal of Applied Statistics (응용통계연구) / Vol. 30, No. 5 / pp. 775-791 / 2017
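  • Binary data are frequently encountered in everyday life. Logistic, probit, cauchit, and complementary log-log models are the most commonly used methods for regression on binary data; other approaches include the robit model based on the t distribution proposed by Liu (2004) and the generalized t-link model proposed by Kim et al. (2008). Motivated by the observation that a flexible distribution yields a flexible regression model, this paper introduces a binary regression model that maximizes the likelihood function using the skewed generalized t distribution proposed by Theodossiou (1998). The paper shows how to carry out the proposed analysis in R by connecting the skewed generalized t distribution to the R glm function and the R sgt package, and it analyzes the Pima Indian data.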
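
The four standard links named in this abstract differ only in the cumulative distribution function that maps the linear predictor to a probability. The following is a minimal sketch in standard-library Python, not the paper's R code; all function names are hypothetical, and the robit and skewed generalized t links are omitted because they need a t or SGT distribution function that the standard library does not provide.

```python
import math


def inv_logit(eta):
    """Logistic CDF: inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Standard normal CDF: inverse of the probit link."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_cauchit(eta):
    """Standard Cauchy CDF: inverse of the cauchit link."""
    return 0.5 + math.atan(eta) / math.pi


def inv_cloglog(eta):
    """Inverse of the complementary log-log link (asymmetric around 0)."""
    return 1.0 - math.exp(-math.exp(eta))


def bernoulli_log_likelihood(beta, xs, ys, inv_link):
    """Log-likelihood of P(y = 1 | x) = inv_link(beta[0] + beta[1] * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = inv_link(beta[0] + beta[1] * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total
```

Swapping `inv_link` changes the model without touching the likelihood code, which is the sense in which a more flexible distribution gives a more flexible regression model.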

Mutual Information and Redundancy for Categorical Data

  • Hong, Chong-Sun;Kim, Beom-Jun
    • Communications for Statistical Applications and Methods / Vol. 13, No. 2 / pp. 297-307 / 2006
  • Most methods for describing the relationship among random variables require specific probability distributions and assumptions about the random variables. Mutual information, an entropy-based measure of dependency among random variables, needs no such assumptions. Redundancy, an analogous version of mutual information, has also been proposed. In this paper, redundancy and mutual information are extended to multi-dimensional categorical data. It is found that the redundancy for categorical data can be expressed as a function of the generalized likelihood ratio statistic under several kinds of independence log-linear models, so that the redundancy can also be used to analyze contingency tables. Whereas the generalized likelihood ratio statistic for testing the goodness-of-fit of log-linear models is sensitive to the sample size, the redundancy for categorical data depends only on the cell probabilities and not on the sample size.
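
For a two-way table under the independence model, the link between the two quantities is the standard identity G² = 2N·I(X;Y), with I measured in nats. The sketch below illustrates that identity and the sample-size behaviour described in the abstract; it does not reproduce the paper's own definition of redundancy, and the function names are hypothetical.

```python
import math


def _margins(table):
    """Return (grand total, row totals, column totals) of a two-way table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return sum(row_tot), row_tot, col_tot


def mutual_information(table):
    """Plug-in mutual information (nats) from cell counts."""
    n, row_tot, col_tot = _margins(table)
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute zero
                mi += (count / n) * math.log(count * n / (row_tot[i] * col_tot[j]))
    return mi


def g_squared(table):
    """Likelihood ratio statistic G^2 for the two-way independence model."""
    n, row_tot, col_tot = _margins(table)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:
                g2 += 2.0 * count * math.log(count * n / (row_tot[i] * col_tot[j]))
    return g2
```

Multiplying every cell by 10 leaves `mutual_information` unchanged but multiplies `g_squared` by 10, which is why a measure built on cell probabilities alone does not grow with the sample size.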

Empirical Comparisons of Disparity Measures for Three Dimensional Log-Linear Models

  • Park, Y.S.;Hong, C.S.;Jeong, D.B.
    • Journal of the Korean Data and Information Science Society / Vol. 17, No. 2 / pp. 543-557 / 2006
  • This paper is concerned with the applicability of the chi-square approximation to six disparity statistics: the Pearson chi-square, the generalized likelihood ratio, the power divergence, the blended weight chi-square, the blended weight Hellinger distance, and the negative exponential disparity statistic. Three-dimensional contingency tables of small and moderate sample sizes are generated and fitted to all possible hierarchical log-linear models: the completely independent model, the conditionally independent model, the partial association models, and the model with one variable independent of the other two. For models with direct solutions for the expected cell counts, point estimates and confidence intervals of the 90th and 95th percentage points of the six statistics are explored. For models without direct solutions, the empirical significance levels and empirical powers of the six statistics for testing the significance of the three-factor interaction are computed and compared.
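
Three of the six statistics belong to one family: the Cressie-Read power divergence reduces to the Pearson chi-square at λ = 1 and to the generalized likelihood ratio statistic as λ → 0. The following is a minimal sketch of that family under the usual assumption that observed and expected counts have the same total; the function name is hypothetical, and the blended weight and negative exponential statistics are not covered.

```python
import math


def power_divergence(observed, expected, lam):
    """Cressie-Read statistic 2 / (lam * (lam + 1)) * sum(O * ((O / E)**lam - 1)).

    lam = 1 gives the Pearson chi-square; the limit lam -> 0 gives the
    likelihood ratio statistic G^2; the limit lam -> -1 gives the
    modified likelihood ratio statistic.
    """
    pairs = list(zip(observed, expected))
    if abs(lam) < 1e-12:
        # Limit lam -> 0: G^2 = 2 * sum(O * log(O / E)); empty cells add zero.
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    if abs(lam + 1.0) < 1e-12:
        # Limit lam -> -1: 2 * sum(E * log(E / O)); requires all O > 0.
        return 2.0 * sum(e * math.log(e / o) for o, e in pairs)
    scale = 2.0 / (lam * (lam + 1.0))
    return scale * sum(o * ((o / e) ** lam - 1.0) for o, e in pairs)
```

For example, `power_divergence([18, 22, 30, 30], [25, 25, 25, 25], 1)` matches the Pearson statistic computed directly from the same counts.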


Analysis of Online Behavior and Prediction of Learning Performance in Blended Learning Environments

  • JO, Il-Hyun;PARK, Yeonjeong;KIM, Jeonghyun;SONG, Jongwoo
    • Educational Technology International / Vol. 15, No. 2 / pp. 71-88 / 2014
  • A variety of studies on predicting students' performance have been conducted as educational data, such as web-log files traced from a Learning Management System (LMS), are increasingly used to analyze students' learning behaviors. However, it is still challenging to predict students' learning achievement in blended learning environments, where online and offline learning are combined. In higher education, blended learning takes diverse forms, from simple use of an LMS for administrative purposes to full use of LMS functions for online distance learning classes. As a result, a single generalized model for predicting students' academic success does not fit the diverse cases of blended learning. This study compares two blended learning classes, each with its own prediction model. The first class, which involved online discussion-based learning, yielded a linear regression model that explained 70% of the variance in total score through six variables: total log-in time, log-in frequency, log-in regularity, visits to boards, visits to repositories, and the number of postings. The second class, a lecture-based class that regularly provided online lecture notes in Moodle, showed weaker results from the same linear regression model, mainly because of non-linearity in the variables. To investigate the non-linear relations between online activities and total score, Random Forest (RF) was used. The results indicate that the two distinct types of blended learning have different sets of important variables. They suggest that prediction models and data-mining techniques should take into account the diverse pedagogical characteristics of blended learning classes.

Study of the high pressure hose assemblies by accelerated life test

  • 이기천;이용범
    • Journal of Advanced Marine Engineering and Technology / Vol. 37, No. 8 / pp. 886-892 / 2013
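  • High-pressure hose assemblies are widely used as hydraulic piping in the hydraulic systems of construction machinery, ships, aircraft, industrial machinery, machine tools, and automobiles. They must transmit fluid power ($P \times Q$) where flexibility is required, and a failure renders the entire hydraulic system inoperable, so their reliability is critical. The shape parameter was estimated by Weibull analysis of the accelerated life test data. In this study, the test time was reduced by varying the impulse pressure and the repeated bending applied under the accelerated life test conditions. A GLL (generalized linear) model was used as the accelerated life test model, and the acceleration indices for impulse pressure and repeated bending were found to be 6.64 and 4.46, respectively. The analysis of the test results gave a shape parameter ($\beta$) of 6.19, and under the actual use conditions of 35 MPa and a bending radius of R100 mm the scale parameter ($\eta$) was $1.035 \times 10^{8}$ cycles.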
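
The Weibull parameters reported in this abstract can be turned into reliability figures with a few lines of code. The sketch below makes two assumptions that the abstract does not state: the scale parameter is read as 1.035e8 cycles, and each acceleration index acts as a power-law exponent on the ratio of test stress to use stress. The function names and the stress ratios in the usage note are hypothetical.

```python
import math

BETA = 6.19      # Weibull shape parameter reported in the abstract
ETA = 1.035e8    # scale parameter in cycles (assumed reading of the reported value)


def weibull_reliability(cycles, beta=BETA, eta=ETA):
    """Probability of surviving beyond `cycles` under a two-parameter Weibull."""
    return math.exp(-((cycles / eta) ** beta))


def weibull_b_life(p_fail, beta=BETA, eta=ETA):
    """Cycles by which a fraction `p_fail` of units has failed (B10 for 0.10)."""
    return eta * (-math.log(1.0 - p_fail)) ** (1.0 / beta)


def power_law_acceleration(stress_ratios, exponents):
    """Acceleration factor as the product of (test / use stress ratio) ** exponent."""
    factor = 1.0
    for ratio, exponent in zip(stress_ratios, exponents):
        factor *= ratio ** exponent
    return factor
```

As a usage illustration, `weibull_b_life(0.10)` gives the B10 life at use conditions, and `power_law_acceleration([1.5, 1.2], [6.64, 4.46])` gives the factor by which test time would shrink for made-up stress ratios of 1.5 and 1.2.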

Estimation and variable selection in censored regression model with smoothly clipped absolute deviation penalty

  • Shim, Jooyong;Bae, Jongsig;Seok, Kyungha
    • Journal of the Korean Data and Information Science Society / Vol. 27, No. 6 / pp. 1653-1660 / 2016
  • The smoothly clipped absolute deviation (SCAD) penalty is known to satisfy the desirable properties of a penalty function, such as unbiasedness, sparsity, and continuity. In this paper, we deal with regression function estimation and variable selection based on a SCAD-penalized censored regression model. We use the local linear approximation and the iteratively reweighted least squares algorithm to optimize the SCAD-penalized log-likelihood function. The proposed method provides an efficient way to perform variable selection and regression function estimation. A generalized cross-validation function is presented for model selection. Applications of the proposed method are illustrated with simulated and real examples.
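
The SCAD penalty itself has a simple closed form. The sketch below follows the standard definition of Fan and Li (2001) with the conventional default a = 3.7, not the censored-regression estimator of this paper; the function names are hypothetical. The derivative is the quantity that a local linear approximation would use as a per-coefficient weight.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic transition, then constant."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return -(t * t - 2.0 * a * lam * t + lam * lam) / (2.0 * (a - 1.0))
    return (a + 1.0) * lam * lam / 2.0


def scad_derivative(theta, lam, a=3.7):
    """Derivative of the penalty with respect to |theta|."""
    t = abs(theta)
    if t <= lam:
        return lam
    if t <= a * lam:
        return (a * lam - t) / (a - 1.0)
    return 0.0  # large coefficients are not shrunk, which gives unbiasedness
```

Coefficients larger than `a * lam` in absolute value receive zero weight, which is why SCAD leaves large effects unshrunk while still setting small ones to zero.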

Claims Reserving via Kernel Machine

  • Kim, Mal-Suk;Park, He-Jung;Hwang, Chang-Ha;Shim, Joo-Yong
    • Journal of the Korean Data and Information Science Society / Vol. 19, No. 4 / pp. 1419-1427 / 2008
  • This paper presents kernel Poisson regression for claims reserving, where the row effect is assumed to be a nonlinear function of the row index. The paper concentrates on the chain-ladder technique within the framework of the chain-ladder linear model. The proposed method is shown to provide better reserve estimates than the Poisson model. A cross-validation function is introduced to choose the optimal hyperparameters of the procedure, and experimental results illustrate the performance of the proposed model.
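
For readers unfamiliar with the baseline, the classical chain-ladder technique can be sketched in a few lines. This is the textbook volume-weighted version applied to a cumulative run-off triangle, not the kernel Poisson regression proposed in the paper; the function names and the toy triangle in the usage note are hypothetical.

```python
def development_factors(triangle):
    """Volume-weighted age-to-age factors from a cumulative run-off triangle.

    `triangle` is a list of rows (origin periods); row i holds the cumulative
    claims observed so far, so later rows are shorter.
    """
    n_dev = len(triangle[0])
    factors = []
    for j in range(n_dev - 1):
        # Use only origin periods observed at both development ages j and j + 1.
        rows = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors


def project_ultimates(triangle):
    """Roll each row's latest cumulative value forward to the final age."""
    factors = development_factors(triangle)
    ultimates = []
    for row in triangle:
        value = row[-1]
        for factor in factors[len(row) - 1:]:
            value *= factor
        ultimates.append(value)
    return ultimates


def reserves(triangle):
    """Outstanding reserve per origin period: projected ultimate minus latest."""
    return [u - row[-1] for u, row in zip(project_ultimates(triangle), triangle)]
```

With the toy triangle `[[100, 150, 165], [110, 168], [120]]`, the first origin period is fully developed and has zero reserve, while the later two are projected forward with the estimated factors.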


New Response Surface Approach to Optimize Medium Composition for Production of Bacteriocin by Lactobacillus acidophilus ATCC 4356

  • RHEEM, SUNGSUE;SEJONG OH;KYOUNG SIK HAN;JEE YOUNG IMM;SAEHUN KIM
    • Journal of Microbiology and Biotechnology / Vol. 12, No. 3 / pp. 449-456 / 2002
  • The objective of this study was to optimize the medium composition (initial pH, tryptone, glucose, yeast extract, and mineral mixture) for production of bacteriocin by Lactobacillus acidophilus ATCC 4356, using response surface methodology. A response surface approach including new statistical and plotting methods was employed for the design and analysis of the experiment. An interiorly augmented central composite design was used as the experimental design. A normal-distribution log-link generalized linear model based on a subset fourth-order polynomial ($R^2$ = 0.94, Mean Error Deviance = 0.0065) was used as the analysis model. This model was statistically superior to the generalized linear model based on the full second-order polynomial ($R^2$ = 0.80, Mean Error Deviance = 0.0140). Nonlinear programming determined the optimum composition of the medium as initial pH 6.35, tryptone 1.21%, glucose 0.9%, yeast extract 0.65%, and mineral mixture 1.17%. A validation experiment confirmed that the optimized medium was comparable to the MRS medium in bacteriocin production, with the advantages of economy and practicality.

Design Criterion for Estimating Mean and Variance Functions

  • Lim, Yong B.
    • International Journal of Quality Innovation / Vol. 1, No. 1 / pp. 32-37 / 2000
  • In an industrial process, the proper objective is to find the optimal operating conditions with minimum process variability around the target. Vining and Myers (1990) suggest to use the separate model for the mean response and the process varian linear predictor $\tau_i = \log \sigma_i^2$ is unknown and should be estimated. Noting that the variance of $\hat{\tau}_i$ is heterogeneous, another appropriate D-optimality criterion $D_3$ based on the method of generalized least squares is proposed in this paper.


Estimation of the number of discontinuity points based on likelihood

  • 허집
    • Journal of the Korean Data and Information Science Society / Vol. 21, No. 1 / pp. 51-59 / 2010
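  • When the regression function in a generalized linear model has a single discontinuity point, Huh (2009) estimated the location and jump size of that point using one-sided kernels with the likelihood function of a one-parameter exponential family. For a regression function with an unknown number q of discontinuity points, this paper introduces a hypothesis test based on the asymptotic distribution of the jump size estimator proposed by Huh (2009), proposes an algorithm that uses the test to estimate the number of discontinuity points, and examines the accuracy of the estimation through simulation.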