• Title/Summary/Keyword: 이항반응자료 (binomial response data)

Statistical Modeling of Learning Curves with Binary Response Data (이항 반응 자료에 대한 학습곡선의 모형화)

  • Lee, Seul-Ji;Park, Man-Sik
    • Communications for Statistical Applications and Methods / v.19 no.3 / pp.433-450 / 2012
  • As a worker performs an operation repeatedly, he or she becomes familiar with the job and completes it in less time. Efficiency improves through accumulated knowledge, experience, and skill with regard to the operation, so the time invested per unit of output falls with repetition. This phenomenon is referred to as the learning curve effect. A learning curve is a graphical representation of the changing rate of learning. According to the previous literature, learning curve effects are determined by subjective, pre-assigned factors. In this study, we propose a new statistical model that describes the learning curve effect by means of a basic cumulative distribution function. This work mainly focuses on the statistical modeling of binary data. We employ the Newton-Raphson method for estimation and the delta method for the construction of confidence intervals. We also perform a real data analysis.
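The abstract above names Newton-Raphson estimation and delta-method confidence intervals for a CDF-based model of binary outcomes. The following is a minimal sketch of those two steps, not the authors' model: it assumes the logistic CDF as the "basic cumulative distribution function", a single covariate (the trial number), and made-up success indicators. All function names and data are hypothetical.

```python
import math

def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))

def score_and_information(a, b, x, y):
    """Score vector and Fisher information for P(y=1 | x) = F(a + b*x)."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = logistic_cdf(a + b * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)

def fit_newton_raphson(x, y, max_iter=50, tol=1e-10):
    """Maximum likelihood estimates of (a, b) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(max_iter):
        (g0, g1), (i00, i01, i11) = score_and_information(a, b, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system  information * step = score.
        da = (i11 * g0 - i01 * g1) / det
        db = (i00 * g1 - i01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

def delta_method_ci(a, b, x, y, x0, z=1.96):
    """Approximate 95% interval for the success probability at x0."""
    _, (i00, i01, i11) = score_and_information(a, b, x, y)
    det = i00 * i11 - i01 * i01
    # Inverse information = asymptotic covariance of (a, b).
    v_aa, v_ab, v_bb = i11 / det, -i01 / det, i00 / det
    p = logistic_cdf(a + b * x0)
    d = p * (1.0 - p)  # derivative of F with respect to the linear predictor
    se = math.sqrt(d * d * (v_aa + 2.0 * x0 * v_ab + x0 * x0 * v_bb))
    return p, p - z * se, p + z * se

# Hypothetical data: trial number and whether the task succeeded.
trials = list(range(1, 11))
success = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

a_hat, b_hat = fit_newton_raphson(trials, success)
p5, lo5, hi5 = delta_method_ci(a_hat, b_hat, trials, success, x0=5)
print(f"a={a_hat:.3f}, b={b_hat:.3f}, P(success at trial 5)={p5:.3f} [{lo5:.3f}, {hi5:.3f}]")
```

A positive slope here corresponds to a success probability that rises with repetition. The delta-method interval is symmetric on the probability scale and can leave [0, 1] for small samples, which is one reason intervals are sometimes built on the linear-predictor scale instead.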

Sparse Design Problem in Local Linear Quasi-likelihood Estimator (국소선형 준가능도 추정량의 자료 희박성 문제 해결방안)

  • Park, Dong-Ryeon
    • The Korean Journal of Applied Statistics / v.20 no.1 / pp.133-145 / 2007
  • The local linear estimator has a number of advantages over traditional kernel estimators, one of which is better performance near boundaries. However, the local linear estimator can produce erratic results in sparse regions of the realized design, and much research has been done to solve this problem. The local linear quasi-likelihood estimator shares many properties with the local linear estimator, and it turns out that a sparse design can also lead the local linear quasi-likelihood estimator to erratic behavior in practice. Several methods to solve this problem are proposed, and their finite-sample properties are compared in a simulation study.

Bivariate Zero-Inflated Negative Binomial Regression Model with Heterogeneous Dispersions (서로 다른 산포를 허용하는 이변량 영과잉 음이항 회귀모형)

  • Kim, Dong-Seok;Jeong, Seul-Gi;Lee, Dong-Hee
    • Communications for Statistical Applications and Methods / v.18 no.5 / pp.571-579 / 2011
  • We propose a new bivariate zero-inflated negative binomial regression model that allows heterogeneous dispersions. To show the performance of the proposed model, the health care data in Deb and Trivedi (1997) are used to compare it with the bivariate zero-inflated negative binomial model proposed by Wang (2003), which has a common dispersion for the two response variables. This empirical study shows that the proposed model gives better results in terms of log-likelihood and AIC.

A mixed-effects model for overdispersed binomial data (초과변동의 이항자료에 대한 혼합효과 모형)

  • Choi, Jae-Sung
    • Journal of the Korean Data and Information Science Society / v.10 no.1 / pp.199-205 / 1999
  • This paper discusses the generalized mixed-effects model for the analysis of overdispersed binomial data. Certain types of sampling designs or genetic characteristics of experimental units can sometimes be regarded as sources of extra-binomial variation. For such cases, this paper suggests models with one or two random effects to explain the overdispersion caused by those factors and shows how to test for model adequacy based on deviance.

On sampling algorithms for imbalanced binary data: performance comparison and some caveats (불균형적인 이항 자료 분석을 위한 샘플링 알고리즘들: 성능비교 및 주의점)

  • Kim, HanYong;Lee, Woojoo
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.681-690 / 2017
  • Various imbalanced binary classification problems exist, such as fraud detection in banking operations, spam mail detection, and prediction of defective products. Several sampling methods, such as over-sampling, under-sampling, and SMOTE, have been developed to overcome the poor prediction performance of binary classifiers when one group is dominant. In this study, we investigate the prediction performance of logistic regression, Lasso, random forest, boosting, and support vector machines in combination with these sampling methods for imbalanced binary data. Four real data sets are analyzed to see if there is a substantial improvement in prediction performance. We also emphasize some precautions to take when the sampling methods are implemented.
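To make the over-sampling and under-sampling ideas in the abstract above concrete, here is a minimal sketch of the plain random variants (SMOTE, which synthesizes new minority points by interpolation, is not shown). The function names and toy data are hypothetical and are not taken from the paper.

```python
import random

def _group_by_label(X, y):
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    groups = _group_by_label(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out

# Toy imbalanced data: 2 positives, 8 negatives.
X_toy = [[i] for i in range(10)]
y_toy = [1, 1] + [0] * 8

X_over, y_over = random_oversample(X_toy, y_toy)
X_under, y_under = random_undersample(X_toy, y_toy)
print(y_over.count(1), y_over.count(0), y_under.count(1), y_under.count(0))
```

One commonly cited precaution, stated here as general practice rather than as the paper's finding, is to resample only the training portion of the data so that evaluation reflects the original class proportions.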

Bayesian ordinal probit semiparametric regression models: KNHANES 2016 data analysis of the relationship between smoking behavior and coffee intake (베이지안 순서형 프로빗 준모수 회귀 모형 : 국민건강영양조사 2016 자료를 통한 흡연양태와 커피섭취 간의 관계 분석)

  • Lee, Dasom;Lee, Eunji;Jo, Seogil;Choi, Taeryeon
    • The Korean Journal of Applied Statistics / v.33 no.1 / pp.25-46 / 2020
  • This paper presents ordinal probit semiparametric regression models using the Bayesian Spectral Analysis Regression (BSAR) method. Ordinal probit regression models ordinal responses, usually with more than two categories, by linking the probability of falling into each category to a combination of available covariates through a probit link (the inverse of the normal cumulative distribution function). The Bayesian probit model facilitates posterior sampling by introducing a latent variable that follows a normal distribution; the responses are then categorized by cut-off points according to the values of the latent variable. In this paper, we extend the latent variable approach to a semiparametric model for Bayesian ordinal probit regression with nonparametric functions, using the BSAR method based on a spectral representation of Gaussian processes. The latent variable is decomposed into a parametric component and a nonparametric component, with or without a shape constraint, for modeling ordinal responses and predicting outcomes more flexibly. We illustrate the proposed methods with simulation studies in comparison with existing methods and with a real data analysis of the Korean National Health and Nutrition Examination Survey (KNHANES) 2016, investigating the nonparametric relationship between smoking behavior and coffee intake.
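The cut-off point construction described above can be illustrated with a short sketch. It assumes a latent variable z = eta + e with e standard normal, and a response in category k when z falls between consecutive cut-off points. The function names, the value of eta, and the cut-off values are hypothetical; the semiparametric BSAR component from the paper is not modeled here.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordinal_probit_probs(eta, cutpoints):
    """Category probabilities P(y = k), k = 0..K-1, for K-1 increasing cut-offs.

    P(y = k) = Phi(c_{k+1} - eta) - Phi(c_k - eta),
    with c_0 = -infinity and c_K = +infinity.
    """
    cdf = [0.0] + [normal_cdf(c - eta) for c in cutpoints] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(cdf) - 1)]

# Hypothetical linear predictor and cut-off points for four ordered categories.
probs = ordinal_probit_probs(eta=0.4, cutpoints=[-1.0, 0.0, 1.5])
print([round(p, 4) for p in probs])
```

Increasing eta shifts probability mass toward the higher categories, which is how covariates act on an ordinal response in this formulation.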

Review for time-dependent ROC analysis under diverse survival models (생존 분석 자료에서 적용되는 시간 가변 ROC 분석에 대한 리뷰)

  • Kim, Yang-Jin
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.35-47 / 2022
  • The receiver operating characteristic (ROC) curve was developed to quantify the classification ability of marker values (covariates) on the response variable and has been extended to survival data with diverse missing data structures. When survival data are understood as binary data (alive or dead) at each time point, the ROC curve expressed at every time point results in a time-dependent ROC curve and a time-dependent area under the curve (AUC). In particular, a follow-up study brings changes in the cohort and incomplete data structures such as censoring and competing risks. In this paper, we review time-dependent ROC estimators under several contexts and perform simulations to check the performance of each estimator. We also analyze a dementia dataset to compare the prognostic power of markers.

A comparative study of feature screening methods for ultrahigh dimensional multiclass classification (초고차원 다범주분류를 위한 변수선별 방법 비교 연구)

  • Lee, Kyungeun;Kim, Kyoung Hee;Shin, Seung Jun
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.793-808 / 2017
  • We compare various variable screening methods for multiclass classification problems when the data are ultrahigh-dimensional. Two different approaches are considered: (1) pairwise extension from binary classification via one-versus-one or one-versus-rest comparisons and (2) direct classification of multiclass responses. We conduct extensive simulation studies under different conditions: heavy-tailed explanatory variables, correlated signal and noise variables, correlated joint distributions but uncorrelated marginals, and unbalanced response variables. We then analyze real data to examine the performance of the methods. The results show that model-free methods perform better for multiclass classification problems as well as binary ones.

Development of the Continuous-Time HGDM with Binomial Sensitivity Factor (이항 반응 계수를 가진 연속 시간형 HGDM의 개발)

  • Park, Joong-Yang;Kim, Seong-Hee;Park, Jae-Heong
    • The Transactions of the Korea Information Processing Society / v.6 no.12 / pp.3490-3499 / 1999
  • The hyper-geometric distribution software reliability growth model (HGDM) was recently developed and successfully applied to the problem of estimating the number of initial faults remaining in the software at the beginning of the test-and-debug phase. Although the HGDM is a time-domain software reliability growth model (SRGM), it is not possible to compare the HGDM with other time-domain SRGMs. Furthermore, the usual software reliability cannot be computed. These drawbacks stem from the fact that the HGDM is not described in terms of execution time. We therefore develop a continuous-time HGDM with a binomial sensitivity factor to remove these drawbacks. The statistical characteristics of the suggested model are studied, and its applicability is then examined by analyzing real test data sets. It is empirically shown that the continuous-time HGDM with a binomial sensitivity factor can be used as an alternative to the current HGDM.

Fit of the number of insurance solicitor's turnovers using zero-inflated negative binomial regression (영과잉 음이항회귀 모형을 이용한 보험설계사들의 이직횟수 적합)

  • Chun, Heuiju
    • Journal of the Korean Data and Information Science Society / v.28 no.5 / pp.1087-1097 / 2017
  • This study aims to find the best model to fit the number of turnovers of insurance solicitors at life insurance companies using count data regression models: Poisson regression, negative binomial regression, zero-inflated Poisson regression, and zero-inflated negative binomial regression. Of the four models, the zero-inflated negative binomial model was selected based on the AIC and SBC criteria, owing to over-dispersion and the high proportion of zero counts. The significant factors affecting turnover were found to be the work period at the current company, the total work period as a financial planner, the affiliated corporation, and channel management satisfaction. We also found that as job satisfaction or channel management satisfaction gets lower, the number of turnovers increases. In addition, the total work period as a financial planner has a positive relationship with the number of turnovers, whereas the work period at the current company has a negative relationship with it.
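The role of over-dispersion and excess zeros in the abstract above can be seen from the probability mass function of a zero-inflated negative binomial distribution. The sketch below assumes the common mean-dispersion (NB2) parameterization with variance mu + alpha * mu^2 and a constant zero-inflation probability pi. The parameter values are illustrative only and are not estimates from the study.

```python
import math

def nb_pmf(k, mu, alpha):
    """Negative binomial pmf with mean mu and dispersion alpha (NB2 form)."""
    r = 1.0 / alpha  # size parameter
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu))
             + k * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(k, mu, alpha, pi):
    """Zero-inflated NB: a structural zero with probability pi, else an NB count."""
    base = (1.0 - pi) * nb_pmf(k, mu, alpha)
    return pi + base if k == 0 else base

# Illustrative parameters (assumptions, not fitted values).
mu, alpha, pi = 2.0, 0.5, 0.3

total = sum(zinb_pmf(k, mu, alpha, pi) for k in range(200))
mean = sum(k * zinb_pmf(k, mu, alpha, pi) for k in range(200))
print(f"P(0) under NB   = {nb_pmf(0, mu, alpha):.4f}")
print(f"P(0) under ZINB = {zinb_pmf(0, mu, alpha, pi):.4f}")
print(f"total mass = {total:.6f}, mean = {mean:.4f}")
```

The zero-inflated version puts extra mass at zero while its mean shrinks to (1 - pi) * mu. Setting pi to zero recovers the negative binomial model, and letting alpha approach zero approaches the Poisson case, which is why the four models named in the abstract form a natural family for comparison by AIC or SBC.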