• Title/Summary/Keyword: Binary logistic model

Exploring interaction using 3-D residual plots in logistic regression model (3차원 잔차산점도를 이용한 로지스틱회귀모형에서 교호작용의 탐색)

  • Kahng, Myung-Wook
    • Journal of the Korean Data and Information Science Society / v.25 no.1 / pp.177-185 / 2014
  • Under bivariate normal distribution assumptions, the interaction and quadratic terms are needed in the logistic regression model with two predictors. However, depending on the correlation coefficient and the variances of two conditional distributions, the interaction and quadratic terms may not be necessary. Although the need for these terms can be determined by comparing the two scatter plots, it is not as useful for interaction terms. We explore the structure and usefulness of the 3-D residual plot as a tool for dealing with interaction in logistic regression models. If predictors have an interaction effect, a 3-D residual plot can show the effect. This is illustrated by simulated and real data.

Log-density Ratio with Two Predictors in a Logistic Regression Model (로지스틱 회귀모형에서 이변량 정규분포에 근거한 로그-밀도비)

  • Kahng, Myung Wook;Yoon, Jae Eun
    • The Korean Journal of Applied Statistics / v.26 no.1 / pp.141-149 / 2013
  • We present methods for studying the log-density ratio that enables the selection of the predictors and the form to be included in the logistic regression model. Under bivariate normal distributional assumptions, we investigate the form of the log-density ratio as a function of two predictors. If two covariance matrices are equal, then the crossproduct and quadratic terms are not needed. If the variables are uncorrelated, we do not need the crossproduct terms, but we still need the linear and quadratic terms. We also explore other conditions in which the crossproduct and quadratic terms are not needed in the logistic regression model.
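
One way to see why these conditions arise is to write the log-density ratio out explicitly. The following is a sketch, not the authors' own derivation: it assumes $x \mid y = k \sim N(\mu_k, \Sigma_k)$ for $k = 0, 1$, and the symbols $\mu_k$, $\Sigma_k$, $\sigma_{kj}^2$, and $c$ are notation introduced here for illustration.

$$
\log\frac{f_1(x)}{f_0(x)} = c + \left(\Sigma_1^{-1}\mu_1 - \Sigma_0^{-1}\mu_0\right)^{\top} x - \frac{1}{2}\, x^{\top}\left(\Sigma_1^{-1} - \Sigma_0^{-1}\right) x,
$$

$$
c = -\frac{1}{2}\log\frac{|\Sigma_1|}{|\Sigma_0|} - \frac{1}{2}\mu_1^{\top}\Sigma_1^{-1}\mu_1 + \frac{1}{2}\mu_0^{\top}\Sigma_0^{-1}\mu_0 .
$$

When $\Sigma_1 = \Sigma_0$ the quadratic form vanishes and only linear terms remain. When both $\Sigma_k$ are diagonal (predictors uncorrelated within each group), the off-diagonal entries of $\Sigma_1^{-1} - \Sigma_0^{-1}$ are zero, so the crossproduct $x_1 x_2$ drops out, while each $x_j^2$ keeps the coefficient $-\tfrac{1}{2}\left(1/\sigma_{1j}^2 - 1/\sigma_{0j}^2\right)$, which is nonzero unless the two group variances of $x_j$ coincide.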

Geographically weighted kernel logistic regression for small area proportion estimation

  • Shim, Jooyong;Hwang, Changha
    • Journal of the Korean Data and Information Science Society / v.27 no.2 / pp.531-538 / 2016
  • In this paper we deal with small area estimation for the case in which the response variables take binary values. Mixed effects models, which treat the spatial effects as random effects, have been extensively studied for small area estimation. However, when the spatial information of each area is given specifically as coordinates, it is popular to use geographically weighted logistic regression, which incorporates the spatial information by assuming that the regression parameters vary spatially across areas. In this paper, we relax the linearity assumption and propose a geographically weighted kernel logistic regression for estimating small area proportions using the basic principle of kernel machines. Numerical studies have been carried out to compare the performance of the proposed method with other methods in estimating small area proportions.

Semiparametric kernel logistic regression with longitudinal data

  • Shim, Joo-Yong;Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / v.23 no.2 / pp.385-392 / 2012
  • Logistic regression is a well-known binary classification method in the field of statistical learning. Mixed-effect regression models are widely used for the analysis of correlated data such as those found in longitudinal studies. We consider kernel extensions with semiparametric fixed effects and parametric random effects for the logistic regression. The estimation is performed through the penalized likelihood method based on the kernel trick, and our focus is on efficient computation and effective hyperparameter selection. For the selection of optimal hyperparameters, cross-validation techniques are employed. Numerical results are then presented to indicate the performance of the proposed procedure.

A Study on Diagnostics Method for Categorical Data (범주형 자료의 진단방법에 관한 연구)

  • 이선규;조범석
    • Journal of Korean Society of Industrial and Systems Engineering / v.18 no.33 / pp.93-102 / 1995
  • In this study we are concerned with diagnostic methods for cross-classified categorical data using a logistic regression model, a binary response model for cell proportions. Under this model, we can examine goodness-of-fit using Pearson's $\chi^2$ test statistic and the likelihood ratio statistic. These statistics assume that the sample survey scheme is sampling with replacement. However, they are often inappropriate for analysing contingency tables built from sample survey data obtained under complex sampling schemes. In this study we examine diagnostic procedures for detecting outlying cell proportions and influential observations in the design space of a logistic regression model that takes account of survey design effects.
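
For reference, the two unadjusted statistics named above can be sketched as follows. This is a minimal illustration assuming grouped binomial cells with fitted probabilities strictly between 0 and 1; it uses the textbook forms of the Pearson and likelihood ratio (deviance) statistics and does not include the survey design corrections the paper is concerned with. The function name `pearson_and_deviance` and its arguments are hypothetical.

```python
import math


def pearson_and_deviance(successes, trials, fitted_probs):
    """Return (Pearson X^2, deviance G^2) for grouped binomial cells.

    successes[j] : observed number of successes in cell j
    trials[j]    : number of observations in cell j
    fitted_probs[j] : model-fitted success probability for cell j (0 < p < 1)
    """
    x2 = 0.0
    g2 = 0.0
    for y, n, p in zip(successes, trials, fitted_probs):
        expected = n * p
        # Pearson contribution: squared residual over binomial variance.
        x2 += (y - expected) ** 2 / (expected * (1.0 - p))
        # Deviance contribution, using the convention 0 * log(0) = 0.
        if y > 0:
            g2 += 2.0 * y * math.log(y / expected)
        if y < n:
            g2 += 2.0 * (n - y) * math.log((n - y) / (n - expected))
    return x2, g2
```

Both statistics are zero when every observed count equals its fitted count and grow as cells depart from the fit, which is why large individual contributions are a natural starting point for spotting outlying cells.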

Goodness-of-fit tests for a proportional odds model

  • Lee, Hyun Yung
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1465-1475 / 2013
  • The chi-square type test statistic is the most commonly used measure of goodness-of-fit for the multinomial logistic regression model, for both grouped (binomial) data and ungrouped (binary) data classified by covariate pattern. The chi-square type statistic is not a satisfactory gauge, however, because the ungrouped Pearson chi-square statistic is not well approximated by the chi-square distribution and is also not a satisfactory form of measurement in itself. Currently, goodness-of-fit in the ordinal setting is often assessed using the Pearson chi-square statistic and deviance tests. These tests involve creating a contingency table in which rows consist of all possible cross-classifications of the model covariates, and columns consist of the levels of the ordinal response. I examined goodness-of-fit tests for a proportional odds logistic regression model, the most commonly used regression model for an ordinal response variable. Using a simulation study, I investigated the distribution and power properties of this test and compared these with those of three other goodness-of-fit tests. The new test had lower power than the existing tests; however, it was able to detect a greater number of the different types of lack of fit considered in this study. I illustrated the ability of the tests to detect lack of fit using a study of aftercare decisions for psychiatrically hospitalized adolescents.

Using Data Mining Techniques to Predict Win-Loss in Korean Professional Baseball Games (데이터마이닝을 활용한 한국프로야구 승패예측모형 수립에 관한 연구)

  • Oh, Younhak;Kim, Han;Yun, Jaesub;Lee, Jong-Seok
    • Journal of Korean Institute of Industrial Engineers / v.40 no.1 / pp.8-17 / 2014
  • In this research, we employed various data mining techniques to build predictive models for win-loss prediction in Korean professional baseball games. The historical data containing information about players and teams were obtained from the official materials provided by the KBO website. Using the collected raw data, we additionally prepared two more types of dataset, in ratio and binary format respectively. Dividing the away team's records by the records of the corresponding home team generated the ratio dataset, while the binary dataset was obtained by comparing the record values. We applied seven classification techniques to the three (raw, ratio, and binary) datasets. The employed data mining techniques are decision tree, random forest, logistic regression, neural network, support vector machine, linear discriminant analysis, and quadratic discriminant analysis. Among the 21 (= 3 datasets $\times$ 7 techniques) prediction scenarios, the most accurate model was obtained from the random forest technique based on the binary dataset, with a prediction accuracy of 84.14%. It was also observed that using the ratio and the binary datasets helped to build better prediction models than using the raw data. From the variable selection capability of decision tree, random forest, and stepwise logistic regression, we found that annual salary, earned runs, strikeouts, pitcher's winning percentage, and bases on balls are important winning factors of a game. This research is distinct from existing studies in that we used three different types of data and various data mining techniques for win-loss prediction in Korean professional baseball games.
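
The construction of the ratio and binary datasets, and the count of 21 scenarios, can be sketched as below. This is only an illustration of the description in the abstract: the record names, the direction of the binary comparison (1 when the away value exceeds the home value), and the handling of a zero home record are assumptions, and `make_ratio_and_binary` is a hypothetical helper.

```python
from itertools import product

DATASETS = ["raw", "ratio", "binary"]
TECHNIQUES = [
    "decision tree",
    "random forest",
    "logistic regression",
    "neural network",
    "support vector machine",
    "linear discriminant analysis",
    "quadratic discriminant analysis",
]


def make_ratio_and_binary(away, home):
    """Derive ratio and binary features from two teams' raw records.

    away, home: dicts mapping a record name to a numeric value.
    """
    ratio = {}
    binary = {}
    for name, away_value in away.items():
        home_value = home[name]
        # Ratio dataset: away-team record divided by home-team record.
        # Assumption: an undefined ratio (home record of 0) is stored as NaN.
        ratio[name] = away_value / home_value if home_value != 0 else float("nan")
        # Binary dataset: result of comparing the two record values.
        # Assumption: 1 means the away value is strictly larger.
        binary[name] = 1 if away_value > home_value else 0
    return ratio, binary


# Every (dataset, technique) pair is one prediction scenario.
scenarios = list(product(DATASETS, TECHNIQUES))
```

Here `len(scenarios)` is 21, matching the 3 datasets by 7 techniques described in the abstract.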

An Analysis of Environmental Policy Effect on Green Space Change using Logistic Regression Model : The Case of Ulsan Metropolitan City (로지스틱 회귀모형을 이용한 환경정책 효과 분석: 울산광역시 녹지변화 분석을 중심으로)

  • Lee, Sung-Joo;Ryu, Ji-Eun;Jeon, Seong-Woo
    • Journal of the Korean Society of Environmental Restoration Technology / v.23 no.4 / pp.13-30 / 2020
  • This study aims to analyze the qualitative and quantitative effects of environmental policies in terms of green space management using a logistic regression model (LRM). Landsat satellite images from 1985, 1992, 2000, 2008, and 2015 are classified using a hybrid classification method. Based on these classified maps, a logistic regression model reflecting the deforestation tendency of the past is built. A binary green space change map is used as the dependent variable, and four explanatory variables are used: distance from green space, distance from settlements, elevation, and slope. The green space maps of 2008 and 2015 are predicted using the constructed model. The conservation effect of Ulsan's environmental policies is quantified through a numerical comparison of green area between the predicted and real data. Time-series analysis of green space showed that restoration and destruction of green space are more closely related to human activities than to natural land transition. The effect of green space management policy was spatially explicit and brought a significant increase in green space. Furthermore, the quantitative analysis showed that Ulsan's environmental policy had the effect of conserving and restoring 111.75 km² and 175.45 km² over the periods of eight and fifteen years, respectively. Among the four variables, slope was the most decisive factor accounting for the destruction of green space in the city. This study presents the logistic regression model as a way of evaluating the effect of environmental policies that have been practiced in the city. Its significance is that it allows a comprehensive understanding of the effect by considering every direct and indirect effect from other domains, such as air and water, on green space. We conclude by discussing the practicability of implementing environmental policy in terms of green space management, with a focus on a non-statutory plan.

Variable Selection with Log-Density in Logistic Regression Model (로지스틱회귀모형에서 로그-밀도비를 이용한 변수의 선택)

  • Kahng, Myung-Wook;Shin, Eun-Young
    • Communications for Statistical Applications and Methods / v.19 no.1 / pp.1-11 / 2012
  • We present methods to study the log-density ratio of the conditional densities of the predictors given the response variable in the logistic regression model. This allows us to select which predictors are needed and how they should be included in the model. If the conditional distributions are skewed, the distributions can be considered as gamma distributions. A simulation study shows that the linear and log terms are required in general. If the conditional distributions of $x \mid y$ for the two groups overlap significantly, we need both the linear and log terms; however, only the linear or log term is needed in the model if they are well separated.
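
The appearance of linear and log terms can be illustrated with a small sketch. It assumes, for illustration only, that $x \mid y = k$ follows a gamma distribution with shape $a_k$ and rate $b_k$; the parameterization and the function names below are not taken from the paper. Under that assumption the log-density ratio is exactly an intercept plus a term in $x$ plus a term in $\log x$.

```python
import math


def gamma_logpdf(x, shape, rate):
    """Log-density of a gamma distribution with the given shape and rate."""
    return (
        shape * math.log(rate)
        - math.lgamma(shape)
        + (shape - 1.0) * math.log(x)
        - rate * x
    )


def log_density_ratio_terms(shape1, rate1, shape0, rate0):
    """Coefficients of log f1(x)/f0(x) = intercept + linear * x + log_coef * log(x)."""
    intercept = (shape1 * math.log(rate1) - math.lgamma(shape1)) - (
        shape0 * math.log(rate0) - math.lgamma(shape0)
    )
    linear = -(rate1 - rate0)    # coefficient of x
    log_coef = shape1 - shape0   # coefficient of log(x)
    return intercept, linear, log_coef
```

In this sketch the log term drops out when the two shapes are equal and the linear term drops out when the two rates are equal, which gives one concrete reading of when a single term can suffice.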

Factors Affecting on Suicidal Ideation in Public Assistance Recipients (공공부조 수급자의 자살생각 영향요인)

  • Lee, Ju Hyun;Kim, Min Ji;Lee, Byeong Hui;Noh, Jin-Won
    • The Journal of the Korea Contents Association / v.15 no.8 / pp.366-374 / 2015
  • This study investigated which factors affect suicidal ideation among people who have received public assistance. For this purpose, the survey results of the 7th year (2012) of the Korea Welfare Panel Study, conducted by the Korea Institute for Health and Social Affairs and the Social Welfare Research Institute of Seoul National University, were used for analysis. To determine the level of influence on suicidal ideation, binary logistic regression analysis was used. As a result, suicidal ideation was found to be lower among subjects who were middle school graduates and among those who were married. Also, the higher their self-esteem and the higher their satisfaction with public assistance, the lower their suicidal ideation. Furthermore, subjects with depression or in middle age were shown to have a high likelihood of suicidal ideation. Satisfaction with public assistance, and not only physical and psychological factors, was also shown to influence suicidal ideation among the poor. Therefore, recipients' satisfaction with public assistance is worth measuring as one of the significant factors affecting suicidal ideation.