Log-density Ratio with Two Predictors in a Logistic Regression Model (로지스틱 회귀모형에서 이변량 정규분포에 근거한 로그-밀도비)

  • Kahng, Myung Wook;Yoon, Jae Eun
    • The Korean Journal of Applied Statistics, v.26 no.1, pp.141-149, 2013
  • We present methods for studying the log-density ratio, which enables selection of the predictors, and of their functional form, to be included in the logistic regression model. Under bivariate normal distributional assumptions, we investigate the form of the log-density ratio as a function of two predictors. If the two covariance matrices are equal, then the crossproduct and quadratic terms are not needed. If the variables are uncorrelated, we do not need the crossproduct term, but we still need the linear and quadratic terms. We also explore other conditions under which the crossproduct and quadratic terms are not needed in the logistic regression model.
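
The conditions in this abstract can be illustrated with a short derivation. This is a hedged sketch, not the authors' own derivation: it assumes $x = (x_1, x_2)^\top$ follows $N(\mu_k, \Sigma_k)$ within group $y = k$, $k \in \{0, 1\}$, with density $f_k$ and prior probability $\pi_k$; all of these symbols are introduced here for illustration.

$$
\log\frac{P(y=1 \mid x)}{P(y=0 \mid x)} = \log\frac{\pi_1}{\pi_0} + \log\frac{f_1(x)}{f_0(x)}
$$

$$
\log\frac{f_1(x)}{f_0(x)} = c + \left(\Sigma_1^{-1}\mu_1 - \Sigma_0^{-1}\mu_0\right)^\top x - \tfrac{1}{2}\, x^\top A\, x, \qquad A = \Sigma_1^{-1} - \Sigma_0^{-1}
$$

$$
c = -\tfrac{1}{2}\log\frac{|\Sigma_1|}{|\Sigma_0|} - \tfrac{1}{2}\left(\mu_1^\top \Sigma_1^{-1}\mu_1 - \mu_0^\top \Sigma_0^{-1}\mu_0\right), \qquad x^\top A\, x = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2
$$

Under these assumptions, $\Sigma_1 = \Sigma_0$ gives $A = 0$, so only the linear terms remain. If both $\Sigma_k$ are diagonal (uncorrelated predictors), then $a_{12} = 0$ and the crossproduct term drops out, while $a_{jj} = 1/\sigma_{1j}^2 - 1/\sigma_{0j}^2$ is generally nonzero, so the quadratic terms stay unless the group variances also match.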

Comparison of Regression Models for Estimating Ventilation Rate of Mechanically Ventilated Swine Farm (강제환기식 돈사의 환기량 추정을 위한 회귀모델의 비교)

  • Jo, Gwanggon;Ha, Taehwan;Yoon, Sanghoo;Jang, Yuna;Jung, Minwoong
    • Journal of The Korean Society of Agricultural Engineers, v.62 no.1, pp.61-70, 2020
  • To estimate the ventilation volume of mechanically ventilated swine farms, various regression models were applied, and their errors were compared to select the regression model that best simulates actual data. Linear regression, linear spline, polynomial regression (degrees 2 and 3), logistic curve, generalized additive model (GAM), and Gompertz curve were compared. Overfitting models were excluded even when their error rate was small. The evaluation criteria were root mean square error (RMSE) and mean absolute percentage error (MAPE). The evaluation results indicated that the degree-3 polynomial exhibited the lowest error rate; however, it overestimated the ventilation volume in a certain section. The logistic curve was the most stable and was superior to all the other models. Among all the models, excluding the model with a large error rate and the overestimating model, the logistic curve gave the smallest estimated ventilation volume.
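
The two evaluation criteria named in this abstract follow their standard textbook definitions, sketched below. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: sqrt(mean((actual - predicted)^2))."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so that case raises.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical measured vs. estimated values, for illustration only.
measured = [100.0, 200.0]
estimated = [110.0, 180.0]
print(rmse(measured, estimated), mape(measured, estimated))
```

RMSE is in the units of the response and penalizes large misses more heavily, while MAPE is scale-free, which is one reason the two are often reported together.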

Comparing Classification Accuracy of Ensemble and Clustering Algorithms Based on Taguchi Design (다구찌 디자인을 이용한 앙상블 및 군집분석 분류 성능 비교)

  • Shin, Hyung-Won;Sohn, So-Young
    • Journal of Korean Institute of Industrial Engineers, v.27 no.1, pp.47-53, 2001
  • In this paper, we compare the classification performance of ensemble and clustering algorithms (Data Bagging, Variable Selection Bagging, Parameter Combining, Clustering) with that of logistic regression, taking into account various characteristics of the input data. The four factors used to simulate the logistic model are (1) correlation among input variables, (2) variance of observations, (3) training data size, and (4) the input-output function. Because the relationship between input and output is unknown, we use a Taguchi design that treats the input-output function as a noise factor, which improves the practicality of our results. The experimental results indicate the following. When the level of the variance is medium, Bagging and Parameter Combining perform worse than Logistic Regression, Variable Selection Bagging, and Clustering. However, the classification performances of Logistic Regression, Variable Selection Bagging, Bagging, and Clustering are not significantly different when the variance of the input data is either small or large. When there is strong correlation among the input variables, Variable Selection Bagging outperforms both Logistic Regression and Parameter Combining. In general, to our disappointment, the Parameter Combining algorithm appears to be the worst.

Suppression for Logistic Regression Model (로지스틱 회귀모형에서의 SUPPRESSION)

  • Hong C. S.;Kim H. I.;Ham J. H.
    • The Korean Journal of Applied Statistics, v.18 no.3, pp.701-712, 2005
  • Suppression in logistic regression models has been debated less than suppression in linear regression models because, among many other reasons, the sum of squares for regression (SSR) and the coefficient of determination ($R^2$) can be defined in various ways. Based on four kinds of $R^2$ (two that are most preferred and two proposed by Liao & McGee (2003)), four kinds of SSR are derived so that suppression in logistic models can be explained. Many data sets fitted to logistic models are generated by the Monte Carlo method. We explore when suppression happens and compare it with suppression in linear regression models.

Comparison of Bias Correction Methods for the Rare Event Logistic Regression (희귀 사건 로지스틱 회귀분석을 위한 편의 수정 방법 비교 연구)

  • Kim, Hyungwoo;Ko, Taeseok;Park, No-Wook;Lee, Woojoo
    • The Korean Journal of Applied Statistics, v.27 no.2, pp.277-290, 2014
  • We analyzed binary landslide data from the Boeun area with logistic regression. Since there are only 9 landslide occurrences among 5000 observations, the data can be regarded as rare event data. The main issue in logistic regression with rare event data is serious bias in the regression coefficient estimates. Two bias correction methods have been proposed previously, and we compared them quantitatively via simulation. Firth's (1993) approach outperformed the other and provided the most stable results for analyzing rare-event binary data.
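
To make the Firth-type correction mentioned in this abstract concrete, here is a minimal sketch of penalized-likelihood logistic regression using the commonly described modified score, $X^\top\{y - p + h(0.5 - p)\}$, where $h$ holds the hat-matrix diagonals. It is not the authors' code; the function name `firth_logistic` and the intercept-only toy data are assumptions made for illustration. Only the 9-in-5000 event count comes from the abstract.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-10):
    """Firth-type bias-reduced logistic regression (illustrative sketch).

    Iterates beta <- beta + I(beta)^{-1} U*(beta), where
    U* = X^T (y - p + h * (0.5 - p)) is the modified score and
    h are the diagonals of the weighted hat matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info = X.T @ (w[:, None] * X)           # Fisher information X^T W X
        info_inv = np.linalg.inv(info)
        # h_i = w_i * x_i^T (X^T W X)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        score = X.T @ (y - p + h * (0.5 - p))   # modified score
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Toy data: intercept-only model with 9 events in 5000 observations.
n, events = 5000, 9
X = np.ones((n, 1))
y = np.zeros(n)
y[:events] = 1.0
beta_hat = firth_logistic(X, y)
p_hat = 1.0 / (1.0 + np.exp(-beta_hat[0]))
print(p_hat, events / n)
```

In this intercept-only case the modified score can be solved by hand, giving a fitted probability of $(9 + 0.5)/(5000 + 1)$ instead of the maximum likelihood value $9/5000$, which shows the correction pulling the estimate slightly away from the boundary.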

Analysis of Donation Intention of MZ Generation and Senior Generation Using Machine Learning's Logistic Regression (머신러닝의 로지스틱 회귀를 활용한 MZ세대와 시니어 세대의 기부의도 분석)

  • Min Jung Oh;IkJin Jeon
    • Journal of Information Technology Services, v.23 no.2, pp.1-12, 2024
  • This study aims to find ways to increase declining donation intention by using machine learning techniques. To this end, in order to predict the factors that affect donations by the MZ generation and the senior generation, various machine learning algorithms, including logistic regression analysis, are applied to build a model that identifies variables affecting donation intention, and statistical verification and evaluation indicators are provided. In this study, generation was expected to be a variable affecting donation intention, and the senior generation was expected to show a higher donation intention than the younger generation. However, although the result was not statistically significant, the younger generation showed a higher intention to donate; this is interpreted to mean that value consumption and ethical consumption, which are important to today's MZ generation, also influenced donations. The generations did differ in the amount donated, with higher donation amounts confirmed among the senior generation (those in their 50s or older) than among the younger generation. In addition, the results of the logistic regression analysis showed that previous donation experience had a positive effect on future donation intention, and that greater donation motivation, greater perceived importance of donation, and more online and offline social participation activities were associated with more active donating.

Comparison of the Prediction Model of Adolescents' Suicide Attempt Using Logistic Regression and Decision Tree: Secondary Data Analysis of the 2019 Youth Health Risk Behavior Web-Based Survey (로지스틱 회귀모형과 의사결정 나무모형을 활용한 청소년 자살 시도 예측모형 비교: 2019 청소년 건강행태 온라인조사를 이용한 2차 자료분석)

  • Lee, Yoonju;Kim, Heejin;Lee, Yesul;Jeong, Hyesun
    • Journal of Korean Academy of Nursing, v.51 no.1, pp.40-53, 2021
  • Purpose: The purpose of this study was to develop and compare prediction models for suicide attempts by Korean adolescents using logistic regression and decision tree analysis. Methods: This study utilized secondary data drawn from the 2019 Youth Health Risk Behavior web-based survey. A total of 20 items were selected as the explanatory variables (5 sociodemographic characteristics, 10 health-related behaviors, and 5 psychosocial characteristics). For data analysis, descriptive statistics, logistic regression with complex samples, and decision tree analysis were performed using IBM SPSS ver. 25.0 and Stata ver. 16.0. Results: A total of 1,731 participants (3.0%) out of 57,303 responded that they had attempted suicide. The most significant predictors of suicide attempts in the logistic regression model were experience of sadness and hopelessness, substance abuse, and violent victimization. In the decision tree model, girls with experience of sadness and hopelessness and experience of substance abuse were identified as the group most vulnerable to suicide attempts. Conclusion: Experiences of sadness and hopelessness, of substance abuse, and of violent victimization are the common major predictors of suicide attempts in both the logistic regression and decision tree models, and the prediction rates of the two models were similar. We suggest providing programs that consider combinations of high-risk predictors in order to prevent suicide attempts among adolescents.

Hazard Map of Road Slope Using a Logistic Regression Model and GIS (Logistic 회귀모형과 GIS기법을 활용한 접도사면 붕괴확률위험도 제작)

  • Kang Ho-Yun;Kwak Young-Joo;Kang In-Joon;Jang Yong-Gu
    • Proceedings of the Korean Society of Surveying, Geodesy, Photogrammetry, and Cartography Conference, 2006.04a, pp.339-344, 2006
  • Slope failures become natural disasters when they occur in mountainous areas adjoining highways in Korea. Accidents associated with slope failures have increased due to the rapid urbanization of mountainous areas. Therefore, regular maintenance of all slopes is essential to maintain road safety as well as road function. In this study, we give priority to building a database of slope failure risk factors before assessment and analysis. The purpose of this paper is to recommend a standard Slope Management Information Sheet (SMIS), such as a hazard map. For future research, we suggest a predictive model for road slopes using a logistic regression model.

Estimation of Logistic Regression for Two-Stage Case-Control Data (2단계 사례-대조자료를 위한 로지스틱 회귀모형의 추론)

  • 신미영;신은순
    • The Korean Journal of Applied Statistics, v.13 no.2, pp.237-245, 2000
  • In this paper, we consider a logistic regression model based on two-stage case-control sampling and study the Weighted Exogenous Sampling Maximum Likelihood (WESML) method for obtaining asymptotically normal estimates of the parameters in a logistic regression model. A numerical example is carried out to demonstrate the differences between the Conditional Maximum Likelihood (CML) estimates and the WESML estimates for two-stage case-control data.

Logistic Regression Type Small Area Estimations Based on Relative Error

  • Hwang, Hee-Jin;Shin, Key-Il
    • The Korean Journal of Applied Statistics, v.24 no.3, pp.445-453, 2011
  • Almost all small area estimates are obtained by minimizing the mean squared error. Recently, relative error prediction methods have been developed and adapted to small area estimation. An estimator obtained using relative error prediction is usually called a shrinkage estimator. In particular, when the data set spans a wide range of values, the shrinkage estimator is known to have good statistical properties and an easy interpretation. In this paper, we study shrinkage estimators based on logistic regression type estimators for small area estimation. Some simulation studies are performed, and the Economically Active Population Survey data of 2005 are used for comparison.
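
The link between relative error and shrinkage can be seen in one common formulation, sketched here under assumptions that may differ from the paper's actual estimators: $Y$ is a positive random variable with finite $E[Y^{-1}]$ and $E[Y^{-2}]$, and $c$ is a constant predictor (in practice the expectations would be conditional on covariates). These symbols are introduced here for illustration.

$$
\min_c \; E\!\left[\left(\frac{Y - c}{Y}\right)^{2}\right]
\;\Longrightarrow\;
E\!\left[\frac{Y - c}{Y^{2}}\right] = 0
\;\Longrightarrow\;
c^{*} = \frac{E[Y^{-1}]}{E[Y^{-2}]}
$$

$$
c^{*} = \frac{E[Y^{-1}]}{E[Y^{-2}]} \;\le\; \frac{1}{E[Y^{-1}]} \;\le\; E[Y]
$$

The first inequality follows from $E[Y^{-1}]^2 \le E[Y^{-2}]$ (Cauchy-Schwarz) and the second from Jensen's inequality, so under these assumptions the relative-error-optimal predictor never exceeds the mean squared error predictor $E[Y]$, which is the sense in which it shrinks.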