• Title/Summary/Keyword: Logistic model (Logistic 모형)

Nomogram comparison conducted by logistic regression and naïve Bayesian classifier using type 2 diabetes mellitus (T2D) (제 2형 당뇨병을 이용한 로지스틱과 베이지안 노모그램 구축 및 비교)

  • Park, Jae-Cheol;Kim, Min-Ho;Lee, Jea-Young
    • The Korean Journal of Applied Statistics / v.31 no.5 / pp.573-585 / 2018
  • In this study, we fit a logistic regression model and a naïve Bayesian classifier model with 11 risk factors to predict the probability of incidence of type 2 diabetes mellitus. We then show how to construct a nomogram that helps people understand the prediction visually. We use data from the 2013-2015 Korean National Health and Nutrition Examination Survey (KNHANES). We include three interaction terms in the logistic regression model to improve the quality of the analysis, and we apply the left-aligned method to the Bayesian nomogram to make it easier to use. Finally, we compare the two nomograms, examine their utility, and verify them using the ROC curve.

Logistic Regressions with Sensory Evaluation Data about Hanwoo Steer Beef (한우 거세우 고기 관능평가 데이터의 로지스틱 회귀분석)

  • Lee, Hye-Jung;Kim, Jae-Hee
    • The Korean Journal of Applied Statistics / v.23 no.5 / pp.857-870 / 2010
  • This study investigates the relationship between socio-demographic factors and Korean consumers' palatability evaluation grades, using Hanwoo sensory evaluation data collected from 2006 to 2008 by the National Institute of Animal Science. Binary and multinomial logistic regression models are fitted with independent variables such as the consumer's living location, age, gender, occupation, monthly income, and beef cut, with the palatability grade as the categorical dependent variable and tenderness, flavor, and juiciness as the continuous dependent variables. A stepwise variable selection procedure is used to find the final model, and odds ratios are calculated to find the associations between categories.

Parameter estimation for the imbalanced credit scoring data using AUC maximization (AUC 최적화를 이용한 낮은 부도율 자료의 모수추정)

  • Hong, C.S.;Won, C.H.
    • The Korean Journal of Applied Statistics / v.29 no.2 / pp.309-319 / 2016
  • For binary classification models, we consider a risk score that is a function of linear scores and estimate the coefficients of the linear scores. There are two estimation methods: one obtains MLEs using logistic models, and the other estimates the coefficients by maximizing the AUC. In general situations where the logistic assumptions do not hold, estimates from the AUC approach are better than MLEs from logistic models. This paper considers imbalanced data for credit assessment models, in which the default class contains fewer observations than the non-default class, and applies the AUC approach to such data. Various logit link functions are used to generate the imbalanced data. It is found that the coefficients estimated by the AUC approach are equivalent to (or better than) those from logistic models for imbalanced data with low default probability.
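
The idea of estimating coefficients by maximizing AUC can be illustrated with a minimal sketch. It is not the authors' procedure: the toy data, the grid search, and the names `linear_score` and `empirical_auc` are all assumptions made for illustration. The empirical AUC is computed as the share of (default, non-default) pairs that the linear score ranks correctly.

```python
from itertools import product


def linear_score(coefs, x):
    """Linear score: sum of coefficient * feature value."""
    return sum(c * xi for c, xi in zip(coefs, x))


def empirical_auc(coefs, defaults, non_defaults):
    """Share of (default, non-default) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for d, n in product(defaults, non_defaults):
        sd, sn = linear_score(coefs, d), linear_score(coefs, n)
        total += 1.0 if sd > sn else 0.5 if sd == sn else 0.0
    return total / (len(defaults) * len(non_defaults))


# Made-up imbalanced sample: 2 defaults, 6 non-defaults, 2 features each.
defaults = [(2.0, 1.0), (1.5, 0.2)]
non_defaults = [(0.1, 0.9), (0.4, 0.3), (1.0, 1.2),
                (0.2, 0.1), (0.8, 0.5), (1.6, 0.1)]

# AUC is unchanged by positive rescaling of the score, so the first
# coefficient is fixed at 1 (assumed positive) and only the second is searched.
candidates = [(1.0, b / 10) for b in range(-20, 21)]
best = max(candidates, key=lambda c: empirical_auc(c, defaults, non_defaults))
print(best, empirical_auc(best, defaults, non_defaults))
```

A crude grid search is used here only because the empirical AUC is a step function of the coefficients, so gradient-based optimizers do not apply directly without smoothing.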

Variable Selection with Log-Density in Logistic Regression Model (로지스틱회귀모형에서 로그-밀도비를 이용한 변수의 선택)

  • Kahng, Myung-Wook;Shin, Eun-Young
    • Communications for Statistical Applications and Methods / v.19 no.1 / pp.1-11 / 2012
  • We present methods to study the log-density ratio of the conditional densities of the predictors given the response variable in the logistic regression model. This allows us to select which predictors are needed and how they should be included in the model. If the conditional distributions are skewed, they can be modeled as gamma distributions. A simulation study shows that both the linear and log terms are required in general. If the conditional distributions of x|y for the two groups overlap significantly, we need both the linear and log terms; however, only the linear or the log term is needed if they are well separated.

Genetic Aspects of the Growth Curve Parameters in Hanwoo Cows (한우 암소의 성장곡선 모수에 대한 유전적 경향)

  • Lee, Chang-U;Choe, Jae-Gwan;Jeon, Gi-Jun;Kim, Hyeong-Cheol
    • Journal of Animal Science and Technology / v.48 no.1 / pp.29-38 / 2006
  • The objective of this study was to estimate genetic variances of growth curve parameters in Hanwoo cows. The data were records from 1,083 Hanwoo cows raised at the Hanwoo Experiment Station, National Livestock Research Institute (NLRI). The first evaluation model (Model I) fit year-season of birth and age of dam as fixed effects, and the second model (Model II) added age at the final weight as a linear covariate to Model I. Heritability estimates of A, b and k from the Gompertz model were 0.22, 0.11 and 0.07 using Model I and 0.28, 0.11 and 0.12 using Model II. Those from the Von Bertalanffy model were 0.22, 0.11 and 0.07 using Model I and 0.28, 0.11 and 0.12 using Model II. Those from the Logistic model were 0.14, 0.07 and 0.05 using Model I and 0.18, 0.07 and 0.12 using Model II. Heritability estimates of A from the Gompertz model were higher than those from the Von Bertalanffy model or the Logistic model in both Model I and Model II. Heritability estimates of b from the Logistic model were higher than those from the Gompertz model or the Von Bertalanffy model in both Model I and Model II. After linear age adjustment, heritability estimates of birth weight, weaning weight, 3-month weight, 6-month weight, 9-month weight, 12-month weight, 18-month weight, 24-month weight and 36-month weight were 0.27, 0.11, 0.19, 0.14, 0.16, 0.23, 0.52 and 0.32, respectively. Heritability estimates of birth weight, weaning weight, 3-month weight, 6-month weight, 9-month weight and 24-month weight fit by the Gompertz model were larger than those estimated from linearly adjusted data. Heritability estimates of 12-month weight, 18-month weight and 36-month weight fit by the Von Bertalanffy model were larger than those estimated from linearly adjusted data. In the multitrait analyses for parameters from the Gompertz model, genetic and phenotypic correlations between the A and k parameters were -0.47 and -0.67 using Model I and -0.56 and -0.63 using Model II. Those between the A and b parameters were 0.69 and 0.34 using Model I and 0.72 and 0.37 using Model II. Those between the b and k parameters were -0.26 and 0.01 using Model I and -0.30 and 0.01 using Model II. In the multitrait analyses for parameters from the Von Bertalanffy model, genetic and phenotypic correlations between the A and k parameters were -0.49 and -0.67 using Model I and -0.57 and -0.70 using Model II. Those between the A and b parameters were 0.61 and 0.33 using Model I and 0.60 and 0.30 using Model II. Those between the b and k parameters were -0.20 and 0.02 using Model I and 0.16 and 0.00 using Model II. In the multitrait analyses for parameters from the Logistic model, genetic and phenotypic correlations between the A and k parameters were -0.43 and -0.67 using Model I and -0.50 and -0.63 using Model II. Those between the A and b parameters were 0.47 and 0.22 using Model I and 0.38 and 0.24 using Model II. Those between the b and k parameters were -0.09 and 0.02 using Model I and -0.02 and 0.13 using Model II.
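
For readers unfamiliar with the three growth curves, the sketch below shows one common parameterization of each in terms of A, b and k. The abstract does not state the functional forms, so these are assumptions taken from the forms widely used in animal growth studies; the parameter values are made up and are not estimates from the study.

```python
import math


def gompertz(t, A, b, k):
    """Gompertz curve: W(t) = A * exp(-b * exp(-k * t))."""
    return A * math.exp(-b * math.exp(-k * t))


def von_bertalanffy(t, A, b, k):
    """Von Bertalanffy curve: W(t) = A * (1 - b * exp(-k * t)) ** 3."""
    return A * (1.0 - b * math.exp(-k * t)) ** 3


def logistic(t, A, b, k):
    """Logistic curve: W(t) = A / (1 + b * exp(-k * t))."""
    return A / (1.0 + b * math.exp(-k * t))


# Hypothetical parameters: A in kg, t in months (illustration only).
for name, curve, params in [
    ("Gompertz", gompertz, (400.0, 2.0, 0.1)),
    ("Von Bertalanffy", von_bertalanffy, (400.0, 0.5, 0.1)),
    ("Logistic", logistic, (400.0, 9.0, 0.1)),
]:
    weights = [round(curve(t, *params), 1) for t in (0, 12, 24, 36)]
    print(name, weights)
```

In all three forms, A is the asymptotic (mature) weight, b is an integration constant tied to the initial weight, and k is the rate at which weight approaches A.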

Making a Hazard Map of Road Slope Using a GIS and Logistic Regression Model (GIS와 Logistic 회귀모형을 이용한 접도사면 재해위험도 작성)

  • Kang, In-Joon;Kang, Ho-Yun;Jang, Yong-Gu;Kwak, Young-Joo
    • Journal of Korean Society for Geospatial Information Science / v.14 no.1 s.35 / pp.85-91 / 2006
  • Slope failures in mountainous areas adjoining highways in Korea have recently become natural disasters. The accidents associated with slope failures have increased due to rapid urbanization of mountainous areas. Regular maintenance is therefore essential for all slopes, to preserve road safety as well as road function. In this study, we give priority to building a database of slope failure risk factors before assessment and analysis. The purpose of this paper is to recommend a standard Slope Management Information Sheet (SMIS) similar to a hazard map. For future research, we suggest a prediction model for road slopes based on a logistic regression model.

Prediction on Busan's Gross Product and Employment of Major Industry with Logistic Regression and Machine Learning Model (로지스틱 회귀모형과 머신러닝 모형을 활용한 주요산업의 부산 지역총생산 및 고용 효과 예측)

  • Chae-Deug Yi
    • Korea Trade Review / v.47 no.2 / pp.69-88 / 2022
  • This paper aims to predict Busan's regional product and employment using logistic regression models and machine learning models. The main findings of the empirical analysis are as follows. First, the OLS regression model shows that main industries such as electricity and electronics, machinery and transport, and finance and insurance affect Busan's income positively. Second, the binomial logistic regression models show that Busan's strategic industries, namely the future transport machinery, life-care, and smart marine industries, contribute to Busan's income, in descending order of contribution. Third, the multinomial logistic regression models show that Korea's main industries such as precision machinery, transport equipment, and machinery influence Busan's economy positively. In addition, Korea's exports and currency depreciation can affect Busan's economy more positively at higher employment levels. Fourth, the voting ensemble model shows higher predictive power than the artificial neural network and support vector machine models. Furthermore, the gradient boosting model and the random forest model show higher predictive power than the voting model, in descending order.

A Study on Accident Prediction Models for Chemical Accidents Using the Logistic Regression Analysis Model (로지스틱회귀분석 모델을 활용한 화학사고 사상사고 예측모형 개발 연구)

  • Lee, Tae-Hyung;Park, Choon-Hwa;Park, Hyo-Hyeon;Kwak, Dae-Hoon
    • Fire Science and Engineering / v.33 no.6 / pp.72-79 / 2019
  • In this study, we developed a model for predicting chemical accidents that lead to casualties. The model was derived from logistic regression analysis and applied to the variables affecting such accidents. The accident data used in the model were analyzed by studying statistics on past chemical accidents and by applying independent variables that were statistically significant, such as the type of accident, cause, place of occurrence, status of casualties, and type of chemical accident that caused the casualties. A significance level of p < 0.05 was applied. The model developed in this study is meaningful for the prevention of casualties caused by chemical accidents and the establishment of safety systems in the workplace. The analysis using the model found that the most influential factor in the occurrence of casualties was chemical explosions. Therefore, there is an urgent need to prepare countermeasures to prevent chemical accidents, specifically explosions, from occurring in the workplace.

Binary regression model using skewed generalized t distributions (기운 일반화 t 분포를 이용한 이진 데이터 회귀 분석)

  • Kim, Mijeong
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.775-791 / 2017
  • We frequently encounter binary data in real life. Logistic, probit, cauchit, and complementary log-log models are often used for binary data analysis. To analyze binary data, Liu (2004) proposed the robit model, in which the inverse of the cdf of Student's t distribution is used as a link function. Kim et al. (2008) also proposed a generalized t-link model to make the binary regression model more flexible. More flexible skewed distributions allow more flexible link functions in generalized linear models. In this sense, we propose a binary regression model using the skewed generalized t distributions introduced in Theodossiou (1998). We implement the proposed models in R using the glm function included in base R and the R sgt package. We also analyze the Pima Indian data using the proposed model in R.
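
To make the role of the link function concrete, the sketch below implements the inverse links of the four standard models named in the abstract, each mapping a linear predictor eta to a probability. It is a plain-Python illustration rather than the paper's R code, and the function names are hypothetical. The skewed generalized t link itself is not reproduced here.

```python
import math


def inv_logit(eta):
    """Logistic model: cdf of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Probit model: cdf of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_cauchit(eta):
    """Cauchit model: cdf of the standard Cauchy distribution."""
    return 0.5 + math.atan(eta) / math.pi


def inv_cloglog(eta):
    """Complementary log-log model: cdf of the standard Gumbel (minimum) distribution."""
    return 1.0 - math.exp(-math.exp(eta))


# Compare the probabilities each model assigns to the same linear predictor.
for eta in (-3.0, 0.0, 3.0):
    probs = [f(eta) for f in (inv_logit, inv_probit, inv_cauchit, inv_cloglog)]
    print(eta, [round(p, 4) for p in probs])
```

Because Student's t distribution with one degree of freedom is the Cauchy distribution, the cauchit link is the heavy-tailed special case of a t-based link. The complementary log-log link is the only asymmetric one of the four.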

A Case Study on Text Analysis Using Meal Kit Product Review Data (밀키트 제품 리뷰 데이터를 이용한 텍스트 분석 사례 연구)

  • Choi, Hyeseon;Yeon, Kyupil
    • The Journal of the Korea Contents Association / v.22 no.5 / pp.1-15 / 2022
  • In this study, text analysis was performed on meal kit product review data to identify factors affecting the evaluation of meal kit products. The data were collected by scraping 334,498 reviews of meal kit products from the Naver shopping site. After preprocessing the text data, word clouds and sentiment analyses based on word frequency and normalized TF-IDF were produced. A logistic regression model was applied to predict the polarity of reviews of meal kit products. From the logistic regression models derived for each product category, the main factors that caused positive and negative emotions were identified. The results verify that text analysis can be a useful tool that provides a basis for maximizing positive factors for a specific category, menu, and ingredient and for removing negative risk factors when developing a meal kit product.