• Title/Summary/Keyword: Logistic models


A Comparative Study of Predictive Factors for Hypertension using Logistic Regression Analysis and Decision Tree Analysis

  • SoHyun Kim;SungHyoun Cho
    • Physical Therapy Rehabilitation Science / v.12 no.2 / pp.80-91 / 2023
  • Objective: The purpose of this study is to identify factors that affect the incidence of hypertension using logistic regression and decision tree analysis, and to build and compare predictive models. Design: Secondary data analysis study. Methods: We analyzed 9,859 subjects from the Korea Health Panel 2019 annual data provided by the Korea Institute for Health and Social Affairs and the National Health Insurance Service. Frequency analysis, chi-square test, binary logistic regression, and decision tree analysis were performed on the data. Results: In logistic regression analysis, those who were 60 years of age or older (odds ratio, OR=68.801, p<0.001), those who were divorced, widowed, or separated (OR=1.377, p<0.001), those with a middle school education or less (OR=1, reference), those who did not walk at all (OR=1, reference), those who were obese (OR=5.109, p<0.001), and those who had poor subjective health status (OR=2.163, p<0.001) were more likely to develop hypertension. In the decision tree, those who were over 60 years of age, overweight or obese, and had a middle school education or less had the highest probability of developing hypertension, at 83.3%. Logistic regression analysis showed a specificity of 85.3% and a sensitivity of 47.9%, while decision tree analysis showed a specificity of 81.9% and a sensitivity of 52.9%. Classification accuracy was 73.6% for logistic regression and 72.6% for decision tree analysis. Conclusions: Both logistic regression and decision tree analysis were adequate as predictive models, and both methods appear useful for constructing a predictive model for hypertension.
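The sensitivity, specificity, and accuracy figures reported above are linked: accuracy is the prevalence-weighted average of sensitivity and specificity. The following Python sketch back-calculates the hypertension prevalence implied by each reported triple. The abstract does not state the prevalence, so the values computed here are only rough consistency checks on rounded percentages, and the function names are illustrative.

```python
def accuracy(sensitivity, specificity, prevalence):
    """Overall accuracy as a prevalence-weighted mix of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


def implied_prevalence(sensitivity, specificity, acc):
    """Solve the accuracy identity for prevalence.

    Undefined when sensitivity equals specificity (division by zero).
    """
    return (specificity - acc) / (specificity - sensitivity)


# (sensitivity, specificity, accuracy) as reported in the abstract.
reported = {
    "logistic regression": (0.479, 0.853, 0.736),
    "decision tree": (0.529, 0.819, 0.726),
}

for name, (sens, spec, acc) in reported.items():
    p = implied_prevalence(sens, spec, acc)
    print(f"{name}: implied prevalence = {p:.3f}")
```

Both models imply a prevalence of roughly 31 to 32%, so the two sets of reported figures are at least mutually consistent.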

Growth curve estimates for wither height, hip height, and body length of Hanwoo steers (Bos taurus coreanae)

  • Park, Hu-Rak;Eum, Seung-Hoon;Roh, Seung-Hee;Sun, Du-Won;Seo, Jakyeom;Cho, Seong-Keun;Lee, Jung-Gyu;Kim, Byeong-Woo
    • Korean Journal of Agricultural Science / v.44 no.3 / pp.384-391 / 2017
  • Growth curves in Hanwoo steers were estimated by Gompertz, Von Bertalanffy, Logistic, and Brody nonlinear models using growth data collected by the Hanwoo Improvement Center from a total of 6,973 Hanwoo (Bos taurus coreanae) steers 6 to 24 months old that were born between 1996 and 2015. Each model has three parameters: A, mature size of the body measurement; b, growth ratio; and k, intrinsic growth rate. Nonlinear regression equations for wither height according to the Gompertz, Von Bertalanffy, Logistic, and Brody models were $Y_t=144.7e^{-0.5869e^{-0.00301t}}$, $Y_t=145.3(1-0.1816e^{-0.00284t})^3$, $Y_t=143.1(1+0.7356e^{-0.00352t})^{-1}$, and $Y_t=146.8(1-0.4700e^{-0.00249t})$, respectively, while those for hip height were $Y_t=144.5e^{-0.5549e^{-0.00312t}}$, $Y_t=145.0(1-0.1724e^{-0.00295t})^3$, $Y_t=143.1(1+0.6863e^{-0.00360t})^{-1}$, and $Y_t=146.2(1-0.4501e^{-0.00263t})$, respectively. Equations for body length were $Y_t=174.1e^{-0.8342e^{-0.00289t}}$, $Y_t=175.8(1-0.2500e^{-0.00265t})^3$, $Y_t=170.0(1+1.1548e^{-0.00363t})^{-1}$, and $Y_t=180.3(1-0.6077e^{-0.00215t})$, respectively, for the same models. Among the four models, the Brody model resulted in the lowest mean square error, with mean square errors of 31.79, 30.57, and 42.13 for wither height, hip height, and body length, respectively. Also, the estimated birth wither height, birth hip height, and birth body length (77.98, 80.57, and 70.97 cm, respectively) were lower in the Brody model than in the other models. An inflection point was not observed during the growth phase of Hanwoo steers according to the growth curves calculated using the Gompertz, Von Bertalanffy, and Logistic models. Based on the results, we concluded that the regression equation using the Brody model was the most appropriate among the four growth models. To obtain more accurate parameters, however, data from a wider production period (from birth to shipping) would be required, and a suitable model for body conformation traits would need to be developed.
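As a rough check on the wither-height equations, the Python sketch below evaluates all four growth models at t = 0 using the parameters quoted in the abstract. It assumes the standard Brody form A(1 - b e^{-kt}), which is the form consistent with the reported birth estimates. The function and dictionary names are illustrative, and the time unit of t is not stated in the abstract.

```python
import math

# Wither-height parameters (A, b, k) as quoted in the abstract.
WITHER = {
    "gompertz": (144.7, 0.5869, 0.00301),
    "von_bertalanffy": (145.3, 0.1816, 0.00284),
    "logistic": (143.1, 0.7356, 0.00352),
    "brody": (146.8, 0.4700, 0.00249),  # assumes the standard form A(1 - b e^{-kt})
}


def growth(model, t, A, b, k):
    """Predicted body measurement at age t for one of the four growth models."""
    decay = b * math.exp(-k * t)
    if model == "gompertz":
        return A * math.exp(-decay)
    if model == "von_bertalanffy":
        return A * (1.0 - decay) ** 3
    if model == "logistic":
        return A / (1.0 + decay)
    if model == "brody":
        return A * (1.0 - decay)
    raise ValueError(f"unknown model: {model}")


# Estimated wither height at birth (t = 0) under each model.
birth = {name: growth(name, 0.0, *params) for name, params in WITHER.items()}
for name, value in birth.items():
    print(f"{name}: {value:.2f} cm")
```

With these rounded parameters the Brody model gives the lowest birth wither height, about 77.8 cm, close to the 77.98 cm reported. The small gap presumably comes from rounding in the published coefficients.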

Predictive Bayesian Network Model Using Electronic Patient Records for Prevention of Hospital-Acquired Pressure Ulcers

  • Cho, In-Sook;Chung, Eun-Ja
    • Journal of Korean Academy of Nursing / v.41 no.3 / pp.423-431 / 2011
  • Purpose: The study was designed to determine the discriminating ability of a Bayesian network (BN) for predicting risk for pressure ulcers. Methods: Analysis was done using a retrospective cohort of nursing records representing 21,114 hospital days for 3,348 patients at risk for ulcers who were admitted to the intensive care unit of a tertiary teaching hospital between January 2004 and January 2007. A BN model and two logistic regression (LR) versions, model-I and model-II, were compared, varying the nature, number, and quality of input variables. Classification competence and case coverage of the models were tested and compared using a threefold cross-validation method. Results: Average incidence of ulcers was 6.12%. Of the two LR models, model-I demonstrated better indexes of statistical model fit. The BN model had a sensitivity of 81.95%, a specificity of 75.63%, and positive and negative predictive values of 35.62% and 96.22%, respectively. The area under the receiver operating characteristic curve (AUROC) was 85.01%, implying moderate to good overall performance, which was similar to LR model-I. However, case coverage of the BN model was 100%, compared with 15.88% for LR. Conclusion: The discriminating ability of the BN model was found to be acceptable, and its case coverage proved to be excellent for clinical use.

Classification of nuclear activity types for neighboring countries of South Korea using machine learning techniques with xenon isotopic activity ratios

  • Sang-Kyung Lee;Ser Gi Hong
    • Nuclear Engineering and Technology / v.56 no.4 / pp.1372-1384 / 2024
  • Discriminating the source of a xenon gas release can provide an important clue for detecting nuclear activities in neighboring countries. In this paper, three machine learning techniques, namely logistic regression, support vector machine (SVM), and k-nearest neighbors (KNN), were applied to develop predictive models for discriminating the source of a xenon gas release based on xenon isotopic activity ratio data generated for the probable sources using the depletion codes ORIGEN in SCALE 6.2 and Serpent. The sources considered for the neighboring countries of South Korea include PWRs, CANDUs, IRT-2000, the Yongbyun 5 MWe reactor, and nuclear tests with plutonium and uranium. The analysis showed that the overall prediction accuracies of the SVM and KNN models using six inputs all exceeded 90%. In particular, the SVM and KNN models that used six or three xenon isotope activity ratios with three classification categories, namely reactor, plutonium bomb, and uranium bomb, had accuracy levels greater than 88%. The prediction performance demonstrates the applicability of machine learning algorithms to predicting nuclear threats using xenon isotopic activity ratios.

Applying Conventional and Saturated Generalized Gamma Distributions in Parametric Survival Analysis of Breast Cancer

  • Yavari, Parvin;Abadi, Alireza;Amanpour, Farzaneh;Bajdik, Chris
    • Asian Pacific Journal of Cancer Prevention / v.13 no.5 / pp.1829-1831 / 2012
  • Background: The generalized gamma (GG) distribution constitutes an extensive family that contains nearly all of the most commonly used distributions, including the exponential, Weibull, and lognormal. A saturated version of the model allows covariates to act through all the parameters of the survival time distribution. Accelerated failure-time (AFT) models assume that only one parameter of the distribution depends on the covariates. Methods: We fitted both the conventional GG model and the saturated form for each of its members, including the Weibull and lognormal distributions, and compared them using likelihood ratios. To compare the selected parametric distribution with the log-logistic distribution, which is widely used in survival analysis but is not included in the generalized gamma family, we used the Akaike information criterion (AIC; r=l(b)-2p). All models were fitted using data for 369 women aged 50 years or more, diagnosed with stage IV breast cancer in BC during 1990-1999 and followed to 2010. Results: In both conventional and saturated parametric models, the lognormal was the best candidate among the GG family members; the lognormal also fitted better than the log-logistic distribution. In the conventional GG model, the variables "surgery", "radiotherapy", "hormone therapy", "erposneg", and the interaction between "hormone therapy" and "erposneg" were significant. In the AFT model, we estimated the relative time for these variables. In the saturated GG model, similar significant variables were selected. Estimating the relative times at different percentiles of the extended model illustrates the pattern in which relative survival time changes over time. Conclusions: The advantage of using the generalized gamma distribution is that it facilitates estimating a model with improved fit over the standard Weibull or lognormal distributions. Alternatively, the generalized F family of distributions might be considered, of which the generalized gamma distribution is a member and which also includes the commonly used log-logistic distribution.

Binary regression model using skewed generalized t distributions

  • Kim, Mijeong
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.775-791 / 2017
  • We frequently encounter binary data in real life. Logistic, probit, cauchit, and complementary log-log models are often used for binary data analysis. To analyze binary data, Liu (2004) proposed the robit model, in which the inverse of the cdf of the Student's t distribution is used as a link function. Kim et al. (2008) also proposed a generalized t-link model to make the binary regression model more flexible. More flexible skewed distributions allow more flexible link functions in generalized linear models. In this sense, we propose a binary data regression model using the skewed generalized t distributions introduced in Theodossiou (1998). We implement R code for the proposed models using the glm function included in base R and the R sgt package. We also analyze the Pima Indian data using the proposed model in R.
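Each of the standard binary-regression links named above is the inverse of a cumulative distribution function, so the fitted probability is that cdf applied to the linear predictor. The Python sketch below illustrates the four common choices. It is not the authors' R code and does not cover the skewed generalized t link; the function names are illustrative.

```python
import math
from statistics import NormalDist


def inv_logit(eta):
    """Logistic cdf: inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Standard normal cdf: inverse of the probit link."""
    return NormalDist().cdf(eta)


def inv_cauchit(eta):
    """Standard Cauchy cdf: inverse of the cauchit link.

    The Cauchy distribution is Student's t with 1 degree of freedom,
    so this is also the robit inverse link in that special case.
    """
    return 0.5 + math.atan(eta) / math.pi


def inv_cloglog(eta):
    """Inverse of the complementary log-log link (asymmetric around eta = 0)."""
    return 1.0 - math.exp(-math.exp(eta))


for eta in (-2.0, 0.0, 2.0):
    print(eta, inv_logit(eta), inv_probit(eta), inv_cauchit(eta), inv_cloglog(eta))
```

The three symmetric links all give probability 0.5 at a linear predictor of zero, whereas the complementary log-log link does not. This kind of asymmetry is what a skewed link family is meant to capture more flexibly.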

Parameter estimation of linear function using VUS and HUM maximization

  • Hong, Chong Sun;Won, Chi Hwan;Jeong, Dong Gil
    • Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1305-1315 / 2015
  • Consider a risk score that is a function of a linear score in classification models. The AUC optimization method can be applied to estimate the coefficients of the linear score. The estimates obtained by this AUC approach are shown to be better than the maximum likelihood estimators from logistic models in general situations where the logistic assumptions do not hold. In this work, the VUS and HUM approaches are suggested, extending the AUC approach to more realistic discrimination and prediction problems. Simulation results are obtained with various distributions of thresholds and three kinds of link functions: the logit, complementary log-log, and modified logit functions. It is found that the coefficient estimates obtained with the VUS and HUM approaches for multiple-category classification are equivalent to or better than those obtained using logistic models with these link functions.
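The quantities being maximized can be illustrated with their empirical estimators. The Python sketch below computes the AUC as the fraction of correctly ordered (negative, positive) score pairs, and the VUS as the fraction of correctly ordered triples across three ordered classes. HUM extends the same idea to more than three classes. The scores, coefficients, and function names are hypothetical. Counting ties as one half in the AUC and as zero in the VUS is a simplifying assumption, not necessarily the paper's convention.

```python
from itertools import product


def linear_score(x, beta):
    """Linear score beta'x for one observation."""
    return sum(b * xi for b, xi in zip(beta, x))


def auc(neg_scores, pos_scores):
    """Empirical AUC: share of (negative, positive) pairs ordered correctly; ties count 1/2."""
    total = 0.0
    for n, p in product(neg_scores, pos_scores):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(neg_scores) * len(pos_scores))


def vus(scores_1, scores_2, scores_3):
    """Empirical VUS: share of triples, one per ordered class, with s1 < s2 < s3."""
    triples = list(product(scores_1, scores_2, scores_3))
    return sum(1 for a, b, c in triples if a < b < c) / len(triples)


# Hypothetical two-feature observations and coefficients, for illustration only.
beta = (1.0, 0.5)
neg = [linear_score(x, beta) for x in [(0.1, 0.2), (0.4, 0.1)]]
pos = [linear_score(x, beta) for x in [(0.9, 0.3), (0.3, 0.2)]]
print(auc(neg, pos))
```

An AUC- or VUS-maximizing estimator would search over `beta` for the coefficients that make these empirical quantities as large as possible, rather than maximizing a logistic likelihood.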

Pure additive contribution of genetic variants to a risk prediction model using propensity score matching: application to type 2 diabetes

  • Park, Chanwoo;Jiang, Nan;Park, Taesung
    • Genomics & Informatics / v.17 no.4 / pp.47.1-47.12 / 2019
  • The achievements of genome-wide association studies have suggested ways to predict diseases, such as type 2 diabetes (T2D), using single-nucleotide polymorphisms (SNPs). Most T2D risk prediction models have used SNPs in combination with demographic variables. However, it is difficult to evaluate the pure additive contribution of genetic variants to classically used demographic models. Since prediction models include some heritable traits, such as body mass index, the contribution of SNPs using unmatched case-control samples may be underestimated. In this article, we propose a method that uses propensity score matching to avoid underestimation by matching case and control samples, thereby determining the pure additive contribution of SNPs. To illustrate the proposed propensity score matching method, we used SNP data from the Korea Association Resources project and reported SNPs from the genome-wide association study catalog. We selected various SNP sets via stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and the elastic-net (EN) algorithm. Using these SNP sets, we made predictions using SLR, LASSO, and EN as logistic regression modeling techniques. The accuracy of the predictions was compared in terms of area under the receiver operating characteristic curve (AUC). The contribution of SNPs to T2D was evaluated by the difference in the AUC between models using only demographic variables and models that included the SNPs. The largest difference among our models showed that the AUC of the model using genetic variants with demographic variables could be 0.107 higher than that of the corresponding model using only demographic variables.

Exploring Factors Related to Metastasis Free Survival in Breast Cancer Patients Using Bayesian Cure Models

  • Jafari-Koshki, Tohid;Mansourian, Marjan;Mokarian, Fariborz
    • Asian Pacific Journal of Cancer Prevention / v.15 no.22 / pp.9673-9678 / 2014
  • Background: Breast cancer is a fatal disease and the most frequently diagnosed cancer in women, with an increasing trend worldwide. The burden is mostly attributed to metastatic cancers, which occur in one-third of patients and for which treatments are palliative. It is of great interest to determine factors affecting the time from cancer diagnosis to secondary metastasis. Materials and Methods: Cure rate models assume a Poisson distribution for the number of unobservable metastatic-component cells; these cells are completely eliminated in patients who never develop metastasis, but some may remain and result in metastasis. Time to metastasis is defined as a function of the number of these cells and the time for each cell to develop a detectable sign of metastasis. Covariates are introduced into the model via the rate of metastatic-component cells. We used non-mixture cure rate models with Weibull and log-logistic distributions in a Bayesian setting to assess the relationship between metastasis-free survival and covariates. Results: The median metastasis-free survival was 76.9 months. Various models showed that, among the covariates in the study, lymph node involvement ratio and being progesterone receptor positive were significant, with an adverse and a beneficial effect on metastasis-free survival, respectively. The estimated fraction of patients cured from metastasis was almost 48%. The Weibull model performed slightly better than the log-logistic model. Conclusions: Cure rate models are popular in survival studies and outperform other models under certain conditions. We explored the prognostic factors of metastatic breast cancer from a different viewpoint. In this study, metastasis sites were analyzed all together. Conducting similar studies in a larger sample of cancer patients, as well as evaluating the prognostic value of covariates in metastasis to each site separately, is recommended.
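In the usual formulation of the non-mixture cure rate model described above, the number of remaining cells is Poisson with mean θ and each cell's time to detectable metastasis has cdf F(t). The population survival function is then S(t) = exp(-θ F(t)), and the cured fraction is exp(-θ), the probability that no cells remain. The Python sketch below works through this with a Weibull F. The Weibull shape and scale are hypothetical placeholders, and θ is simply back-calculated from the roughly 48% cure fraction reported, ignoring covariates.

```python
import math


def weibull_cdf(t, shape, scale):
    """Weibull cdf for the time a single cell needs to produce detectable metastasis."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def population_survival(t, theta, shape, scale):
    """Non-mixture cure model: S(t) = exp(-theta * F(t))."""
    return math.exp(-theta * weibull_cdf(t, shape, scale))


def cure_fraction(theta):
    """Long-run plateau of S(t): the Poisson probability of zero remaining cells."""
    return math.exp(-theta)


theta = -math.log(0.48)    # implied by the ~48% cure fraction in the abstract
shape, scale = 1.5, 80.0   # hypothetical Weibull parameters (months), not from the paper

for months in (0, 24, 60, 120, 600):
    print(months, round(population_survival(months, theta, shape, scale), 3))
```

The survival curve starts at 1 and levels off at the cure fraction instead of falling to zero, which is what distinguishes a cure model from a standard Weibull or log-logistic survival model.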

Quantitative Comparison of Probabilistic Multi-source Spatial Data Integration Models for Landslide Hazard Assessment

  • Park No-Wook;Chi Kwang-Hoon;Chung Chang-Jo F.;Kwon Byung-Doo
    • Proceedings of the KSRS Conference / 2004.10a / pp.622-625 / 2004
  • This paper presents multi-source spatial data integration models based on probability theory for landslide hazard assessment. Four probabilistic models, namely empirical likelihood ratio estimation, logistic regression, generalized additive, and predictive discriminant models, are proposed and applied. The proposed models are theoretically based on statistical relationships between landslide occurrences and input spatial data sets. A particular advantage of these models is that they use continuous data directly, without any information loss. A case study from the Gangneung area, Korea, was carried out to quantitatively assess the four models and to discuss operational issues.
