• Title/Summary/Keyword: stepwise logistic regression model

Variable Selection for Logistic Regression Model Using Adjusted Coefficients of Determination (수정 결정계수를 사용한 로지스틱 회귀모형에서의 변수선택법)

  • Hong C. S.;Ham J. H.;Kim H. I.
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.435-443 / 2005
  • In logistic regression analysis, the coefficient of determination has been defined in several ways, and its values tend to be smaller than those for the linear regression model. These coefficients of determination are not generally used to evaluate and diagnose logistic regression models. Liao and McGee (2003) proposed two adjusted coefficients of determination that are robust to the addition of inappropriate predictors and to variation in sample size. In this work, these adjusted coefficients of determination are applied to variable selection for the logistic regression model and compared with the results of other methods such as forward selection, backward elimination, stepwise selection, and the AIC statistic.
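
The kind of comparison described in this abstract can be illustrated with two standard criteria. The sketch below computes the AIC and, as a stand-in, McFadden's pseudo R-squared with its parameter-penalized variant. These are not the adjusted coefficients of Liao and McGee (2003), whose formulas the abstract does not give. The function names and the log-likelihood values are hypothetical.

```python
def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion, -2 lnL + 2k; smaller is better."""
    return -2.0 * loglik + 2.0 * n_params


def mcfadden_r2(loglik_model: float, loglik_null: float) -> float:
    """McFadden's pseudo R-squared: 1 - lnL(model) / lnL(null)."""
    return 1.0 - loglik_model / loglik_null


def mcfadden_r2_adjusted(loglik_model: float, loglik_null: float,
                         n_params: int) -> float:
    """Penalized variant: 1 - (lnL(model) - k) / lnL(null)."""
    return 1.0 - (loglik_model - n_params) / loglik_null


# Hypothetical log-likelihoods (made up for illustration only).
ll_null = -100.0   # intercept-only model
ll_small = -80.0   # smaller model, 3 estimated parameters
ll_big = -79.5     # larger model, 6 estimated parameters

for name, ll, k in [("small", ll_small, 3), ("big", ll_big, 6)]:
    print(name,
          aic(ll, k),
          mcfadden_r2(ll, ll_null),
          mcfadden_r2_adjusted(ll, ll_null, k))
```

With these made-up numbers the unadjusted statistic favors the larger model, while the penalized statistic and the AIC both favor the smaller one, which is the behavior an adjusted coefficient is meant to provide.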

Study on Association of DSOM Items for Uterine Myoma in Oriental Medicine -Control Group: Outpatient and Clinical Trials Data - (자궁근종 여부에 대한 DSOM 항목의 연관성분석 - 대조군 : 한방부인과 외래환자와 임상시험 피시험자 -)

  • Kim, Jong-Won;Kim, Kyu-Kon;Lee, In-Sun
    • The Journal of Korean Medicine / v.28 no.2 s.70 / pp.22-33 / 2007
  • Uterine myoma is a benign tumor of smooth muscle in the uterine wall. Recently, concern about uterine myoma patients has increased in Oriental medicine. We analyzed the medical records of 944 patients, including 257 uterine myoma patients, who visited Dongeui University Oriental Medical Center from May 2001 to June 2006. Using a stepwise logistic regression model, we investigated which DSOM (Diagnosis System of Oriental Medicine) symptom scores are associated with uterine myoma. The logistic regression analysis indicated the following. With the 558 outpatients as the control group, 18 DSOM items were associated with myoma, 9 positively and 9 negatively, with a correct classification rate of 81.1%, sensitivity of 72.8%, and specificity of 84.9%. With the 129 clinical trial subjects as the control group, 33 DSOM items were associated with myoma, 18 positively and 15 negatively, with a correct classification rate of 85.8%, sensitivity of 84.8%, and specificity of 87.6%. With the 687 outpatients and clinical trial subjects combined as the control group, 18 DSOM items were associated with myoma, 10 positively and 8 negatively, with a correct classification rate of 82.8%, sensitivity of 70.8%, and specificity of 87.3%.
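
The three rates reported in this abstract all derive from a 2x2 confusion matrix. A minimal sketch of the arithmetic follows; the function name and the counts are hypothetical and are not taken from the paper.

```python
def classification_rates(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return correct rate (accuracy), sensitivity, and specificity.

    tp/fn: cases classified correctly/incorrectly,
    tn/fp: controls classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # share of all subjects classified correctly
        "sensitivity": tp / (tp + fn),      # share of cases detected
        "specificity": tn / (tn + fp),      # share of controls correctly ruled out
    }


# Hypothetical counts: 100 cases and 200 controls.
rates = classification_rates(tp=80, fn=20, tn=170, fp=30)
print(rates)
```

Because the correct rate pools cases and controls, it moves toward the specificity when controls outnumber cases, as they do in each comparison in the abstract.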

Use of GIS to Develop a Multivariate Habitat Model for the Leopard Cat (Prionailurus bengalensis) in Mountainous Region of Korea

  • Rho, Paik-Ho
    • Journal of Ecology and Environment / v.32 no.4 / pp.229-236 / 2009
  • A habitat model was developed to delineate potential habitat of the leopard cat (Prionailurus bengalensis) in a mountainous region of Kangwon Province, Korea. Between 1997 and 2005, 224 leopard cat presence sites were recorded in the province in the Nationwide Survey on Natural Environments. Fifty percent of the sites were used to develop a habitat model, and the remaining sites were used to test the model. Fourteen environmental variables related to topographic features, water resources, vegetation and human disturbance were quantified for 112 of the leopard cat presence sites and an equal number of randomly selected sites. Statistical analyses (e.g., t-tests, and Pearson correlation analysis) showed that elevation, ridges, plains, % water cover, distance to water source, vegetated area, deciduous forest, coniferous forest, and distance to paved road differed significantly (P < 0.01) between presence and random sites. Stepwise logistic regression was used to develop a habitat model. Landform type (e.g., ridges vs. plains) is the major topographic factor affecting leopard cat presence. The species also appears to prefer deciduous forests and areas far from paved roads. The habitat map derived from the model correctly classified 93.75% of data from an independent sample of leopard cat presence sites, and the map at a regional scale showed that the cat's habitats are highly fragmented. Protection and restoration of connectivity of critical habitats should be implemented to preserve the leopard cat in mountainous regions of Korea.

Logistic Regressions with Sensory Evaluation Data about Hanwoo Steer Beef (한우 거세우 고기 관능평가 데이터의 로지스틱 회귀분석)

  • Lee, Hye-Jung;Kim, Jae-Hee
    • The Korean Journal of Applied Statistics / v.23 no.5 / pp.857-870 / 2010
  • This study was conducted to investigate the relationship between socio-demographic factors and Korean consumers' palatability evaluation grades, using Hanwoo sensory evaluation data collected from 2006 to 2008 by the National Institute of Animal Science. The dichotomous logistic regression model and the multinomial logistic regression model are fitted with independent variables such as the consumer's living location, age, gender, occupation, monthly income, and beef cut, with the palatability grade as the categorical dependent variable and tenderness, flavor, and juiciness as the continuous dependent variables. A stepwise variable selection procedure is used to find the final model, and odds ratios are calculated to find the associations between categories.

Predicting Early Retirees Using Personality Data (인성 데이터를 활용한 조기 퇴사자 예측)

  • Kim, Young Park;Kim, Hyoung Joong
    • Journal of Digital Contents Society / v.19 no.1 / pp.141-147 / 2018
  • This study analyzed employees who left early, staying with the company no longer than 3 years, based on one company's personality evaluation data. The prediction model was built separately for two groups: the manufacturing group and the R&D group. Independent variables were selected by the stepwise method. A logistic regression model was chosen as the prediction model from among various supervised learning methods and was trained with cross-validation to prevent over-fitting or under-fitting. The accuracy for the two groups was confirmed with a confusion matrix. The most influential factor for early retirement was "immersion" in the manufacturing group and "antisocial" in the R&D group. In the past, studies concentrated on collecting data by questionnaire and identifying factors highly related to retirement, whereas this study suggests a sustainable early retirement prediction model for the future by analyzing the tangible outcome of the recruitment process.
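
Cross-validation, as mentioned in this abstract, repeatedly holds out one part of the data for evaluation while training on the rest. The sketch below generates k-fold train/test index splits; the abstract does not state the number of folds, so `k=5`, the sample size, and the function name are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical usage: 10 samples, 5 folds.
splits = list(k_fold_indices(n_samples=10, k=5))
for train, test in splits:
    print(sorted(test), len(train))
```

Each sample appears in exactly one test fold, so every observation is used once for evaluation and k-1 times for training.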

Development of model for prediction of land sliding at steep slopes (급경사지 붕괴 예측을 위한 모형 개발)

  • Park, Ki-Byung;Joo, Yong-Sung;Park, Dug-Keun
    • Journal of the Korean Data and Information Science Society / v.22 no.4 / pp.691-699 / 2011
  • Land sliding is a well-known natural disaster. As part of the effort to reduce damage from land sliding, many researchers have worked on improving prediction ability. However, because previous studies were conducted mostly by non-statisticians, the previously proposed models were hardly statistically justifiable. In this paper, we predict the probability of land sliding using a logistic regression model. Since most of the explanatory variables under consideration were correlated, we propose the final model after a backward elimination process.

Categorical data analysis of sensory evaluation data with Hanwoo bull beef (한우 수소 고기 관능평가 데이터에 대한 범주형 자료 분석)

  • Lee, Hye-Jung;Cho, Soo-Hyun;Kim, Jae-Hee
    • Journal of the Korean Data and Information Science Society / v.20 no.5 / pp.819-827 / 2009
  • This study was conducted to investigate the relationship between sociodemographic factors and Korean consumers' palatability evaluation grades using Hanwoo sensory evaluation data. The dichotomous logistic regression model and the multinomial logistic regression model are fitted with independent variables such as the consumer's living location, age, gender, occupation, monthly income, and beef cut, with the palatability grade as the dependent variable. A stepwise variable selection procedure is used to find the final model, and odds ratios are calculated to find the associations between categories.

Validation of Three Breast Cancer Nomograms and a New Formula for Predicting Non-sentinel Lymph Node Status

  • Derici, Serhan;Sevinc, Ali;Harmancioglu, Omer;Saydam, Serdar;Kocdor, Mehmet;Aksoy, Suleyman;Egeli, Tufan;Canda, Tulay;Ellidokuz, Hulya;Derici, Solen
    • Asian Pacific Journal of Cancer Prevention / v.13 no.12 / pp.6181-6185 / 2012
  • Background: The aim of the study was to evaluate the available breast nomograms (MSKCC, Stanford, Tenon) for predicting non-sentinel lymph node metastasis (NSLNM) and to determine variables for NSLNM in SLN positive breast cancer patients in our population. Materials and Methods: We retrospectively reviewed 170 patients who underwent completion axillary lymph node dissection between Jul 2008 and Aug 2010 in our hospital. We validated three nomograms (MSKCC, Stanford, Tenon). The likelihood of having positive NSLNM based on various factors was evaluated by univariate analysis. Stepwise multivariate analysis was applied to estimate a predictive model for NSLNM. Four factors were found to contribute significantly to the logistic regression model, allowing design of a new formula to predict non-sentinel lymph node metastasis. The AUCs of the ROC curves were used to describe the diagnostic performance of the MSKCC, Stanford, and Tenon nomograms and our new nomogram. Results: After stepwise multiple logistic regression analysis, multifocality, proportion of positive SLN to total SLN, LVI, and SLN extracapsular extension were found to be statistically significant. AUC results were MSKCC: 0.713/Tenon: 0.671/Stanford: 0.534/DEU: 0.814. Conclusions: The MSKCC nomogram proved to be a good discriminator of NSLN metastasis in SLN positive BC patients for our population. The Stanford and Tenon nomograms were not as predictive of NSLN metastasis. Our newly created formula was the best prediction tool for discriminating NSLN metastasis in SLN positive BC patients for our population. We recommend that nomograms be validated before use in specific populations, and more than one validated nomogram may be used together while counseling patients.

Pure additive contribution of genetic variants to a risk prediction model using propensity score matching: application to type 2 diabetes

  • Park, Chanwoo;Jiang, Nan;Park, Taesung
    • Genomics & Informatics / v.17 no.4 / pp.47.1-47.12 / 2019
  • The achievements of genome-wide association studies have suggested ways to predict diseases, such as type 2 diabetes (T2D), using single-nucleotide polymorphisms (SNPs). Most T2D risk prediction models have used SNPs in combination with demographic variables. However, it is difficult to evaluate the pure additive contribution of genetic variants to classically used demographic models. Since prediction models include some heritable traits, such as body mass index, the contribution of SNPs using unmatched case-control samples may be underestimated. In this article, we propose a method that uses propensity score matching to avoid underestimation by matching case and control samples, thereby determining the pure additive contribution of SNPs. To illustrate the proposed propensity score matching method, we used SNP data from the Korea Association Resources project and reported SNPs from the genome-wide association study catalog. We selected various SNP sets via stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and the elastic-net (EN) algorithm. Using these SNP sets, we made predictions using SLR, LASSO, and EN as logistic regression modeling techniques. The accuracy of the predictions was compared in terms of area under the receiver operating characteristic curve (AUC). The contribution of SNPs to T2D was evaluated by the difference in the AUC between models using only demographic variables and models that included the SNPs. The largest difference among our models showed that the AUC of the model using genetic variants with demographic variables could be 0.107 higher than that of the corresponding model using only demographic variables.
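
The evaluation described in this abstract compares the AUC of a model with and without SNPs. As a rough sketch, the AUC can be computed as the fraction of case-control pairs in which the case receives the higher score (ties count one half). The function name, labels, and scores below are hypothetical and unrelated to the study's data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # case ranked above control
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks from two models on the same six subjects.
labels = [0, 0, 1, 1, 0, 1]
demographic_only = [0.2, 0.6, 0.5, 0.7, 0.4, 0.3]
demographic_plus_snps = [0.1, 0.4, 0.6, 0.8, 0.3, 0.5]

gain = auc(labels, demographic_plus_snps) - auc(labels, demographic_only)
print(gain)
```

The pairwise form is quadratic in the sample size, which is fine for a small illustration; a rank-based computation would be the usual choice for large cohorts.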

A Survival Prediction Model of Rats in Uncontrolled Acute Hemorrhagic Shock Using the Random Forest Classifier (랜덤 포리스트를 이용한 비제어 급성 출혈성 쇼크의 흰쥐에서의 생존 예측)

  • Choi, J.Y.;Kim, S.K.;Koo, J.M.;Kim, D.W.
    • Journal of Biomedical Engineering Research / v.33 no.3 / pp.148-154 / 2012
  • Hemorrhagic shock is a primary cause of death from injury worldwide. Although many studies have tried to accurately diagnose hemorrhagic shock in its early stage, such attempts have not been successful because of compensatory mechanisms in humans. The objective of this study was to construct a survival prediction model for rats in acute hemorrhagic shock using a random forest (RF) model. Heart rate (HR), mean arterial pressure (MAP), respiration rate (RR), lactate concentration (LC), and peripheral perfusion (PP) measured in rats were used as input variables for the RF model, and its performance was compared with that of a logistic regression (LR) model. Before constructing the models, we performed 5-fold cross validation for RF variable selection and forward stepwise variable selection for the LR model to examine which variables were important for the models. For the LR model, sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (ROC-AUC) were 0.83, 0.95, 0.88, and 0.96, respectively. For the RF model, sensitivity, specificity, accuracy, and AUC were 0.97, 0.95, 0.96, and 0.99, respectively. In conclusion, the RF model was superior to the LR model for survival prediction in the rat model.
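
Forward stepwise selection, as used for the LR model here, adds one variable at a time as long as a model criterion keeps improving. The sketch below shows only the selection loop. The abstract does not say which criterion the authors used, so the `criterion` callback is an assumption, and the toy criterion with its per-variable gains is made up; in practice it would fit a logistic regression on the given subset and return a value such as the AIC.

```python
def forward_select(candidates, criterion):
    """Greedy forward selection; `criterion(subset)` returns a value to minimize."""
    selected = []
    best = criterion(selected)
    remaining = list(candidates)
    while remaining:
        # Score every model that adds exactly one remaining variable.
        score, var = min((criterion(selected + [v]), v) for v in remaining)
        if score >= best:
            break                      # no addition improves the criterion
        selected.append(var)
        remaining.remove(var)
        best = score
    return selected, best


# Toy AIC-like criterion with made-up, additive gains (not fitted to any data):
# each variable lowers the deviance by its gain and costs a penalty of 2.
GAINS = {"HR": 1.0, "MAP": 12.0, "RR": 0.5, "LC": 20.0, "PP": 3.0}


def toy_criterion(subset):
    return 100.0 - sum(GAINS[v] for v in subset) + 2.0 * len(subset)


chosen, value = forward_select(["HR", "MAP", "RR", "LC", "PP"], toy_criterion)
print(chosen, value)
```

Variables whose gain does not exceed the penalty are never added, which is how the loop decides when to stop.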