• Title/Summary/Keyword: Logistic models


An Ensemble Approach to Detect Fake News Spreaders on Twitter

  • Sarwar, Muhammad Nabeel;UlAmin, Riaz;Jabeen, Sidra
    • International Journal of Computer Science & Network Security / Vol. 22, No. 5 / pp.294-302 / 2022
  • Detecting fake news is a complex and challenging task. The generation of fake news is very hard to stop; only steps to control its circulation may help minimize its impact. Humans tend to believe misleading false information. Researchers began by categorizing content on social media sites as real or fake news. False information can mislead an individual or an organization, which may cause major failures and financial loss. Automatic detection of false information circulating on social media is an emerging area of research that has been gaining attention from both industry and academia since the 2016 US presidential election. Fake news has severe negative effects on individuals and organizations, and its hostile effects extend to society. Timely prediction of fake news is therefore important. This research focuses on detecting fake news spreaders. In this context, six models in total were developed, trained, and tested with the PAN 2020 dataset. Four N-gram based approaches and user statistics-based models were trained with different hyperparameter values. An extensive grid search with cross-validation was applied to each machine learning model. For the N-gram based models, out of the many available machine learning algorithms, this research focused on those reported to yield better results, as assessed by a close reading of state-of-the-art related work. For better accuracy, the authors developed models using Random Forest, Logistic Regression, SVM, and XGBoost. All four algorithms were trained with hyperparameters selected by cross-validated grid search. The advantages of this research over previous work are the user statistics-based model and the ensemble learning model, which were designed to help classify Twitter users as fake news spreaders or not with the highest reliability. The user statistics model used 17 features, on the basis of which it categorized a Twitter user as malicious. A new dataset was constructed from the predictions of the machine learning models. Three techniques (simple mean, logistic regression, and random forest) were then applied to it as ensemble models. Logistic regression as the ensemble combiner gave the best training and testing results, achieving an accuracy of 72%.
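
As a rough illustration of the ensemble step described in this abstract, the sketch below combines base-model probabilities in two ways: a simple mean and a logistic-regression combiner fitted on those probabilities. It is not the authors' implementation. The function names, the gradient-descent fitting, and the toy numbers are all assumptions made for illustration; the PAN 2020 data and the paper's features are not reproduced here.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def simple_mean(row):
    """Average the base models' predicted probabilities for one user."""
    return sum(row) / len(row)


def fit_logistic_meta(X, y, lr=0.5, epochs=2000):
    """Fit a logistic-regression combiner by batch gradient descent.

    X: list of rows, each row holding one probability per base model.
    y: list of 0/1 labels (1 = fake news spreader in this toy setup).
    """
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            for j, xi in enumerate(row):
                grad_w[j] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_meta(w, b, row):
    """Probability from the fitted combiner for one row of base outputs."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)


# Hypothetical outputs of four base models for six users (not from the paper).
X = [
    [0.90, 0.80, 0.70, 0.85],
    [0.20, 0.30, 0.10, 0.25],
    [0.70, 0.60, 0.80, 0.65],
    [0.40, 0.20, 0.30, 0.35],
    [0.80, 0.90, 0.60, 0.75],
    [0.10, 0.40, 0.20, 0.15],
]
y = [1, 0, 1, 0, 1, 0]

w, b = fit_logistic_meta(X, y)
mean_preds = [int(simple_mean(row) >= 0.5) for row in X]
meta_preds = [int(predict_meta(w, b, row) >= 0.5) for row in X]
print("simple mean:", mean_preds)
print("logistic combiner:", meta_preds)
```

In practice the combiner would be fitted on out-of-fold predictions rather than on the same rows used to train the base models; the toy data above skips that step.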

Applications of proportional odds ordinal logistic regression models and continuation ratio models in examining the association of physical inactivity with erectile dysfunction among type 2 diabetic patients

  • Mathew, Anil C.;Siby, Elbin;Tom, Amal;Kumar R, Senthil
    • 운동영양학회지 / Vol. 25, No. 1 / pp.30-34 / 2021
  • [Purpose] Many studies have observed a high prevalence of erectile dysfunction among individuals who perform less leisure-time physical activity. However, this relationship is not well studied in patients with type 2 diabetes. In exposure-outcome studies with ordinal outcome variables, investigators often dichotomize the outcome variable and lose information by collapsing categories. Several statistical models have been developed to make full use of the information in ordinal response data, but they have not been widely used in public health research. In this paper, we discuss the application of two statistical models to determine the association of physical inactivity with erectile dysfunction among patients with type 2 diabetes. [Methods] A total of 204 married men aged 20-60 years with a diagnosis of type 2 diabetes at the outpatient unit of the Department of Endocrinology at PSG hospitals during May and June 2019 were studied. We examined the association between physical inactivity and erectile dysfunction using proportional odds ordinal logistic regression models and continuation ratio models. [Results] The proportional odds model revealed that patients with diabetes who perform leisure-time physical activity for over 40 minutes per day have reduced odds of erectile dysfunction (odds ratio=0.38) across the severity categories of erectile dysfunction after adjusting for age and duration of diabetes. [Conclusion] The present study suggests that physical inactivity has a negative impact on erectile function. We observed that the simple logistic regression model had only 75% efficiency compared to the proportional odds model used here; hence, more valid estimates were obtained here.
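
To make the proportional odds idea concrete, the sketch below shows how a single odds ratio shifts every cumulative split of an ordinal outcome by the same factor. Only the odds ratio of 0.38 comes from the abstract. The cutpoints, the choice of four severity categories, and the reading of the odds ratio as "odds of a more severe category" are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(cutpoints, log_odds_shift):
    """Category probabilities under a proportional odds model.

    Uses logit P(Y <= j) = cutpoint_j - log_odds_shift, so a negative
    shift moves probability mass toward the lower (milder) categories.
    """
    cumulative = [sigmoid(c - log_odds_shift) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


def odds_above(probs, j):
    """Odds of falling in a category more severe than category j."""
    below = sum(probs[: j + 1])
    return (1.0 - below) / below


CUTPOINTS = [-0.5, 0.5, 1.5]  # hypothetical cutpoints, four categories
ODDS_RATIO = 0.38             # value reported in the abstract

inactive = category_probs(CUTPOINTS, 0.0)
active = category_probs(CUTPOINTS, math.log(ODDS_RATIO))

for j in range(len(CUTPOINTS)):
    ratio = odds_above(active, j) / odds_above(inactive, j)
    print(f"split after category {j}: odds ratio = {ratio:.2f}")
```

Every split prints the same ratio, which is exactly the proportional odds assumption; a continuation ratio model would instead condition each split on having reached that category.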

Logistic regression model for major separation rate

  • 최재성
    • Journal of the Korean Data and Information Science Society / Vol. 13, No. 2 / pp.129-138 / 2002
  • This paper deals with logistic regression models for analysing separation rates from majors. The model-building procedure shows how to incorporate the effects of factors arising from a three-way nested sampling scheme and discusses which characteristics should be considered as independent variables directly affecting the rates.


Logistic Regression for Retrospective Studies

  • Shin, Mi-Young
    • 품질경영학회지 / Vol. 22, No. 4 / pp.111-119 / 1994
  • We consider logistic models based on retrospective, case-control data with stratified samples and study the Weighted Exogenous Sampling Maximum Likelihood (WESML) estimator. We develop a consistent estimator of the asymptotic covariance matrix of the WESML estimator.


Neural Networks and Logistic Models for Classification: A Case Study

  • Hwang, Chang-Ha
    • Journal of the Korean Data and Information Science Society / Vol. 7, No. 1 / pp.13-19 / 1996
  • In this paper, we study and compare two types of methods for classification when both continuous and categorical variables are used to describe each individual. One is the neural network (NN) method using backpropagation learning (BPL). The other is the logistic model (LM) method. Both the NN and LM are based on projections of the data in directions determined from interconnection weights.


Receiver Operating Characteristic (ROC) Curves Using Neural Network in Classification

  • Lee, Jea-Young;Lee, Yong-Won
    • Journal of the Korean Data and Information Science Society / Vol. 15, No. 4 / pp.911-920 / 2004
  • We examine receiver operating characteristic (ROC) curves obtained from neural networks with a logistic function. The models are shown to arise from classification into normal (nondiseased) and abnormal (diseased) groups in medical research. A few goodness-of-fit test statistics using normality curves are discussed, and the performance of neural networks with a logistic function is evaluated.


Multicollinearity in Logistic Regression

  • Jong-Han Lee;Myung-Hoe Huh
    • Communications for Statistical Applications and Methods / Vol. 2, No. 2 / pp.303-309 / 1995
  • Many measures to detect multicollinearity in linear regression have been proposed in statistics and numerical analysis literature. Among them, condition number and variance inflation factor(VIF) are most popular. In this study, we give new interpretations of condition number and VIF in linear regression, using geometry on the explanatory space. In the same line, we derive natural measures of condition number and VIF for logistic regression. These computer intensive measures can be easily extended to evaluate multicollinearity in generalized linear models.
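
For orientation, the classical linear-regression versions of the two measures named in this abstract can be sketched for the simplest case of two predictors, where VIF = 1 / (1 - r^2) and the condition number of the standardized design is sqrt((1 + |r|) / (1 - |r|)). This shows only the standard definitions; the paper's geometric interpretation and its logistic-regression extensions are not reproduced, and the data and function names are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def condition_number_two_predictors(x1, x2):
    """Condition number of the standardized two-column design matrix.

    The 2x2 correlation matrix has eigenvalues 1 + |r| and 1 - |r|, and the
    condition number is the square root of their ratio.
    """
    r = abs(pearson_r(x1, x2))
    return math.sqrt((1.0 + r) / (1.0 - r))


# Hypothetical predictors: x2 nearly duplicates x1, x3 is uncorrelated with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.0]
x3 = [1.0, -2.0, 2.0, -2.0, 1.0]

print("collinear pair: VIF =", round(vif_two_predictors(x1, x2), 1))
print("uncorrelated pair: VIF =", round(vif_two_predictors(x1, x3), 1))
```

A VIF above roughly 10 is a commonly used rule of thumb for problematic collinearity; that cutoff is a convention, not a result from this paper.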


Binary Forecast of Heavy Snow Using Statistical Models

  • Sohn, Keon-Tae
    • Communications for Statistical Applications and Methods / Vol. 13, No. 2 / pp.369-378 / 2006
  • This study focuses on the binary forecast of the occurrence of heavy snow in the Honam area based on the MOS (model output statistics) method. Daily snow cover amounts at 17 stations during the cold season (November to March) from 2001 to 2005 and the corresponding 45 RDAPS outputs are used. A logistic regression model and neural networks are applied to predict the probability of occurrence of heavy snow. Based on the distribution of estimated probabilities, optimal thresholds are determined via the true skill score. According to the comparison results, the logistic regression model is recommended.
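
The threshold-selection step mentioned in this abstract can be sketched as a search over candidate cutoffs. The sketch assumes that "true skill score" refers to the usual true skill statistic (hit rate minus false alarm rate); the abstract does not define it. The probabilities, observations, and candidate grid below are invented for illustration and are not the paper's data.

```python
def true_skill_score(probs, observed, threshold):
    """Hit rate minus false alarm rate for a yes/no forecast.

    A forecast is "yes" when the predicted probability reaches the threshold.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for p, obs in zip(probs, observed):
        forecast_yes = p >= threshold
        if obs == 1 and forecast_yes:
            hits += 1
        elif obs == 1:
            misses += 1
        elif forecast_yes:
            false_alarms += 1
        else:
            correct_negatives += 1
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_negatives)
    return hit_rate - false_alarm_rate


def best_threshold(probs, observed, candidates):
    """Return the candidate threshold with the highest true skill score."""
    return max(candidates, key=lambda t: true_skill_score(probs, observed, t))


# Hypothetical predicted probabilities of heavy snow and observed outcomes.
probs = [0.05, 0.12, 0.25, 0.33, 0.42, 0.55, 0.65, 0.75, 0.92]
observed = [0, 0, 0, 0, 1, 0, 1, 1, 1]
candidates = [i / 10 for i in range(1, 10)]

threshold = best_threshold(probs, observed, candidates)
print("chosen threshold:", threshold)
print("score at threshold:", true_skill_score(probs, observed, threshold))
```

For a rare event such as heavy snow, a cutoff chosen this way often falls well below 0.5, which is one reason a fixed 0.5 threshold is usually not used for such forecasts.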

Predicting Suicidal Ideation in College Students with Mental Health Screening Questionnaires

  • Shim, Geumsook;Jeong, Bumseok
    • Psychiatry investigation / Vol. 15, No. 11 / pp.1037-1045 / 2018
  • Objective: The present study aimed to identify risk factors for future suicidal ideation (SI) and to predict individual-level risk for future or persistent SI among college students. Methods: Mental health check-up data collected over 3 years were retrospectively analyzed. Students were categorized as suicidal ideators and non-ideators at baseline. Logistic regression analyses were performed separately for each group, and the predicted probability for each student was calculated. Results: Students likely to exhibit future SI had higher levels of mental health problems, including depression and anxiety, and significant risk factors for future SI included depression, current SI, social phobia, alcohol problems, being female, low self-esteem, and number of close relationships and concerns. Logistic regression models that included current suicide ideators revealed acceptable area under the curve (AUC) values (0.7-0.8) in both the receiver operating characteristic (ROC) and precision-recall (PR) curves for predicting future SI. Predictive models with current suicide non-ideators revealed an acceptable level of AUCs only for ROC curves. Conclusion: Several factors such as low self-esteem and a focus on short-term rather than long-term outcomes may enhance the prediction of future SI. Because a certain range of SI clearly necessitates clinical attention, further studies differentiating significant from other types of SI are necessary.
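
The two summary measures reported in this abstract can be illustrated with a small sketch: ROC AUC computed as the fraction of positive-negative pairs ranked correctly, and average precision as one common step-wise summary of the PR curve. How the study actually computed its PR AUC is not stated, so average precision is an assumption here, and the scores and labels are invented.

```python
def roc_auc(scores, labels):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive case."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical predicted probabilities and outcomes (1 = future SI).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

print("ROC AUC:", round(roc_auc(scores, labels), 3))
print("average precision:", round(average_precision(scores, labels), 3))
```

When the positive class is rare, the PR summary tends to drop much faster than ROC AUC, which is consistent with the abstract's observation that the non-ideator models were acceptable only on ROC curves.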

Optimum failure-censored step-stress partially accelerated life test for the truncated logistic life distribution

  • Srivastava, P.W.;Mittal, N.
    • International Journal of Reliability and Applications / Vol. 13, No. 1 / pp.19-35 / 2012
  • This paper presents an optimum design of a step-stress partially accelerated life test (PALT) plan that allows the test condition to be changed from the use condition to the accelerated condition on the occurrence of a fixed number of failures. Various life distribution models such as the exponential, Weibull, log-logistic, and Burr type XII have been used in the literature to analyze PALT data. Different life distribution models are needed because, when data are limited, as typically occurs with modern high-reliability devices, using the correct life distribution model helps prevent unnecessary and expensive planned replacements. Truncated distributions arise when sample selection is not possible in some sub-region of the sample space. In this paper, it is assumed that the lifetimes of the items follow a truncated logistic distribution, truncated at zero, since the time to failure of an item cannot be negative. The optimum step-stress PALT plan, which finds the optimal proportion of units failed at the normal use condition, is determined using the D-optimality criterion. The method is explained using a numerical example. A sensitivity analysis and a comparative study have also been carried out.
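
The lifetime model assumed in this abstract, a logistic distribution truncated at zero, can be sketched with the generic left-truncation formula F_T(t) = (F(t) - F(0)) / (1 - F(0)) for t >= 0. The location and scale parameterization, the parameter values, and the function names are assumptions for illustration; the paper's own notation and its D-optimal design procedure are not reproduced.

```python
import math


def logistic_cdf(t, mu, s):
    """CDF of the (untruncated) logistic distribution."""
    return 1.0 / (1.0 + math.exp(-(t - mu) / s))


def truncated_logistic_cdf(t, mu, s):
    """CDF of the logistic distribution left-truncated at zero."""
    if t < 0:
        return 0.0
    f0 = logistic_cdf(0.0, mu, s)
    return (logistic_cdf(t, mu, s) - f0) / (1.0 - f0)


def truncated_logistic_quantile(u, mu, s):
    """Inverse CDF for 0 < u < 1; usable for simulating lifetimes."""
    f0 = logistic_cdf(0.0, mu, s)
    p = f0 + u * (1.0 - f0)  # probability on the untruncated scale
    return mu + s * math.log(p / (1.0 - p))


MU, S = 2.0, 1.5  # hypothetical location and scale

median_life = truncated_logistic_quantile(0.5, MU, S)
print("median lifetime:", round(median_life, 3))
print("CDF at median:", round(truncated_logistic_cdf(median_life, MU, S), 3))
```

Feeding uniform random numbers through `truncated_logistic_quantile` gives simulated non-negative failure times, which is one simple way to check a test plan numerically.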
