• Title/Abstract/Keywords: multinomial logistic regression model

Search results: 60 items (processing time: 0.021 s)

A Bayesian Method for Narrowing the Scope of Variable Selection in Binary Response Logistic Regression

  • Kim, Hea-Jung;Lee, Ae-Kyung
    • 품질경영학회지 / Vol. 26, No. 1 / pp. 143-160 / 1998
  • This article is concerned with selecting subsets of predictor variables to include when building a binary response logistic regression model. It takes a Bayesian approach and develops a procedure that uses probabilistic considerations to select promising subsets. The procedure reformulates the logistic regression setup as a hierarchical normal mixture model by introducing a set of hyperparameters that are used to identify subset choices. This relies on the fact that the cdf of the logistic distribution is approximately equal to that of the $t_{(8)}/.634$ distribution. The appropriate posterior probability of each subset of predictor variables is obtained by the Gibbs sampler, which samples indirectly from the multinomial posterior distribution on the set of possible subset choices. Thus, the most promising subset of predictors can be identified as the one with the highest posterior probability. To highlight the merit of this procedure, a couple of illustrative numerical examples are given.
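The distributional approximation mentioned in this abstract can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes that the scaled distribution in the abstract denotes a Student-t variable with 8 degrees of freedom divided by 0.634, so that its cdf at x equals the t cdf evaluated at 0.634x. All function names are illustrative.

```python
import math

def logistic_cdf(x):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))

def t8_cdf(t):
    """CDF of Student's t with 8 degrees of freedom.

    Uses the finite closed form that exists for even degrees of freedom.
    """
    nu = 8.0
    s = t / math.sqrt(nu + t * t)   # sin(theta), with theta = atan(t / sqrt(nu))
    c2 = nu / (nu + t * t)          # cos(theta) squared
    series = 1.0 + c2 / 2.0 + 3.0 * c2 ** 2 / 8.0 + 5.0 * c2 ** 3 / 16.0
    return 0.5 + 0.5 * s * series

def scaled_t8_cdf(x, scale=0.634):
    """CDF of T / scale for T ~ t(8): P(T / scale <= x) = P(T <= scale * x)."""
    return t8_cdf(scale * x)

# Largest gap between the two cdfs on a grid over [-8, 8] (grid is an assumption).
grid = [i / 10.0 for i in range(-80, 81)]
max_gap = max(abs(logistic_cdf(x) - scaled_t8_cdf(x)) for x in grid)
print(f"max |logistic cdf - scaled t(8) cdf| on the grid: {max_gap:.4f}")
```

The two cdfs stay within a fraction of a percentage point of each other on this grid, which is what makes the normal mixture representation of the t distribution usable in place of the logistic link.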

Prediction of fine dust PM10 using a deep neural network model

  • 전성현;손영숙
    • 응용통계연구 / Vol. 31, No. 2 / pp. 265-285 / 2018
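  • In this study, we used a deep neural network model to predict fine dust $PM_{10}$ in four grades ('good', 'moderate', 'bad', 'very bad') and in two grades ('good or moderate', 'bad or very bad'). On daily fine dust data observed in six major metropolitan areas in Korea from 2010 to 2015, the deep neural network model achieved higher accuracy than existing classification methods: the neural network model, the multinomial logistic regression model, Support Vector Machine, and Random Forest.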

Goodness-of-fit tests for a proportional odds model

  • Lee, Hyun Yung
    • Journal of the Korean Data and Information Science Society / Vol. 24, No. 6 / pp. 1465-1475 / 2013
  • The chi-square type test statistic is the most commonly used statistic for assessing goodness-of-fit of the multinomial logistic regression model, whether the data are grouped (binomial) or ungrouped (binary) by covariate pattern. It is not a satisfactory gauge, however, because the ungrouped Pearson chi-square statistic is not well approximated by the chi-square distribution and is not a satisfactory measure in itself. Currently, goodness-of-fit in the ordinal setting is often assessed using the Pearson chi-square statistic and deviance tests. These tests involve creating a contingency table in which rows consist of all possible cross-classifications of the model covariates, and columns consist of the levels of the ordinal response. I examined goodness-of-fit tests for the proportional odds logistic regression model, the most commonly used regression model for an ordinal response variable. Using a simulation study, I investigated the distribution and power properties of this test and compared these with those of three other goodness-of-fit tests. The new test had lower power than the existing tests; however, it was able to detect a greater number of the different types of lack of fit considered in this study. I illustrated the ability of the tests to detect lack of fit using a study of aftercare decisions for psychiatrically hospitalized adolescents.
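The Pearson chi-square and deviance statistics described in this abstract can be sketched as follows. This is an illustrative computation on a covariate-pattern by response-level table, not the paper's new test. The observed counts and fitted probabilities are hypothetical; in practice the fitted probabilities would come from an estimated proportional odds model.

```python
import math

def pearson_and_deviance(observed, fitted_probs):
    """Pearson X^2 and deviance G^2 for a covariate-pattern by response-level table.

    observed[i][j]     : count for covariate pattern i and response level j
    fitted_probs[i][j] : model probability of level j for pattern i (rows sum to 1)
    """
    x2 = 0.0
    g2 = 0.0
    for obs_row, prob_row in zip(observed, fitted_probs):
        n = sum(obs_row)                      # number of subjects with this pattern
        for o, p in zip(obs_row, prob_row):
            e = n * p                         # expected count under the model
            x2 += (o - e) ** 2 / e
            if o > 0:                         # a zero cell contributes 0 to G^2
                g2 += 2.0 * o * math.log(o / e)
    return x2, g2

# Hypothetical table: 2 covariate patterns (rows) x 3 ordinal levels (columns).
observed = [[10, 20, 10], [5, 15, 20]]
fitted = [[0.25, 0.50, 0.25], [0.15, 0.35, 0.50]]
x2, g2 = pearson_and_deviance(observed, fitted)
print(f"Pearson X^2 = {x2:.4f}, deviance G^2 = {g2:.4f}")
```

When covariates are continuous, most patterns hold a single subject, so the expected counts are small and the chi-square reference distribution becomes unreliable. That is the weakness of the ungrouped statistic noted in the abstract.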

Comparison of Machine Learning Techniques for Cyberbullying Detection on YouTube Arabic Comments

  • Alsubait, Tahani;Alfageh, Danyah
    • International Journal of Computer Science & Network Security / Vol. 21, No. 1 / pp. 1-5 / 2021
  • Cyberbullying is a problem faced in many cultures. Due to their popularity and interactive nature, social media platforms have also been affected by cyberbullying. Social media users from Arab countries have also reported being targets of cyberbullying. Machine learning techniques have been a prominent approach used by scientists to detect and battle this phenomenon. In this paper, we compare the performance of different machine learning algorithms in cyberbullying detection on a labeled dataset of Arabic YouTube comments. Three machine learning models are considered, namely Multinomial Naïve Bayes (MNB), Complement Naïve Bayes (CNB), and Logistic Regression (LR). In addition, we experiment with two feature extraction methods, namely Count Vectorizer and Tfidf Vectorizer. Our results show that, using Count Vectorizer feature extraction, the Logistic Regression model can outperform both the Multinomial and Complement Naïve Bayes models. However, when using Tfidf Vectorizer feature extraction, the Complement Naïve Bayes model can outperform the other two models.

A Study on Fog Forecasting Method through Data Mining Techniques in Jeju

  • 이영미;배주현;박다빈
    • 한국환경과학회지 / Vol. 25, No. 4 / pp. 603-613 / 2016
  • Fog may have a significant impact on road conditions. To improve fog predictability in Jeju, we applied machine learning with various data mining techniques: tree models, conditional inference trees, random forest, multinomial logistic regression, neural networks, and support vector machines. To validate the machine learning models, the simulation results were compared with fog data observed at Jeju (ASOS site 184) and Gosan (ASOS site 185). The predictive rates of the six data mining methods were all above 92% at both sites. Additionally, we validated the performance of the machine learning models with meteorological outputs from the WRF (Weather Research and Forecasting) model and found that it is still not good enough for operational fog forecasting. According to the model assessment using metrics from the confusion matrix, fog prediction with the neural network is the most effective method.

Successful Joint Venture Strategies Based on Data Mining

  • 김진형;손소영
    • 대한산업공학회지 / Vol. 33, No. 4 / pp. 424-429 / 2007
  • The purpose of this study is to propose types of joint ventures that can increase the competitiveness of a company in the marketplace. We examine the characteristics of individual venture enterprises based on technology. We considered 16 TEA in order to categorize companies into four groups. Next, we used a multinomial logistic regression model to identify the significant characteristics of a venture company that successfully predict group membership. Based on this information, we propose various forms of joint venture in which the partners complement each other and produce higher overall competence. Our study can provide important feedback to academics and policy-makers.
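To illustrate how a multinomial logistic regression model assigns group membership, here is a minimal sketch of the baseline-category form with four groups. The coefficients and predictor values are hypothetical and are not taken from the study; the function name is illustrative.

```python
import math

def multinomial_logit_probs(x, coefs):
    """Class probabilities for a baseline-category multinomial logit model.

    coefs holds one (intercept, weights) pair per non-reference class.
    The reference class (index 0) has its linear predictor fixed at 0.
    """
    etas = [0.0]
    for intercept, weights in coefs:
        etas.append(intercept + sum(w * xi for w, xi in zip(weights, x)))
    m = max(etas)                              # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [v / total for v in exps]

# Hypothetical coefficients: 4 groups (group 0 is the reference), 2 predictors.
coefs = [(0.2, [0.5, -1.0]), (-0.3, [1.2, 0.4]), (0.0, [-0.7, 0.9])]
probs = multinomial_logit_probs([1.0, 0.5], coefs)
predicted_group = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_group)
```

In this parameterization, the exponential of a weight is the odds ratio of that group against the reference group for a one-unit increase in the predictor. This is how significant characteristics are usually read off a fitted model.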

Analysis of Determinants of Eco-Friendly Food Purchase Frequency Before and After COVID-19 Using the Consumer Behavior Survey for Food

  • 김성태;김선웅
    • 한국식품영양학회지 / Vol. 36, No. 4 / pp. 222-235 / 2023
  • In this research, we examined the shifts in determinants influencing the frequency of eco-friendly food purchases before and after COVID-19. Our analysis used 2019-2021 data from the Korea Rural Economic Institute's Consumer Behavior Survey for Food, filtered to exclude irrational responses. Given the nature of the dependent variable, a multinomial logistic regression model was employed with demographic factors, variables pertaining to food consumption behavior, and variables concerning food consumption awareness as predictors. Following the onset of the COVID-19 pandemic, an individual's level of education was observed to positively influence the frequency of eco-friendly food purchases. In contrast, income level and fluctuations in food consumption expenditure did not appear to have a discernible impact on the purchasing frequency of eco-friendly products. Irrespective of the advent of COVID-19, the frequency of online food purchases, the use of early morning delivery services, dining out frequency, and the intake of health-functional foods consistently showed a positive correlation with the propensity to purchase eco-friendly foods. Overall, consumers who prioritize safety, quality, and nutrition over price, taste, and convenience when buying rice, vegetables, meat, and processed foods are more inclined to purchase eco-friendly food products.

A polychotomous regression model with tensor product splines and direct sums

  • 심송용;강희모
    • Journal of the Korean Data and Information Science Society / Vol. 25, No. 1 / pp. 19-26 / 2014
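  • When the explanatory variables of a multinomial logistic regression model include both continuous and categorical variables, we propose a model that applies direct sums to the categorical explanatory variables and tensor products to the continuous explanatory variables. BIC is used as the variable selection criterion, and an algorithm for the proposed model was implemented. Applying the implemented algorithm to real data and comparing it with existing methods, we confirmed that the proposed model yields a better classification rate.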

Analyzing the Impact of Lockdown on COVID-19 Pandemic in Saudi Arabia

  • Gyani, Jayadev;Haq, Mohd Anul;Ahmed, Ahsan
    • International Journal of Computer Science & Network Security / Vol. 22, No. 4 / pp. 39-46 / 2022
  • The spread of Omicron, a mutated version of COVID-19, across several countries is once again prompting discussion of lockdowns to curb the spread of the new virus. In this context, this research shows the impact of lockdown on the successful control of the COVID-19 pandemic in Saudi Arabia. The outbreak of the COVID-19 pandemic around the globe affected Saudi Arabia with around 237,803 confirmed cases within the initial 4 months of transmission. Saudi Arabia announced a 21-day lockdown from March 23, 2020, to reduce the transmission of COVID-19. Machine learning-based multinomial logistic regression was applied to understand the relationship between daily COVID-19 confirmed cases and lockdown in the 17 most-affected cities of KSA. We used secondary published data from the daily dataset of COVID-19 confirmed case counts of the Ministry of Health, KSA. These 17 cities were categorized into 4 classes based on lockdown dates. Three scenarios (night lockdown, full lockdown, and no lockdown) were analyzed against the total number of confirmed cases in the 4 classes. Fifteen of the 17 cities showed a strong correlation at the 95% confidence level. These findings provide evidence that the COVID-19 pandemic may be partially suppressed with lockdown measures.

Logistic Regressions with Sensory Evaluation Data about Hanwoo Steer Beef

  • 이혜정;김재희
    • 응용통계연구 / Vol. 23, No. 5 / pp. 857-870 / 2010
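  • From 2006 to 2008, the National Institute of Animal Science collected data through a sensory evaluation survey of Hanwoo steer beef samples among consumers nationwide. Using these sensory evaluation data, this study explores the association between sociodemographic factors and Korean consumers' taste evaluations. With consumer region of residence, age, gender, occupation, monthly income, and beef cut as explanatory variables and the taste grade as the response variable, we fit binary multiple logistic models and multinomial multiple logistic models, and carry out significance tests for each regression coefficient as well as goodness-of-fit tests. The final model is chosen by stepwise variable selection, and odds ratios for the response categories are computed to identify the relationship between the taste grade and the explanatory variables. We also fit and compare binary and multinomial multiple logistic models for the case in which continuous variables related to taste are included as explanatory variables. As a result, region of residence, age, monthly income, and beef cut were selected; the odds for taste tended to be larger in the Yeongnam region and smaller for consumers with higher income and greater age. By cooking method, the odds ratio of grilling relative to soup tended to be large; by beef cut, sirloin showed a larger difference in taste relative to top round than the other cuts did. Among the continuous variables, tenderness was found to have a large effect on the taste grade.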