• Title/Summary/Keyword: Negative Binomial Regression Model


Forecasting of the COVID-19 pandemic situation of Korea

  • Goo, Taewan;Apio, Catherine;Heo, Gyujin;Lee, Doeun;Lee, Jong Hyeok;Lim, Jisun;Han, Kyulhee;Park, Taesung
    • Genomics & Informatics / v.19 no.1 / pp.11.1-11.8 / 2021
  • For the novel coronavirus disease 2019 (COVID-19), predictive modeling in the literature broadly uses susceptible-exposed-infected-recovered (SEIR)/SIR, agent-based, and curve-fitting models. Governments and legislative bodies rely on insights from prediction models to suggest new policies and to assess the effectiveness of enforced policies. Therefore, access to accurate outbreak prediction models is essential to obtain insights into the likely spread and consequences of infectious diseases. The objective of this study is to predict the future COVID-19 situation of Korea. Here, we employed 5 models for this analysis: SEIR, local linear regression (LLR), negative binomial (NB) regression, segmented Poisson, deep-learning based long short-term memory (LSTM) models, and tree-based gradient boosting machine (GBM). After prediction, model performance was evaluated using relative mean squared errors (RMSE) for two sets of training data (January 20, 2020-December 31, 2020 and January 20, 2020-January 31, 2021) and testing data (January 1, 2021-February 28, 2021 and February 1, 2021-February 28, 2021). Except for the segmented Poisson model, the models predicted a decline in daily confirmed cases in the country in the near future. Comparison of RMSE values showed that LLR, GBM, SEIR, NB, and LSTM, in that order, performed well in forecasting the pandemic situation of the country. A good understanding of the epidemic dynamics would greatly enhance the control and prevention of COVID-19 and other infectious diseases. Therefore, with increasing daily confirmed cases since this year, these results could help in the pandemic response by informing decisions about planning, resource allocation, and social distancing policies.

The Effects of Sentiment and Readability on Useful Votes for Customer Reviews with Count Type Review Usefulness Index (온라인 리뷰의 감성과 독해 용이성이 리뷰 유용성에 미치는 영향: 가산형 리뷰 유용성 정보 활용)

  • Cruz, Ruth Angelie;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.43-61 / 2016
  • Customer reviews help potential customers make purchasing decisions. However, the prevalence of reviews on websites pushes customers to sift through them and shifts the focus from a mere search to identifying which of the available reviews are valuable and useful for the purchasing decision at hand. To identify useful reviews, websites have developed different mechanisms to give customers options when evaluating existing reviews. Websites allow users to rate the usefulness of a customer review as helpful or not. Amazon.com uses a ratio-type helpfulness index, while Yelp.com uses a count-type usefulness index. This usefulness index provides helpful reviews to future potential purchasers. This study investigated the effects of sentiment and readability on useful votes for customer reviews. Similar studies on the relationship between sentiment and readability have focused on the ratio-type usefulness index utilized by websites such as Amazon.com. In this study, Yelp.com's count-type usefulness index for restaurant reviews was used to investigate the relationship between sentiment/readability and usefulness votes. Yelp.com's online customer reviews for stores in the beverage and food categories were used for the analysis. In total, 170,294 reviews containing information on a store's reputation and popularity were used. The control variables were the review length, store reputation, and popularity; the independent variables were the sentiment and readability, while the dependent variable was the number of helpful votes. The review rating is the moderating variable for the review sentiment and readability. The length is the number of characters in a review. The popularity is the number of reviews for a store, and the reputation is the general average rating of all reviews for a store. The readability of a review was calculated with the Coleman-Liau index. The sentiment is a positivity score for the review as calculated by SentiWordNet. 
The review rating is a preference score selected from 1 to 5 (stars) by the review author. The dependent variable (i.e., usefulness votes) used in this study is a count variable. Therefore, the Poisson regression model, which is commonly used to account for the discrete and nonnegative nature of count data, was applied in the analyses. The increase in helpful votes was assumed to follow a Poisson distribution. Because the Poisson model assumes an equal mean and variance and the data were over-dispersed, a negative binomial distribution model that allows for over-dispersion of the count variable was used for the estimation. Zero-inflated negative binomial regression was used to model count variables with excessive zeros and over-dispersed count outcome variables. With this model, the excess zeros were assumed to be generated through a separate process from the count values and therefore should be modeled as independently as possible. The results showed that positive sentiment had a negative effect on gaining useful votes for positive reviews but no significant effect on negative reviews. Poor readability had a negative effect on gaining useful votes and was not moderated by the review star ratings. These findings yield considerable managerial implications. The results are helpful for online websites when analyzing their review guidelines and identifying useful reviews for their business. Based on this study, positive reviews are not necessarily helpful; therefore, restaurants should consider which type of positive review is helpful for their business. Second, this study is beneficial for businesses and website designers in creating review mechanisms to know which type of reviews to highlight on their websites and which type of reviews can be beneficial to the business. Moreover, this study highlights the review systems employed by websites to allow their customers to post rating reviews.
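
The move from a Poisson model to a negative binomial model described in this abstract can be illustrated with a minimal sketch. It assumes the common NB2 parameterization, in which Var(y) = mu + alpha * mu^2, and uses a moment-based estimate of alpha; the vote counts are invented for illustration and are not the study's data, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 under a Poisson model."""
    return variance(counts) / mean(counts)

def poisson_logpmf(y, mu):
    """Log-probability of count y under Poisson(mu)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def nb2_logpmf(y, mu, alpha):
    """Log-probability of count y under NB2, where Var(y) = mu + alpha * mu**2."""
    r = 1.0 / alpha  # "size" parameter of the negative binomial
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

# Hypothetical useful-vote counts (illustration only, not the Yelp data).
votes = [0, 0, 0, 1, 0, 2, 0, 0, 7, 1, 0, 15, 0, 3, 0, 0]

m = mean(votes)
# Moment-based estimate of alpha (an assumption; real analyses use maximum likelihood).
alpha_hat = max((variance(votes) - m) / m ** 2, 1e-8)

ll_poisson = sum(poisson_logpmf(y, m) for y in votes)
ll_nb2 = sum(nb2_logpmf(y, m, alpha_hat) for y in votes)

print("variance/mean ratio:", dispersion_ratio(votes))
print("Poisson log-likelihood:", ll_poisson)
print("NB2 log-likelihood:", ll_nb2)
```

A variance-to-mean ratio well above 1 signals over-dispersion, and in that case the NB2 log-likelihood is typically higher than the Poisson one at the same mean.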

Marginal Effect Analysis of Travel Behavior by Count Data Model (가산자료모형을 기초로 한 통행행태의 한계효과분석)

  • 장태연
    • Journal of Korean Society of Transportation / v.21 no.3 / pp.15-22 / 2003
  • In general, the linear regression model has been used to estimate trip generation in the travel demand forecasting procedure. However, the model suffers from several methodological limitations. First, trips, as a non-negative integer dependent variable, follow a discrete distribution, but the model assumes that the dependent variable is continuously distributed between $-\infty$ and $+\infty$. Second, the model may produce negative estimates. Third, even if estimated trips are within the valid range, the model offers only forecasted trips without a discrete probability distribution for them. To overcome these limitations, a Poisson model with an assumption of equidispersion has frequently been used to analyze count data such as trip frequencies. However, if the variance of the data is greater than the mean, the Poisson model tends to underestimate errors, resulting in unreliable estimates. Using an overdispersion test, this study showed that the Poisson model is not appropriate and, using the Vuong test, that the zero-inflated negative binomial model is optimal. Model reliability was checked by a likelihood test, and model accuracy by the Theil inequality coefficient. Finally, the marginal effect of changes in the socio-demographic characteristics of households on trips was analyzed.
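
As a rough illustration of what a marginal effect means in a count data model, the sketch below assumes a log-link mean, E[y|x] = exp(x·beta), which is the mean of the count part of Poisson and negative binomial regressions; a zero-inflated model adds a further term that is omitted here. The coefficients and household profile are hypothetical and not taken from the paper.

```python
import math

def expected_trips(x, beta):
    """Conditional mean of a log-link count model: exp(x . beta)."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))

def marginal_effect(x, beta, j):
    """d E[y|x] / d x_j = beta_j * exp(x . beta) for a continuous covariate x_j."""
    return beta[j] * expected_trips(x, beta)

def poisson_pmf(k, mu):
    """P(y = k) under Poisson(mu): the discrete distribution a linear model lacks."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

# Hypothetical coefficients: intercept, household size, number of cars.
beta = [0.2, 0.15, 0.3]
# Hypothetical household: intercept term, 3 members, 1 car.
x = [1.0, 3.0, 1.0]

mu = expected_trips(x, beta)
print("expected trips:", mu)
print("marginal effect of household size:", marginal_effect(x, beta, 1))
print("P(trips = 0..4):", [round(poisson_pmf(k, mu), 3) for k in range(5)])
```

Unlike a linear regression, the predicted mean is always positive, and the effect of a covariate depends on where the household sits, because it scales with the predicted mean.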

Factors Influencing the Initiation of Treatment after the Diagnosis of Korean Patients with HIV (HIV 감염인의 진단 후 치료 시작에 영향을 미치는 요인)

  • Shim, Mi-So;Kim, Gwang Suk;Park, Chang Gi
    • Research in Community and Public Health Nursing / v.29 no.3 / pp.279-289 / 2018
  • Purpose: This study was conducted to identify factors that influence the initiation of treatment after diagnosis in Korean patients with HIV. Methods: A cross-sectional study design was used, and 290 patients with HIV from the outpatient departments of 7 hospitals participated. Self-report questionnaires included items on the number of days from the primary diagnosis to the initiation of treatment, and the patients' demographic and disease-related characteristics. A negative binomial regression (NBR) model was used to determine risk factors influencing the initiation of treatment after diagnosis. Results: The skewness of days was 6.62, indicating a severely asymmetric distribution. In the NBR, patients who were in their 40s and 50s, female, unmarried and living with their family, jobless, in a middle or high level of economic status, and diagnosed before 2014 showed a higher risk of delayed treatment than patients who were younger, male, married and living with family, in a low level of economic status, and diagnosed in 2014 or afterwards. Conclusion: The findings suggest the necessity of intervention to promote HIV patients' early entry into treatment based on the participants' characteristics.

Impact of Level of Physical Activity on Healthcare Utilization among Korean Adults (성인의 신체활동 정도가 의료이용에 미치는 영향)

  • Kim, Ji-Yun;Park, Seung-Mi
    • Journal of Korean Academy of Nursing / v.42 no.2 / pp.199-206 / 2012
  • Purpose: This study was done to identify the impact of physical activity on healthcare utilization among Korean adults. Methods: Drawing from the 2008 Korean National Health and Nutrition Examination Survey (NHANES IV-2), data from 6,521 adults who completed the Health Interview and Health Behavior Surveys were analyzed. Association between physical activity and healthcare utilization was tested using the $\chi^2$-test. Multiple logistic regression analysis was used to calculate the odds ratios of using outpatient and inpatient healthcare for different levels of physical activity after adjusting for predisposing, enabling, and need factors. A generalized linear model applying a negative binomial distribution was used to determine how the level of physical activity was related to use of outpatient and inpatient healthcare. Results: Physically active participants were 16% less likely to use outpatient healthcare (OR, 0.84; 95% CI, 0.74-0.97) and 23% less likely to use inpatient healthcare (OR, 0.77; 95% CI, 0.63-0.93) than physically inactive participants. Levels of outpatient and inpatient healthcare use decreased as levels of physical activity increased, after adjusting for relevant factors. Conclusion: An independent association between being physically active and lower healthcare utilization was ascertained among Korean adults indicating a need to develop nursing intervention programs that encourage regular physical activity.

Impacts of Pre-signals on Traffic Crashes at 4-leg Signalized Intersections (전방신호기가 교통사고에 미치는 영향 연구)

  • Kim, Byeongeun;Lee, Youngihn
    • International Journal of Highway Engineering / v.15 no.4 / pp.135-146 / 2013
  • PURPOSES: This study aimed to analyze the impact of operating pre-signals at 4-leg signalized intersections and to present the primary road environment factors that need to be considered when installing pre-signals. METHODS: The shift-of-proportions safety effectiveness evaluation method, which assesses shifts in the proportions of target collision types to determine safety effectiveness, was applied to analyze traffic crashes by type. In addition, the Empirical Bayes before/after safety effectiveness evaluation method was adopted to analyze the impact of pre-signal installation. Negative binomial regression was conducted to determine the safety performance function (SPF). RESULTS: Pre-signals are effective in reducing the number of head-on, right-angle, and sideswipe collisions, as well as the total number of personal injury crashes and severe crashes. Also, each factor used as an independent variable in the SPF model appears to be strongly correlated with the total number of personal injury crashes and severe crashes, and to affect traffic crashes as a whole. CONCLUSIONS: This study suggests that the following should be considered in pre-signal installation at intersections. 1) U-turns allowed in the front and rear 2) A high number of roads that connect to the intersection 3) Many right-turn traffic flows 4) Crosswalks installed in the front and rear 5) Insufficient left-turn lanes compared to left-turn traffic flows or no left-turn-only lane.

Analysis of Elderly Drivers' Accident Models Considering Operations and Physical Characteristics (고령운전자 운전 및 신체특성을 반영한 교통사고 분석 연구)

  • Lim, Sam Jin;Park, Jun Tae;Kim, Young Il;Kim, Tae Ho
    • Journal of Korean Society of Transportation / v.30 no.6 / pp.37-46 / 2012
  • The number of traffic accidents caused by elderly drivers over the age of 65 has surged over the past ten years from 37,000 to 274,000 cases. The proportion of elderly drivers' accidents has jumped 3.1 times from 1.2% to 3.7% out of all traffic accidents, and traffic safety organizations are pursuing diverse measures to address the situation. Above all, connecting safety measures with in-depth research on the behavioral and physical characteristics of elderly drivers will prove vital. This study conducted empirical research linking the driving characteristics of elderly drivers and their traffic accidents, based on Driving Aptitude Test items and traffic accident data, which enabled the measurement of the behavioral characteristics of elderly drivers. In developing the Influence Model, we applied the zero-inflated Poisson (ZIP) regression model and selected an accident prediction model based on the Bayesian Influence with regard to the ZIP regression model and the zero-inflated negative binomial (ZINB) regression model. According to the results of the AAE analysis, the ZIP regression model was more appropriate, and three variables (prediction of velocity, diversion, and cognitive ability) were found to be related to traffic accidents caused by elderly drivers.
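
For readers unfamiliar with the zero-inflated Poisson (ZIP) model named in this abstract, a minimal sketch of its probability mass function follows. It uses the standard two-part definition: with probability pi a driver is a structural zero (never has an accident), otherwise the count follows Poisson(mu). The parameter values are hypothetical, and the paper's regression structure, in which mu and pi depend on covariates, is not reproduced.

```python
import math

def poisson_pmf(k, mu):
    """P(y = k) under Poisson(mu)."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def zip_pmf(k, mu, pi):
    """Zero-inflated Poisson: structural zero with probability pi,
    otherwise a draw from Poisson(mu)."""
    base = (1.0 - pi) * poisson_pmf(k, mu)
    return pi + base if k == 0 else base

# Hypothetical parameters (illustration only).
mu, pi = 1.5, 0.4

total = sum(zip_pmf(k, mu, pi) for k in range(100))
zip_mean = sum(k * zip_pmf(k, mu, pi) for k in range(100))

print("P(0) Poisson vs ZIP:", poisson_pmf(0, mu), zip_pmf(0, mu, pi))
print("ZIP mean:", zip_mean)
```

The ZIP mean equals (1 - pi) * mu, so excess zeros pull the average count down without changing the shape of the distribution for drivers who are at risk.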

Estimation of the Effects of Daily Walking Hours and Days on the Mental Health of Urban Residents - The Case in Seoul - (주거지역 가로환경 및 일상 걷기가 정신 건강에 미치는 영향 - 서울시 대상으로 -)

  • Koo, Bonyu;Baek, Seungjoo;Yoon, Heeyeun
    • Journal of the Korean Institute of Landscape Architecture / v.52 no.1 / pp.87-100 / 2024
  • This study aimed to investigate the impact of the quality of the street environment in residential areas on the mental health of urban residents, considering the frequency of street use. Using a zero-inflated negative binomial regression model, the study analyzed the influence of walking frequency and the street environment on depressive symptoms of urban residents. The research focused on Seoul, South Korea, in 2017, with depressive symptoms as the dependent variable and street environment variables, walking variables, and individual characteristics as independent variables. Additionally, the study explores the interaction effect of street greenery and walking frequency to analyze the synergistic impacts of walking in green spaces on mental health. The findings indicate that a higher ratio of street green areas is associated with fewer depressive symptoms. Increased walking frequency is linked to a reduction in depressive symptoms or a weaker manifestation of such symptoms. The interaction effect confirms that more frequent walking in green spaces is associated with weaker depressive symptoms. Lower ratios of visual complexity are correlated with reduced depressive symptoms. This study contributes to addressing urban residents' mental health issues at the community level by emphasizing the importance of the street green environment in residential areas.

Development of Evaluation Model for Black Spot Improvement Priorities by using Emperical Bayes Method (EB기법을 이용한 사고잦은 곳 개선사업 우선순위 판정기법 개발)

  • Jeong, Seong-Bong;Hwang, Bo-Hui;Seong, Nak-Mun;Lee, Seon-Ha
    • Journal of Korean Society of Transportation / v.27 no.3 / pp.81-90 / 2009
  • The safety management of a road network comprises four basic inter-related components: identification of sites (black spots) requiring safety investigation, diagnosis of safety problems, selection of feasible treatments for potential treatment candidates, and prioritization of treatments given limited budgets (Persaud, 2001). The process of identifying black spots is very important for efficient investigation of sites. In this study, an accident prediction model for the EB method was developed using accident data and geometric conditions of black spots selected from four-leg signalized intersections in Incheon City over three years (2004-2006). In addition, by comparing the ranking technique based on the EB method with the one based on accident counts, we showed the problems of the existing method and the necessity of developing a rational prediction model. In terms of the total number of accidents, both the counts predicted by the existing non-linear regression model and those from the EB method show high goodness of fit, but the EB method, which considers both the accident counts by site and the total number of accidents, has better goodness of fit than the non-linear Poisson model. Comparing the treatment ranks produced by the two methods, the rank of most sites does not change, but the SeoHae intersection and a few other intersections show significant changes in rank. This shows that the technique proposed in the study can overcome the regression-to-the-mean (RTM) problem caused by using observed accident counts.
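
The way an Empirical Bayes (EB) estimate dampens regression to the mean can be sketched as follows. This assumes the weighting commonly used in the road-safety literature, w = 1 / (1 + k * predicted), where k is the over-dispersion parameter of a negative binomial prediction model; the abstract does not state the paper's exact formula, and the sites, counts, and value of k below are invented for illustration.

```python
def eb_expected_crashes(observed, predicted, k):
    """Weighted average of model prediction and observed count.

    weight = 1 / (1 + k * predicted); a larger over-dispersion k puts
    more trust in the site's own observed count.
    """
    weight = 1.0 / (1.0 + k * predicted)
    return weight * predicted + (1.0 - weight) * observed

# Hypothetical sites: name -> (observed crashes, model-predicted crashes).
sites = {"A": (12, 5.0), "B": (9, 8.5), "C": (10, 3.0)}
k = 0.4  # hypothetical over-dispersion parameter

eb = {name: eb_expected_crashes(obs, pred, k) for name, (obs, pred) in sites.items()}

rank_observed = sorted(sites, key=lambda s: sites[s][0], reverse=True)
rank_eb = sorted(eb, key=eb.get, reverse=True)

print("rank by observed counts:", rank_observed)
print("rank by EB estimate:", rank_eb)
```

In this toy example site C has a high observed count but a low prediction, so its EB estimate is pulled down and it drops below site B in the ranking, which mirrors the kind of rank change the abstract reports for a few intersections.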

A Study on the Road Safety Analysis Model: Focused on National Highway Areas in Cheonbuk Province (도로 안전성 분석 모형에 관한 연구: 전라북도 국도 권역을 중심으로)

  • Lim, Joonbeom;Kim, Joon-Ki;Lee, Soobeom;Kim, Hyunjin
    • KSCE Journal of Civil and Environmental Engineering Research / v.34 no.2 / pp.583-595 / 2014
  • Currently, Korean transportation policies aim to increase safety and to achieve environment-friendly and efficient operation by avoiding the construction and expansion of roads and instead upgrading road alignments and facilities. This is shown by the fact that there have been 22 road expansion projects (30%) and 50 road improvement projects (70%) under the 3rd Five-Year Plan for National Highways ('11~'15), while there were 53 road expansion projects (71%) and 22 road improvement projects (29%) under the 2nd Five-Year Plan for National Highways. For more effective road improvement projects, projects need to be chosen after an objective and scientific safety assessment of each road, and the safety improvement of each project needs to be assessed. This study is intended to develop a model for this road safety analysis and assessment. The major objective of this study is to create a road safety analysis and assessment model appropriate for Korean society, based on the HSM (Highway Safety Manual) of the U.S. In order to build up data for model development, the sections thought to have identical geometrical structure factors in 5 lines, Cheonbuk province, were divided as homogeneous sections, and representative values of geometric structures, facilities, traffic volume, climate conditions and land usage were collected from the 1,452 sections divided. The collected data were processed, and a correlation analysis of each road element was performed to see which factors had a large effect on traffic accidents. 
On the basis of these results, an accident model was then established as a negative binomial regression model. Using the developed model, a Crash Modification Factor (CMF), which determines how accident frequency changes relative to the safety performance function (SPF) that predicts the number of accidents from traffic volume, road section expansion, road geometric structure, and traffic properties, was extracted.