• Title/Summary/Keyword: zero-inflated negative binomial regression analysis (영과잉 음이항 회귀분석)

A Bayesian zero-inflated negative binomial regression model based on Pólya-Gamma latent variables with an application to pharmaceutical data (폴랴-감마 잠재변수에 기반한 베이지안 영과잉 음이항 회귀모형: 약학 자료에의 응용)

  • Seo, Gi Tae; Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.35 no.2 / pp.311-325 / 2022
  • For count responses, excess zeros often occur in various research fields. The zero-inflated model is a common choice for modeling such count data. Bayesian inference for the zero-inflated model has long been recognized as a hard problem because the conditional posterior distributions are not available in closed form. Recently, however, Pillow and Scott (2012) and Polson et al. (2013) proposed a Pólya-Gamma data-augmentation strategy for logistic and negative binomial models, facilitating Bayesian inference for the zero-inflated model. We apply a Bayesian zero-inflated negative binomial regression model to longitudinal pharmaceutical data previously analyzed by Min and Agresti (2005). To facilitate posterior sampling for the longitudinal zero-inflated model, we use the Pólya-Gamma data-augmentation strategy.
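
The Pólya-Gamma augmentation idea can be illustrated with a minimal sketch. It is not the authors' zero-inflated negative binomial sampler: it assumes an intercept-only logistic model, a normal prior on the intercept, and an approximate PG(b, c) draw obtained by truncating the infinite sum-of-gammas representation given by Polson et al. (2013) (exact samplers exist; truncation is just the simplest to write down). The function names and the truncation level `n_terms` are hypothetical. The point it shows is that, once the latent ω variables are drawn, the coefficient's full conditional is normal, so each Gibbs step is a closed-form draw.

```python
import math
import random


def sample_pg(b, c, rng, n_terms=60):
    """Approximate draw from PG(b, c) by truncating the representation
    omega = 1/(2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)), g_k ~ Gamma(b, 1).
    Truncation slightly underestimates omega (bias roughly b / (2 pi^2 n_terms))."""
    shift = c * c / (4.0 * math.pi ** 2)
    total = 0.0
    for k in range(1, n_terms + 1):
        g = rng.gammavariate(b, 1.0)
        total += g / ((k - 0.5) ** 2 + shift)
    return total / (2.0 * math.pi ** 2)


def pg_mean(b, c):
    """Exact mean of PG(b, c): b / (2c) * tanh(c / 2), which tends to b / 4 as c -> 0."""
    if abs(c) < 1e-8:
        return b / 4.0
    return b / (2.0 * c) * math.tanh(c / 2.0)


def gibbs_logistic_intercept(y, n_iter=200, prior_mean=0.0, prior_var=10.0, seed=0):
    """Gibbs sampler for y_i ~ Bernoulli(1 / (1 + exp(-beta))), beta ~ N(prior_mean, prior_var)."""
    rng = random.Random(seed)
    kappa_sum = sum(yi - 0.5 for yi in y)  # kappa_i = y_i - 1/2
    beta = 0.0
    draws = []
    for _ in range(n_iter):
        # omega_i | beta ~ PG(1, beta); with an intercept-only design only their sum is needed
        omega_sum = sum(sample_pg(1.0, beta, rng) for _ in y)
        # beta | omega, y is normal with precision 1/prior_var + sum(omega_i)
        post_var = 1.0 / (1.0 / prior_var + omega_sum)
        post_mean = post_var * (prior_mean / prior_var + kappa_sum)
        beta = rng.gauss(post_mean, math.sqrt(post_var))
        draws.append(beta)
    return draws


if __name__ == "__main__":
    y = [1] * 30 + [0] * 10  # toy data: 30 successes out of 40
    kept = gibbs_logistic_intercept(y)[100:]  # discard burn-in
    print(sum(kept) / len(kept))  # should land roughly near log(30 / 10), about 1.1
```

In a regression with several covariates the same two steps apply, with one ω per observation and a multivariate normal update for the coefficient vector.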

The study on the determinants of the number of job changes (중소기업 청년인턴 이직횟수 결정요인 분석)

  • Park, Sungik; Ryu, Jangsoo; Kim, Jonghan; Cho, Jangsik
    • Journal of the Korean Data and Information Science Society / v.26 no.2 / pp.387-397 / 2015
  • In this paper, the determinants of the number of job changes among participants in the SME (small and medium enterprise) youth-intern project are analysed using the SME youth-intern DB and the employment insurance DB. Because the number of job changes is count data taking only nonnegative integer values, ordinary linear regression is inappropriate. Therefore, four count models are fitted: the Poisson regression model, the zero-inflated Poisson regression model, the negative binomial regression model, and the zero-inflated negative binomial regression model. The zero-inflated negative binomial regression model is selected as the best model. The major results are as follows. First, the number of job changes is significantly smaller in the treatment group than in the control group. Second, the number of job changes is significantly smaller in the younger age group than in the older age group. Third, men change jobs significantly more often than women. Lastly, the number of job changes is significantly smaller in larger firms than in smaller firms.
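
To make the comparison among the four candidate models concrete, here is a minimal sketch of their probability mass functions. It assumes the common mean/size parameterization of the negative binomial and a constant zero-inflation probability `pi` (in regression form, the count mean and `pi` would depend on covariates through log and logit links). All function names are hypothetical. The nesting it encodes (the zero-inflated versions reduce to the plain ones when `pi = 0`, and the negative binomial approaches the Poisson as `r` grows) is what makes the four models natural competitors, and the dispersion index is a quick first check for over-dispersion.

```python
import math
import statistics


def poisson_pmf(y, lam):
    """Poisson P(Y = y) with mean lam > 0."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def nb_pmf(y, mu, r):
    """Negative binomial P(Y = y) with mean mu and size r; Var(Y) = mu + mu**2 / r."""
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def zero_inflate(pmf, pi):
    """Mix a structural zero (probability pi) with a count pmf (probability 1 - pi)."""
    def zi_pmf(y, *params):
        base = (1.0 - pi) * pmf(y, *params)
        return pi + base if y == 0 else base
    return zi_pmf


def dispersion_index(counts):
    """Sample variance divided by sample mean; values well above 1 suggest over-dispersion."""
    return statistics.variance(counts) / statistics.mean(counts)


if __name__ == "__main__":
    zip_pmf = zero_inflate(poisson_pmf, 0.3)
    zinb_pmf = zero_inflate(nb_pmf, 0.3)
    mu, r = 2.0, 1.5
    # Probability of a zero under each candidate model (same count-part mean mu)
    print("Poisson", poisson_pmf(0, mu))
    print("NB     ", nb_pmf(0, mu, r))
    print("ZIP    ", zip_pmf(0, mu))
    print("ZINB   ", zinb_pmf(0, mu, r))
```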

Bayesian Analysis for the Zero-inflated Regression Models (영과잉 회귀모형에 대한 베이지안 분석)

  • Jang, Hak-Jin; Kang, Yun-Hee; Lee, S.; Kim, Seong-W.
    • The Korean Journal of Applied Statistics / v.21 no.4 / pp.603-613 / 2008
  • Discrete count data often contain a large proportion of zeros. In this case, it is not appropriate to analyze the data with standard regression models such as the Poisson or negative binomial regression models. In this article, we consider Bayesian analysis for two commonly used models: the zero-inflated Poisson and zero-inflated negative binomial regression models. We use the Bayes factor as a model selection tool, and computation is carried out via Markov chain Monte Carlo methods. Crash count data are analyzed to support the theoretical results.
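
A Bayes factor is a ratio of marginal likelihoods. The sketch below compares an intercept-only zero-inflated Poisson (ZIP) model with a plain Poisson model on hypothetical toy data. It is not the authors' MCMC-based computation: it assumes a Gamma(1, 1) prior on the Poisson mean and a Uniform(0, 1) prior on the zero-inflation probability, computes the Poisson marginal likelihood exactly by conjugacy, and estimates the ZIP marginal likelihood by naively averaging the likelihood over prior draws. That estimator is easy to write down but becomes unreliable as the number of parameters grows. All names are hypothetical.

```python
import math
import random


def log_lik_poisson(y, lam):
    return sum(yi * math.log(lam) - lam - math.lgamma(yi + 1) for yi in y)


def log_lik_zip(y, pi, lam):
    total = 0.0
    for yi in y:
        if yi == 0:
            # A zero can be structural (pi) or a Poisson zero ((1 - pi) * exp(-lam))
            total += math.log(pi + (1.0 - pi) * math.exp(-lam))
        else:
            total += math.log(1.0 - pi) + yi * math.log(lam) - lam - math.lgamma(yi + 1)
    return total


def log_marginal_poisson(y, a=1.0, b=1.0):
    """Exact log marginal likelihood with lam ~ Gamma(shape a, rate b)."""
    n, s = len(y), sum(y)
    return (a * math.log(b) - math.lgamma(a) + math.lgamma(a + s)
            - (a + s) * math.log(b + n) - sum(math.lgamma(yi + 1) for yi in y))


def log_marginal_zip_mc(y, a=1.0, b=1.0, n_draws=10000, seed=0):
    """Naive Monte Carlo estimate: mean likelihood over prior draws
    pi ~ Uniform(0, 1), lam ~ Gamma(shape a, rate b), averaged with log-sum-exp."""
    rng = random.Random(seed)
    lls = []
    for _ in range(n_draws):
        pi = rng.random()
        lam = rng.gammavariate(a, 1.0 / b)  # gammavariate takes shape and *scale*
        lls.append(log_lik_zip(y, pi, lam))
    m = max(lls)
    return m + math.log(sum(math.exp(v - m) for v in lls) / n_draws)


if __name__ == "__main__":
    y = [0] * 30 + [2, 3, 1, 4, 2, 3, 5, 2, 1, 3]  # hypothetical counts with excess zeros
    log_bf = log_marginal_zip_mc(y) - log_marginal_poisson(y)
    print("log Bayes factor (ZIP vs Poisson):", log_bf)  # positive values favor ZIP
```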

Prediction of the Number of Food Poisoning Occurrences by Microbes (원인균별 식중독 발생 건수 예측)

  • Yeo, In-Kwon
    • The Korean Journal of Applied Statistics / v.26 no.6 / pp.923-932 / 2013
  • This paper proposes a method to predict the number of foodborne disease outbreaks by microbe. The weekly data on food poisoning occurrences by microbe in Korea contain many zero-valued observations and exhibit dependence between outbreaks. To model both phenomena, the total number of food poisonings is predicted by an autoregressive model, and the probabilities of food poisoning occurrences by microbe (given the total number of food poisonings) are estimated by the baseline-category logit model. The predicted number of foodborne disease outbreaks for a microbe is obtained by multiplying the predicted total number of outbreaks by the estimated probability of food poisoning caused by that microbe. The mean squared error and the mean absolute error are evaluated to compare the performance of the proposed method with that of the zero-inflated model.
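
The two-stage predictor described above can be sketched as follows. It assumes an AR(1) model for the weekly total and a baseline-category logit for the microbe shares, with the first category as the baseline; the coefficients in the example are hypothetical placeholders rather than estimates from the paper, and in practice the linear predictors would depend on covariates such as season.

```python
import math


def ar1_forecast(last_total, intercept, phi):
    """One-step AR(1) forecast of the total count: intercept + phi * last_total."""
    return intercept + phi * last_total


def baseline_logit_probs(etas):
    """Baseline-category logit: category 0 is the baseline (linear predictor 0);
    etas[j] is the log-odds of category j + 1 versus the baseline."""
    weights = [1.0] + [math.exp(e) for e in etas]
    denom = sum(weights)
    return [w / denom for w in weights]


def predict_by_category(total_hat, probs):
    """Predicted count per category = predicted total * category probability."""
    return [total_hat * p for p in probs]


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    total_hat = ar1_forecast(last_total=10.0, intercept=2.0, phi=0.6)  # hypothetical AR(1) fit
    probs = baseline_logit_probs([0.0, math.log(2.0)])  # hypothetical log-odds for 3 microbes
    preds = predict_by_category(total_hat, probs)
    print(preds)
    print(mse([3, 1, 4], preds), mae([3, 1, 4], preds))
```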

An Analysis of Spatial Determinants of Inventor Networks in Korea (발명자 네트워크의 공간적 결정요인 분석)

  • Jeong, Jun Ho
    • Journal of the Economic Geographical Society of Korea / v.19 no.1 / pp.1-17 / 2016
  • This paper explores the spatial structure of inventor networks and their determinants among 230 shi-gun-gu regions in Korea by examining the residences of co-inventors on Korean patent applications filed with the Korean Intellectual Property Office, using a zero-inflated negative binomial model to accommodate the count nature of the dependent variable and its excess zeros. Several variables are found to affect the spatial linkage of inventor networks. Spatial links extend beyond a region if it has more of its own R&D-related specific assets (private R&D, patent productivity, population, education), and if it is physically close to, and technologically similar to, the other region. Similarly, the assets of the other region play a positive role if that region has more R&D-related specific assets.

Heat-Wave Data Analysis based on the Zero-Inflated Regression Models (영-과잉 회귀모형을 활용한 폭염자료분석)

  • Kim, Seong Tae; Park, Man Sik
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2829-2840 / 2018
  • A random variable bounded below by some value is called semi-continuous or zero-inflated when its boundary value is observed more frequently than expected; that is, the boundary value is observed in practice more often than it should be under the assumed probability distribution. When the assumed distribution is continuous, the variable is called semi-continuous; when a discrete distribution is assumed, it is called zero-inflated. In this study, we introduce the two-part model, which consists of one part modelling the binary response (at versus above the boundary value) and another part modelling values greater than the boundary value. In particular, zero-inflated regression models are explained using the Poisson and negative binomial distributions. In the real data analysis, we employ zero-inflated regression models to estimate the number of days under extreme heat-wave conditions over the last 10 years in South Korea. Based on the estimation results, we create prediction maps of the estimated number of days under heat-wave advisories and heat-wave warnings using universal kriging, a spatial prediction method.
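
The difference between a two-part (hurdle) model and a zero-inflated model lies in where the zeros come from. In the minimal sketch below, which assumes a Poisson count distribution with constant parameters and uses hypothetical function names, the hurdle model's binary part alone decides zero versus positive, and positives follow a zero-truncated Poisson; the zero-inflated Poisson instead lets zeros arise both from a structural point mass and from the Poisson part. With constant parameters a ZIP can be rewritten as a hurdle with `p_positive = (1 - pi) * (1 - exp(-lam))`; once covariates enter, the two models generally differ.

```python
import math


def poisson_pmf(y, lam):
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def hurdle_poisson_pmf(y, p_positive, lam):
    """Two-part model: a binary part gives P(Y > 0) = p_positive;
    positive values follow a zero-truncated Poisson(lam)."""
    if y == 0:
        return 1.0 - p_positive
    return p_positive * poisson_pmf(y, lam) / (1.0 - math.exp(-lam))


def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson: structural zeros with probability pi, else Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base


if __name__ == "__main__":
    lam = 2.0
    print("hurdle P(0):", hurdle_poisson_pmf(0, 0.4, lam))
    print("ZIP P(0):   ", zip_pmf(0, 0.2, lam))
```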

Estimation of the Effects of Daily Walking Hours and Days on the Mental Health of Urban Residents - The Case in Seoul - (주거지역 가로환경 및 일상 걷기가 정신 건강에 미치는 영향 - 서울시 대상으로 -)

  • Koo, Bonyu; Baek, Seungjoo; Yoon, Heeyeun
    • Journal of the Korean Institute of Landscape Architecture / v.52 no.1 / pp.87-100 / 2024
  • This study investigated the impact of the quality of the street environment in residential areas on the mental health of urban residents, taking into account how frequently residents use the streets. Using a zero-inflated negative binomial regression model, the study analyzed the influence of walking frequency and the street environment on the depressive symptoms of urban residents. The research focused on Seoul, South Korea, in 2017, with depressive symptoms as the dependent variable and street environment variables, walking variables, and individual characteristics as independent variables. Additionally, the study explored the interaction between street greenery and walking frequency to analyze the synergistic effect of walking in green spaces on mental health. The findings indicate that a higher ratio of street green area is associated with fewer depressive symptoms. More frequent walking is linked to fewer or milder depressive symptoms. The interaction effect confirms that more frequent walking in green spaces is associated with milder depressive symptoms. Lower visual complexity ratios are associated with fewer depressive symptoms. This study contributes to addressing urban residents' mental health issues at the community level by emphasizing the importance of green street environments in residential areas.
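
An interaction term changes how a coefficient is read under the log link of the count part: the effect of walking on the expected count depends on the greenery level. The sketch below assumes, for illustration only, a count part with just these two predictors and their product (the study's model also includes individual characteristics) and uses hypothetical coefficient values and function names. It also shows the unconditional ZINB mean, which combines the count-part mean with the zero-inflation probability.

```python
import math


def count_part_mean(beta, green, walk):
    """ZINB count-part mean with a log link and a green x walk interaction:
    mu = exp(b0 + b1*green + b2*walk + b3*green*walk)."""
    b0, b1, b2, b3 = beta
    return math.exp(b0 + b1 * green + b2 * walk + b3 * green * walk)


def walking_rate_ratio(beta, green):
    """Multiplicative change in mu per one-unit increase in walk at a fixed green level:
    exp(b2 + b3*green). With b3 < 0, each extra unit of walking lowers mu more
    where greenery is higher."""
    _, _, b2, b3 = beta
    return math.exp(b2 + b3 * green)


def zinb_mean(mu, pi):
    """Unconditional ZINB mean: E[Y] = (1 - pi) * mu, with pi the zero-inflation probability."""
    return (1.0 - pi) * mu


if __name__ == "__main__":
    beta = (1.0, -0.2, -0.1, -0.05)  # hypothetical coefficients, not the paper's estimates
    for green in (0.0, 1.0, 2.0):
        print(green, walking_rate_ratio(beta, green))
```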

A Study on the Duration of Volunteering (자원봉사활동의 지속성에 관한 연구)

  • Song, Kee-Young; Kim, Wook-Jin
    • The Journal of the Korea Contents Association / v.17 no.4 / pp.444-460 / 2017
  • The duration of volunteering can be analyzed in terms of commitment and attachment. Previous studies have investigated the duration of volunteering predominantly from the perspective of commitment. In contrast, this study focuses on the concept of attachment and investigates the characteristics of those who volunteer habitually throughout their lives, regardless of the regularity and intensity of the volunteer work. In doing so, the study attempts to identify factors associated with attachment to volunteering. Data came from a sample of 8,415 participants aged over twenty who responded to all waves of the Korea Welfare Panel Study, from Wave 1 to Wave 10. A zero-inflated negative binomial regression model was employed to analyze the total number of volunteering activities over the past ten years. The findings show that people with high attachment to volunteering were those with a religious affiliation, less education, and a strong sense of reciprocity. Based on the findings, we provide practical implications for improving the operation and management of volunteer organizations.

Neighborhood Environment Associated with Physical Activity among Rural Adults: Applying Zero-Inflated Negative Binominal Regression Modeling (영과잉 음이항 회귀모형을 적용한 농촌지역 성인 신체활동의 지역사회환경 요인 분석)

  • Kim, Bongjeong
    • Journal of Korean Public Health Nursing / v.29 no.3 / pp.488-502 / 2015
  • Purpose: This study was conducted to determine the neighborhood environmental factors associated with physical activity among adults living in rural communities. Methods: A cross-sectional descriptive survey was conducted with a convenience sample of 201 adults living in three Ri in Y-city, Gyeonggi-do. Data were collected through face-to-face interviews by trained interviewers and analyzed using a zero-inflated negative binomial regression model. Results: 76.1% of participants reported engaging in moderate or vigorous physical activity; 10.5% reported meeting moderate physical activity recommendations, and 14.5% reported meeting vigorous physical activity recommendations. The zero-inflated negative binomial regression analysis showed that increasing days of physical activity were associated with social cohesion ($\beta=.130$, p=.005), social network ($\beta=-.096$, p=.003), and safety from crime ($\beta=-.151$, p=.036), while having no days of physical activity was associated with having no formal education and marginally associated with increasing BMI. Conclusion: Neighborhood environmental factors, including social cohesion, social network, and safety from crime, were significantly associated with physical activity among rural adults. Community health nurses should expand individual behavior-change approaches to incorporate rural adults' specific neighborhood environmental factors into physical activity interventions.
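
Coefficients from the count part of a ZINB model are usually read on the log scale, so exp(β) gives a rate ratio. The sketch below converts the three coefficients reported above, under the assumption (not stated in the abstract) that they are unstandardized log-rate coefficients from the count part; if they are standardized or come from a different part of the model, this reading does not apply. The function names are hypothetical.

```python
import math

# Coefficients reported in the abstract. ASSUMPTION: unstandardized log-scale
# coefficients from the count (negative binomial) part of the ZINB model.
REPORTED_BETAS = {
    "social cohesion": 0.130,
    "social network": -0.096,
    "safety from crime": -0.151,
}


def rate_ratio(beta):
    """exp(beta): multiplicative change in the expected number of active days
    per one-unit increase in the predictor, other predictors held fixed."""
    return math.exp(beta)


def percent_change(beta):
    """Percentage change in the expected count implied by a log-scale coefficient."""
    return 100.0 * (math.exp(beta) - 1.0)


if __name__ == "__main__":
    for name, beta in REPORTED_BETAS.items():
        print(f"{name}: rate ratio {rate_ratio(beta):.3f} ({percent_change(beta):+.1f}%)")
```

For social cohesion, exp(0.130) is about 1.139, i.e., roughly 13.9% more expected days of activity per one-unit increase. Under the assumption above, this applies to the count part, that is, to respondents not in the structural-zero group.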

The Effects of Sentiment and Readability on Useful Votes for Customer Reviews with Count Type Review Usefulness Index (온라인 리뷰의 감성과 독해 용이성이 리뷰 유용성에 미치는 영향: 가산형 리뷰 유용성 정보 활용)

  • Cruz, Ruth Angelie; Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.43-61 / 2016
  • Customer reviews help potential customers make purchasing decisions. However, the sheer number of reviews on websites forces customers to sift through them, shifting the focus from mere search to identifying which of the available reviews are valuable and useful for the purchasing decision at hand. To identify useful reviews, websites have developed different mechanisms that let customers evaluate existing reviews; for example, users can rate a customer review as helpful or not. Amazon.com uses a ratio-type helpfulness index, while Yelp.com uses a count-type usefulness index. This usefulness index highlights helpful reviews for future potential purchasers. This study investigated the effects of sentiment and readability on useful votes for customer reviews. Previous studies of sentiment, readability, and usefulness have focused on the ratio-type index used by websites such as Amazon.com. In this study, Yelp.com's count-type usefulness index for restaurant reviews was used to investigate the relationship between sentiment/readability and usefulness votes. Yelp.com's online customer reviews for stores in the beverage and food categories were used for the analysis. In total, 170,294 reviews containing information on a store's reputation and popularity were used. The control variables were review length, store reputation, and store popularity; the independent variables were sentiment and readability; and the dependent variable was the number of helpful votes. The review rating is the moderating variable for review sentiment and readability. Length is the number of characters in a review. Popularity is the number of reviews for a store, and reputation is the average rating across all reviews for a store. The readability of a review was calculated with the Coleman-Liau index. Sentiment is the positivity score of the review as calculated by SentiWordNet. 
The review rating is a preference score from 1 to 5 stars selected by the review author. The dependent variable (i.e., usefulness votes) is a count variable. Therefore, the Poisson regression model, which is commonly used to account for the discrete and nonnegative nature of count data, was applied in the analyses, with the number of helpful votes assumed to follow a Poisson distribution. Because the Poisson model assumes equal mean and variance and the data were over-dispersed, a negative binomial model, which allows for over-dispersion of the count variable, was used for estimation. Zero-inflated negative binomial regression was used to model the over-dispersed count outcome with excess zeros. In this model, the excess zeros are assumed to be generated by a process separate from the one generating the count values and are therefore modeled separately. The results showed that positive sentiment had a negative effect on gaining useful votes for positive reviews but no significant effect for negative reviews. Poor readability had a negative effect on gaining useful votes, and this effect was not moderated by the review star rating. These findings have considerable managerial implications. First, the results can help websites assess their review guidelines and identify useful reviews for their business. Because positive reviews are not necessarily helpful, restaurants should consider which types of positive reviews are helpful for their business. Second, this study can help businesses and website designers create review mechanisms by showing which types of reviews to highlight on their websites and which types can benefit the business. Moreover, this study highlights the review systems websites employ to let customers post rated reviews.
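
The Coleman-Liau readability score used above is a simple formula over letter, word, and sentence counts: 0.0588·L − 0.296·S − 15.8, where L is the average number of letters per 100 words and S the average number of sentences per 100 words. A minimal sketch follows; it assumes a crude regex tokenizer (real tools differ in how they split words and sentences, so scores will not match any particular implementation exactly), and the function name is hypothetical.

```python
import re


def coleman_liau(text):
    """Coleman-Liau index = 0.0588 * L - 0.296 * S - 15.8, where
    L = letters per 100 words and S = sentences per 100 words.
    ASSUMPTION: words and sentences are found with simple regex heuristics."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not words:
        raise ValueError("text contains no words")
    letters = sum(ch.isalpha() for word in words for ch in word)
    # Runs of terminal punctuation mark sentence ends; unterminated text counts as one sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    per_100_words = 100.0 / len(words)
    L = letters * per_100_words
    S = sentences * per_100_words
    return 0.0588 * L - 0.296 * S - 15.8


if __name__ == "__main__":
    review = "Great tacos and friendly staff. The salsa was fresh! We will come back."
    print(round(coleman_liau(review), 2))
```

Higher scores correspond to text that is harder to read, which is why the abstract describes poor readability as a high index value.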