• Title/Summary/Keyword: binomial data

The Effects of Sentiment and Readability on Useful Votes for Customer Reviews with Count Type Review Usefulness Index (온라인 리뷰의 감성과 독해 용이성이 리뷰 유용성에 미치는 영향: 가산형 리뷰 유용성 정보 활용)

  • Cruz, Ruth Angelie;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.43-61 / 2016
  • Customer reviews help potential customers make purchasing decisions. However, the prevalence of reviews on websites pushes customers to sift through them, shifting their focus from a mere search to identifying which of the available reviews are valuable and useful for the purchasing decision at hand. To identify useful reviews, websites have developed different mechanisms to give customers options when evaluating existing reviews. Websites allow users to rate the usefulness of a customer review as helpful or not. Amazon.com uses a ratio-type helpfulness index, while Yelp.com uses a count-type usefulness index. This usefulness index surfaces helpful reviews for future potential purchasers. This study investigated the effects of sentiment and readability on useful votes for customer reviews. Similar studies on the relationship between sentiment and readability have focused on the ratio-type usefulness index utilized by websites such as Amazon.com. In this study, Yelp.com's count-type usefulness index for restaurant reviews was used to investigate the relationship between sentiment/readability and usefulness votes. Yelp.com's online customer reviews for stores in the beverage and food categories were used for the analysis. In total, 170,294 reviews containing information on a store's reputation and popularity were used. The control variables were the review length, store reputation, and popularity; the independent variables were the sentiment and readability, while the dependent variable was the number of helpful votes. The review rating is the moderating variable for the review sentiment and readability. The length is the number of characters in a review. The popularity is the number of reviews for a store, and the reputation is the general average rating of all reviews for a store. The readability of a review was calculated with the Coleman-Liau index. The sentiment is a positivity score for the review as calculated by SentiWordNet. The review rating is a preference score selected from 1 to 5 (stars) by the review author. The dependent variable (i.e., usefulness votes) used in this study is a count variable. Therefore, the Poisson regression model, which is commonly used to account for the discrete and nonnegative nature of count data, was applied in the analyses. The increase in helpful votes was assumed to follow a Poisson distribution. Because the Poisson model assumes an equal mean and variance and the data were over-dispersed, a negative binomial distribution model that allows for over-dispersion of the count variable was used for the estimation. Zero-inflated negative binomial regression was used to model count variables with excessive zeros and over-dispersed count outcome variables. With this model, the excess zeros were assumed to be generated through a separate process from the count values and therefore should be modeled as independently as possible. The results showed that positive sentiment had a negative effect on gaining useful votes for positive reviews but no significant effect on negative reviews. Poor readability had a negative effect on gaining useful votes and was not moderated by the review star ratings. These findings yield considerable managerial implications. The results are helpful for online websites when analyzing their review guidelines and identifying useful reviews for their business. First, positive reviews are not necessarily helpful; therefore, restaurants should consider which type of positive review is helpful for their business. Second, this study is beneficial for businesses and website designers in creating review mechanisms, as it shows which types of reviews to highlight on their websites and which types of reviews can be beneficial to the business. Moreover, this study highlights the review systems employed by websites to allow their customers to post rating reviews.
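The switch from a Poisson to a negative binomial model described in this abstract rests on the counts having a variance larger than their mean. A minimal Python sketch of that check follows. The vote counts are made-up illustration data, not the study's, and the method-of-moments estimate assumes the common NB2 form Var(y) = mu + alpha * mu^2, which the abstract does not specify.

```python
from statistics import mean, variance

def dispersion_summary(counts):
    """Compare the sample mean and variance of a count variable.

    A Poisson model implies variance == mean (ratio near 1).
    A ratio well above 1 signals over-dispersion.
    """
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    # Method-of-moments alpha under the assumed NB2 form: Var = mu + alpha * mu^2
    alpha = max(0.0, (v - m) / m ** 2)
    return {"mean": m, "variance": v, "ratio": v / m, "alpha": alpha}

# Hypothetical useful-vote counts for twelve reviews (illustration only)
votes = [0, 0, 0, 1, 0, 2, 0, 0, 7, 0, 1, 15]
summary = dispersion_summary(votes)
print(summary)
```

A ratio far above 1, together with many zeros as in this toy sample, is the kind of pattern that would motivate a negative binomial or zero-inflated specification over a plain Poisson fit.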

Marginal Effect Analysis of Travel Behavior by Count Data Model (가산자료모형을 기초로 한 통행행태의 한계효과분석)

  • 장태연
    • Journal of Korean Society of Transportation / v.21 no.3 / pp.15-22 / 2003
  • In general, the linear regression model has been used to estimate trip generation in the travel demand forecasting procedure. However, the model suffers from several methodological limitations. First, trips, as a non-negative integer dependent variable, follow a discrete distribution, but the model assumes that the dependent variable is continuously distributed between $-\infty$ and $+\infty$. Second, the model may produce negative estimates. Third, even if estimated trips are within the valid range, the model offers only forecasted trips without their discrete probability distribution. To overcome these limitations, a Poisson model with an assumption of equidispersion has frequently been used to analyze count data such as trip frequencies. However, if the variance of the data is greater than the mean, the Poisson model tends to underestimate errors, resulting in unreliable estimates. Using an overdispersion test, this study showed that the Poisson model is not appropriate, and using the Vuong test, that the zero-inflated negative binomial model is optimal. Model reliability was checked by a likelihood test, and model accuracy by the Theil inequality coefficient. Finally, the marginal effect of changes in households' socio-demographic characteristics on trips was analyzed.
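For a count model with a log link (Poisson or the count part of a negative binomial model), the conditional mean is E[y|x] = exp(b0 + sum of b_j * x_j), so the marginal effect of a continuous covariate x_j is b_j * E[y|x]. The Python sketch below illustrates only that formula. The coefficients and the two covariates (household size, cars owned) are hypothetical and are not the paper's estimates; a zero-inflated model would need an extra term for the zero-generating process.

```python
import math

def expected_trips(intercept, coefs, x):
    """Conditional mean exp(b0 + sum(b_j * x_j)) of a log-link count model."""
    return math.exp(intercept + sum(b * xi for b, xi in zip(coefs, x)))

def marginal_effects(intercept, coefs, x):
    """d E[y|x] / d x_j = b_j * E[y|x] for each covariate j."""
    mu = expected_trips(intercept, coefs, x)
    return [b * mu for b in coefs]

# Hypothetical coefficients for (household size, cars owned); illustration only
intercept, coefs = 0.2, [0.15, 0.30]
household = [3.0, 1.0]

mu = expected_trips(intercept, coefs, household)
effects = marginal_effects(intercept, coefs, household)
print(round(mu, 3), [round(e, 3) for e in effects])
```

Because the effect scales with the predicted mean, the same coefficient implies a larger change in expected trips for households that already make many trips.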

Development of Ingrowth Estimation Equations for Pinus densiflora in Korea Derived from National Forest Inventory Data (국가산림자원조사 자료를 이용한 소나무의 진계생장 추정식 개발)

  • Moon, Ga Hyun;Yim, Jong Su;Shin, Man Yong
    • Journal of Korean Society of Forest Science / v.107 no.4 / pp.402-411 / 2018
  • This study was conducted to develop ingrowth estimation equations for Pinus densiflora found in Gangwon Province and in the central region of the Korean Peninsula, based on the National Forest Inventory (NFI)'s permanent sampling plot data. For this study, identical sampling plots in the 5th and 6th NFI data were collected in order to identify ingrowth amounts over the last 5 years. Following a two-stage approach to developing the ingrowth estimation equations, a logistic regression model was used in the first stage to estimate the ingrowth probability. In the second stage, regression analysis on sampling plots with ingrowth occurrence was used to estimate the ingrowth amount. The optimal model was selected from the candidates after verification based on three evaluation statistics: mean difference (MD), standard deviation of difference (SDD), and standard error of difference (SED). As a result, a logistic regression model based on the number of sampling plots without ingrowth (model VI) was selected as the ingrowth probability estimation equation, and an exponential function including the species composition (SC) variable (model VII) was optimal as the ingrowth estimation equation. The estimation ability of the ingrowth equations developed in this study was also evaluated under various forest stand conditions, and no particular issue in fitness or applicability was observed.
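The two-stage approach in this abstract can be sketched as a two-part model: a logistic equation gives the probability that a plot has any ingrowth, and an exponential equation gives the amount when ingrowth occurs. The Python sketch below assumes the two stages are combined as probability times conditional amount, which is the usual two-part convention but is not stated in the abstract. All coefficients, the stage-1 predictor, and the functional details are hypothetical.

```python
import math

def ingrowth_probability(a0, a1, x):
    """Stage 1: logistic model for the probability that a plot has ingrowth."""
    return 1.0 / (1.0 + math.exp(-(a0 + a1 * x)))

def ingrowth_amount(b0, b1, sc):
    """Stage 2: exponential function of species composition (SC),
    fitted only on plots where ingrowth occurred."""
    return b0 * math.exp(b1 * sc)

def expected_ingrowth(p, amount):
    """Assumed combination: probability of ingrowth times conditional amount."""
    return p * amount

# Hypothetical coefficients and plot values; illustration only
p = ingrowth_probability(a0=-1.0, a1=0.5, x=2.0)   # x: a hypothetical stand predictor
amount = ingrowth_amount(b0=3.0, b1=0.4, sc=0.6)   # sc: species composition share
print(round(p, 3), round(amount, 3), round(expected_ingrowth(p, amount), 3))
```

Separating the two stages keeps plots with no ingrowth from distorting the amount equation, which is the main reason to prefer this structure over a single regression on all plots.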

Global Big Data Analysis Exploring the Determinants of Application Ratings: Evidence from the Google Play Store

  • Seo, Min-Kyo;Yang, Oh-Suk;Yang, Yoon-Ho
    • Journal of Korea Trade / v.24 no.7 / pp.1-28 / 2020
  • Purpose - This paper empirically investigates the predictors and main determinants of consumers' ratings of mobile applications in the Google Play Store. Using a comparison of linear and nonlinear models to identify the role of user reviews in determining application ratings across countries, this study estimates the direct effects of user reviews on the application rating. In addition, extending the modelling to sentiment analysis, this paper also explores the effects of review polarity and subjectivity on the application rating, followed by an examination of the moderating effect of user reviews on the polarity-rating and subjectivity-rating relationships. Design/methodology - Our empirical model considers nonlinear association as well as linear causality between features and targets. This study employs competing theoretical frameworks - multiple regression, decision-tree and neural network models - to identify the predictors and main determinants of app ratings, using data from the Google Play Store. Using a cross-validation method, our analysis investigates the direct and moderating effects of predictors and main determinants of application ratings in a global app market. Findings - The main findings of this study can be summarized as follows: the number of user reviews is positively associated with the rating of a given app, and it positively moderates the polarity-rating relationship. When the review polarity measured by sentiment analysis was applied to the modelling, polarity was found not to be significantly associated with the rating. This result best applies to the function of both positive and negative reviews in playing a word-of-mouth role, as well as serving as a channel for communication, leading to product innovation. Originality/value - Applying a proxy measured by binomial figures, previous studies have predominantly focused on positive and negative sentiment in examining the determinants of app ratings, assuming that they are significantly associated. Given the constraints on the measurement of sentiment in current research, this paper employs sentiment analysis to measure users' polarity and subjectivity as real numbers. This paper also seeks to compare the suitability of three distinct models - linear regression, decision-tree and neural network models. Although a comparison between methodologies has long been considered important to the empirical approach, it has hitherto been underexplored in studies on the app market.

Developing a Traffic Accident Prediction Model for Freeways (고속도로 본선에서의 교통사고 예측모형 개발)

  • Mun, Sung-Ra;Lee, Young-Ihn;Lee, Soo-Beom
    • Journal of Korean Society of Transportation / v.30 no.2 / pp.101-116 / 2012
  • Accident prediction models have been utilized to predict accident possibilities on existing or projected freeways and to evaluate programs or policies for improving safety. In this study, a traffic accident prediction model for freeways was developed for these purposes. When selecting variables for the model, the highest priority was given to the ease of both collecting data and applying them to the model. The dependent variables were the total number of accidents and the number of accidents involving casualties, both measured per IC (or JCT) unit. As a result, two models were developed: the overall accident model and the casualty-related accident model. The error structures adopted for these models were the negative binomial distribution and the Poisson distribution, respectively. Among the two models, a more appropriate model was selected by statistical estimation. Nine major national freeways were selected, and five years of data (2003-2007) were utilized. Explanatory variables should take on either a predictable value such as traffic volumes or a fixed value with respect to geometric conditions. As a result of the maximum likelihood estimation, significant variables of the overall accident model were found to be the link length between ICs (or JCTs), the daily volumes (AADT), and the ratio of bus volume to the number of curved segments between ICs (or JCTs). For the casualty-related accident model, the link length between ICs (or JCTs), the daily volumes (AADT), and the ratio of bus volumes had a significant impact on accidents. The likelihood ratio test was conducted to verify the spatial and temporal transferability of the estimated parameters of each model. It was found that the overall accident model could be transferred only to the road with four or more than six lanes. On the other hand, the casualty-related accident model was transferable to every road and every time period. In conclusion, the model developed in this study can be extended to various applications to establish future plans and evaluate policies.

High-Risk Area for Human Infection with Avian Influenza Based on Novel Risk Assessment Matrix (위험 매트릭스(Risk Matrix)를 활용한 조류인플루엔자 인체감염증 위험지역 평가)

  • Sung-dae Park;Dae-sung Yoo
    • Korean Journal of Poultry Science / v.50 no.1 / pp.41-50 / 2023
  • Over the last decade, avian influenza (AI) has been considered an emerging disease that could become the next pandemic, particularly in countries like South Korea, with continuous animal outbreaks. In this situation, risk assessment is highly needed to prevent and prepare for human infection with AI. Thus, we developed a risk assessment matrix for high-risk areas of human infection with AI in South Korea, based on the notion that risk is the product of hazard and vulnerability. In this matrix, highly pathogenic avian influenza (HPAI) in poultry farms was taken as the hazard and the number of poultry-associated production facilities as the vulnerability. The average number of HPAI cases in poultry farms at the level of 229 municipalities, used as the hazard axis of the matrix, was predicted using a negative binomial regression with nationwide outbreak data from 2003 to 2018. The two components of the matrix were each classified into five groups using the K-means clustering algorithm and then multiplied, producing the area-specific risk level of human infection. As a result, Naju-si, Jeongeup-si, and Namwon-si were categorized as high-risk areas for human infection with AI. These findings can contribute to designing policies on human infection that minimize socio-economic damage.

An Analysis of Factors Influencing on Temple Foods (사찰음식에 대한 수요영향요인 분석 - 템플스테이 참가자를 대상으로 -)

  • Kim, Yong-Moon;Park, Ki-Oh
    • Culinary science and hospitality research / v.22 no.3 / pp.240-253 / 2016
  • The purpose of this study is to predict factors influencing participant demand for temple stays and to help find alternatives for temple stay marketing strategies. Specifically, the study sought to examine input variables affecting the visit frequency of temple visitors who partook in temple food. Research subjects were temple stay participants with experience of temple food. Through convenience sampling, 300 self-administered questionnaires were distributed to participants at 4 temple stays in Seoul. Of the 278 questionnaires collected, 232 (83%) were used for the analysis. Because an appropriate model was required for the collected data, the Truncated Negative Binomial (TNB) Poisson model, which is useful for analysing count data that are truncated at '0' and concentrated at a certain value, was selected for this study. The results showed that, for the revitalization of temple stay food, the most crucial item for temple food proponents to recognize is natural food ingredients. The effect was greater among respondents over 40 years of age and those with incomes of 40 million won or more than among others. In addition, unmarried and male respondents scored higher than married and female respondents, and Christian respondents had a greater impact on temple food demand than respondents from the Shamanism community. These results suggest that priority should be given to establishing an in-depth public relations policy for targeted marketing according to consumers' demographic characteristics. Active and aggressive efforts to expand food inspection are required to promote the healthy image of temple food across segmented consumer groups.

Prevalence and risk factors of subclinical bovine mastitis in some dairy farms of Sylhet district of Bangladesh

  • Kahir, Md. Abdul;Islam, Md. Mazharul;Rahman, A.K.M. Anisur;Nahar, A.;Rahman, Md. Siddiqur;Son, Hee-Jong
    • Korean Journal of Veterinary Service / v.31 no.4 / pp.497-504 / 2008
  • A cross-sectional study was undertaken to report the prevalence and to identify risk factors of subclinical mastitis in dairy cattle in the Sylhet district of Bangladesh. Among the 325 dairy farms of the district, 12 farms (3.7%) were selected by convenience for this study. All the dairy cows of the 12 farms were selected for sample collection. Fresh milk samples from each of the selected dairy cows were collected aseptically in separate sterilized test tubes for the RF, RH, LF and LH quarters of the udder. The rapid modified White Side Test (WST) was used to detect subclinical mastitis (SCM). Results of the WST and data derived from the completed questionnaires were entered in Microsoft Excel 2003 and transferred to Stata®, version 8.0/Intercooled (Stata Corporation, Texas, USA, 2003). The overall prevalence of SCM, its distribution across categories of cow-level variables, and the corresponding exact binomial 95% confidence intervals were calculated in Stata®. Simple bivariable associations among independent variables were investigated by the $\chi^2$ test in Stata®. Multiple logistic regression analysis with backward elimination was used to identify risk factors of SCM. To identify significant variation in quarter SCM, linear regression analysis was performed after arcsine transformation of the data. The overall prevalence of SCM found in this study is 54%. Dairy cows with teat lesions had significantly increased SCM (OR=12342, P value=0.000, 95% CI=762, 199798) compared with those without teat lesions. The Holstein Friesian X Jersey X Sahiwal breed had significantly decreased SCM (OR=0.18, p=0.03, 95% CI 0.04, 0.85) compared with other breeds. The prevalence of SCM found in this study is in agreement with other studies. Injury to the teat increases the probability of infection with microbes and thereby of mastitis. If the prevalence of teat lesions can be decreased, the probability of subclinical mastitis will also decrease. The negatively associated Holstein Friesian X Jersey X Sahiwal breed may help in planning a mastitis control program if this finding can be validated by a more powerful case-control or cohort study design.
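The "exact binomial" confidence interval mentioned in this abstract is commonly the Clopper-Pearson interval, whose bounds are the proportions at which the observed count sits exactly in the tail of the binomial distribution. The Python sketch below computes it by bisection on the binomial CDF using only the standard library. It is an illustration of that standard interval, not a reproduction of Stata's routine, and the counts (54 positive cows out of 100) are hypothetical because the abstract reports only the 54% prevalence.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_increasing(g, target):
    """Bisection on [0, 1] for g(p) = target, where g is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided confidence interval for a proportion."""
    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. 1 - CDF = 1 - alpha / 2
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

# Hypothetical counts: 54 SCM-positive cows out of 100 sampled
low, high = clopper_pearson(54, 100)
print(round(low, 4), round(high, 4))
```

The interval is conservative by construction, which is why it is often preferred for prevalence estimates from small or conveniently selected samples.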

Influences of Continuance Intention and Past Behavior on Active Users' Knowledge Sharing Continuance and Frequency: Naver Knowledge-iN case (지속의도와 과거행위가 핵심 사용자의 지식공유 지속여부 및 빈도에 미치는 효과: 네이버 지식인 사례)

  • Kang, Minhyung
    • Knowledge Management Research / v.21 no.3 / pp.67-87 / 2020
  • Maintaining active users who repeatedly share high-quality knowledge is critical for the success of online Q&A sites. This study suggests two paths that lead to active users' continuous knowledge sharing: 1) elaborated decision process, represented by continuance intention, and 2) automated cognitive process, represented by past behavior. The direct and moderating effects of continuance intention and past behavior were verified by analyzing subjective intention data and objective behavior data of 333 active users of Naver Knowledge-iN. Using Cox proportional hazards regression and negative binomial regression, the influences of continuance intention and past behavior on two types of continuous knowledge sharing were examined. The results showed that only past behavior was significantly influential on knowledge sharing continuance and as to the frequency of knowledge sharing, both continuance intention and past behavior's influences were significant. It was also confirmed that past behavior negatively moderates continuance intention's effect on the frequency of knowledge sharing. In order to maintain active users' continuous knowledge sharing, it is important to habituate knowledge sharing through repetitive knowledge sharing behavior. And in order to increase the frequency of knowledge sharing, in addition to the habituation, appropriate benefits that can increase the continuance intention should be provided.

A Study on Forest Inventory Method Using Aerial Photographs (항공사진(航空寫眞)을 이용(利用)한 산림조사(山林調査) 방법(方法)에 관한 연구(硏究))

  • Lee, Chun Yong
    • Journal of Korean Society of Forest Science / v.60 no.1 / pp.10-16 / 1983
  • This survey was carried out in the Schneegattern Forest District, located 40 km northeast of Salzburg, Austria. Black-and-white infrared photos at a scale of 1:10,000 were interpreted with two sampling methods, stratified and unstratified sampling, in order to estimate coniferous stand volume and to reduce cost. Forest stands were classified into four groups: non-forest, young stands, beech stands, and coniferous stands. Coniferous and beech stands were divided into age classes I (41-80 years) and II (above 81 years). After this delineation, sample points were designated on the orthophoto map, whose data were transferred from the aerial photos. The volume data were calculated from DBH using a relascope in the field, and the results were as follows. 1) Coniferous stand volume per hectare was $470 \pm 31.9\,m^3$. 2) The diameter distribution of $C_1$ was binomial, but $C_2$ showed a normal distribution. 3) The stratified sampling method was better than the unstratified sampling method.
