• Title/Summary/Keyword: Count Variable


Bayesian Parameter Estimation and Variable Selection in Random Effects Generalised Linear Models for Count Data

  • Oh, Man-Suk;Park, Tae-Sung
    • Journal of the Korean Statistical Society / v.31 no.1 / pp.93-107 / 2002
  • Random effects generalised linear models are useful for analysing clustered count data in which responses are usually correlated. We propose a Bayesian approach to parameter estimation and variable selection in random effects generalised linear models for count data. A simple Gibbs sampling algorithm for parameter estimation is presented and a simple and efficient variable selection is done by using the Gibbs outputs. An illustrative example is provided.
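To make the idea of Gibbs sampling in a random effects count model concrete, here is a minimal sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes a simple conjugate setup chosen for illustration: counts y_ij ~ Poisson(mu * u_i), cluster effects u_i ~ Gamma(a, rate a) with `a` held fixed, and a Gamma prior on mu. All names and default values are hypothetical.

```python
import random

def gibbs_poisson_gamma(clusters, a=2.0, prior_shape=1.0, prior_rate=0.1,
                        n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for y_ij ~ Poisson(mu * u_i), u_i ~ Gamma(a, rate=a).

    mu has a Gamma(prior_shape, rate=prior_rate) prior; `a` is held fixed.
    Returns the post-burn-in draws of mu. Illustrative assumption only.
    """
    rng = random.Random(seed)
    sums = [sum(c) for c in clusters]    # total count per cluster
    sizes = [len(c) for c in clusters]   # observations per cluster
    mu = 1.0
    draws = []
    for it in range(n_iter):
        # u_i | mu, y ~ Gamma(a + sum_j y_ij, rate = a + n_i * mu)
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        u = [rng.gammavariate(a + s, 1.0 / (a + n * mu))
             for s, n in zip(sums, sizes)]
        # mu | u, y ~ Gamma(prior_shape + sum y, rate = prior_rate + sum_i n_i u_i)
        rate = prior_rate + sum(n * ui for n, ui in zip(sizes, u))
        mu = rng.gammavariate(prior_shape + sum(sums), 1.0 / rate)
        if it >= burn_in:
            draws.append(mu)
    return draws

draws = gibbs_poisson_gamma([[2, 3, 1], [8, 7, 9], [0, 1, 0]])
posterior_mean_mu = sum(draws) / len(draws)
```

Both full conditionals are Gamma distributions here, which is what keeps each Gibbs step a single draw. A model with regression covariates would generally need a different update for the coefficients.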

Accuracy Improvement Method of Step Count Detection Using Variable Amplitude Threshold (가변 진폭 임계값을 이용한 걸음수 검출 정확도 향상 기법)

  • Ryu, Uk Jae;Kim, En Tae;An, Kyung Ho;Chang, Yun Seok
    • KIPS Transactions on Computer and Communication Systems / v.2 no.6 / pp.257-264 / 2013
  • In this study, we designed a variable amplitude threshold algorithm that enhances the accuracy of step counting. The algorithm converts the x, y, z sensor values into a single energy value ($E_t$) using the SVM (Signal Vector Magnitude) algorithm and can detect step counts with over 99% accuracy through a peak data detection algorithm and a fixed peak threshold. To verify the results, we applied noise filtering with a fixed amplitude threshold on the amplitude of the energy value and found that the detection error increased; this observation is the key idea behind the variable amplitude threshold, which can be adapted to continuous data evaluation. The experimental results show that the variable amplitude threshold algorithm can improve the average step count accuracy up to 98.9% at a 10 Hz sampling rate and 99.6% at a 20 Hz sampling rate.
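The pipeline described above (signal vector magnitude, peak detection, amplitude threshold) can be sketched roughly as follows. The energy value is computed as $E_t = \sqrt{x_t^2 + y_t^2 + z_t^2}$. The rule used here for adapting the threshold (a fraction of the recent energy range, with a floor) is an assumption made for illustration, and the parameter names `window`, `min_amplitude`, and `scale` are hypothetical. The abstract does not specify the authors' rule.

```python
import math

def signal_vector_magnitude(x, y, z):
    """Collapse one 3-axis accelerometer sample into a single energy value."""
    return math.sqrt(x * x + y * y + z * z)

def count_steps(samples, window=10, min_amplitude=0.5, scale=0.5):
    """Count local energy peaks whose amplitude passes an adaptive threshold.

    The threshold is scale * (recent energy range), never below min_amplitude.
    This adaptation rule is an illustrative assumption.
    """
    energy = [signal_vector_magnitude(x, y, z) for x, y, z in samples]
    steps = 0
    for i in range(1, len(energy) - 1):
        # Local maximum: strictly above the previous sample, not below the next.
        if not (energy[i - 1] < energy[i] >= energy[i + 1]):
            continue
        recent = energy[max(0, i - window): i + 1]
        low, high = min(recent), max(recent)
        threshold = max(min_amplitude, scale * (high - low))
        # Peak amplitude is measured from the lowest recent energy value.
        if energy[i] - low >= threshold:
            steps += 1
    return steps
```

With a fixed threshold, small ripples and large strides are judged against the same bar. Tying the threshold to the recent range lets the bar rise during vigorous motion while the floor still rejects sensor noise.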

The Effects of Sentiment and Readability on Useful Votes for Customer Reviews with Count Type Review Usefulness Index (온라인 리뷰의 감성과 독해 용이성이 리뷰 유용성에 미치는 영향: 가산형 리뷰 유용성 정보 활용)

  • Cruz, Ruth Angelie;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.43-61 / 2016
  • Customer reviews help potential customers make purchasing decisions. However, the prevalence of reviews on websites pushes customers to sift through them, shifting the focus from a mere search to identifying which of the available reviews are valuable and useful for the purchasing decision at hand. To identify useful reviews, websites have developed different mechanisms to give customers options when evaluating existing reviews. Websites allow users to rate the usefulness of a customer review as helpful or not. Amazon.com uses a ratio-type helpfulness index, while Yelp.com uses a count-type usefulness index. This usefulness index provides helpful reviews to future potential purchasers. This study investigated the effects of sentiment and readability on useful votes for customer reviews. Similar studies on the relationship between sentiment and readability have focused on the ratio-type usefulness index utilized by websites such as Amazon.com. In this study, Yelp.com's count-type usefulness index for restaurant reviews was used to investigate the relationship between sentiment/readability and usefulness votes. Yelp.com's online customer reviews for stores in the beverage and food categories were used for the analysis. In total, 170,294 reviews containing information on a store's reputation and popularity were used. The control variables were the review length, store reputation, and popularity; the independent variables were the sentiment and readability, while the dependent variable was the number of helpful votes. The review rating is the moderating variable for the review sentiment and readability. The length is the number of characters in a review. The popularity is the number of reviews for a store, and the reputation is the general average rating of all reviews for a store. The readability of a review was calculated with the Coleman-Liau index. The sentiment is a positivity score for the review as calculated by SentiWordNet. The review rating is a preference score selected from 1 to 5 (stars) by the review author. The dependent variable (i.e., usefulness votes) used in this study is a count variable. Therefore, the Poisson regression model, which is commonly used to account for the discrete and nonnegative nature of count data, was applied in the analyses. The increase in helpful votes was assumed to follow a Poisson distribution. Because the Poisson model assumes an equal mean and variance and the data were over-dispersed, a negative binomial distribution model that allows for over-dispersion of the count variable was used for the estimation. Zero-inflated negative binomial regression was used to model count variables with excessive zeros and over-dispersed count outcome variables. With this model, the excess zeros were assumed to be generated through a separate process from the count values and therefore should be modeled as independently as possible. The results showed that positive sentiment had a negative effect on gaining useful votes for positive reviews but no significant effect on negative reviews. Poor readability had a negative effect on gaining useful votes and was not moderated by the review star ratings. These findings yield considerable managerial implications. The results are helpful for online websites when analyzing their review guidelines and identifying useful reviews for their business. First, positive reviews are not necessarily helpful; therefore, restaurants should consider which type of positive review is helpful for their business. Second, this study is beneficial for businesses and website designers in creating review mechanisms to know which type of reviews to highlight on their websites and which type of reviews can be beneficial to the business. Moreover, this study highlights the review systems employed by websites to allow their customers to post rating reviews.
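The reasoning that leads from Poisson to negative binomial to zero-inflated models can be illustrated with a quick diagnostic on raw counts. The sketch below is not the study's estimation procedure. It only computes the variance-to-mean ratio (1 under a Poisson assumption), a method-of-moments dispersion value under the common NB2 form Var = mu + alpha * mu^2, and the share of zeros compared with what a Poisson with the same mean would produce. The function name and the sample votes are made up for illustration.

```python
import math
from statistics import mean, variance

def dispersion_summary(counts):
    """Rough checks for over-dispersion and excess zeros in count data."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    return {
        "mean": m,
        "variance": v,
        # Equals 1 under Poisson; values well above 1 suggest over-dispersion.
        "dispersion_index": v / m,
        # Method-of-moments alpha for Var = mu + alpha * mu^2 (NB2 form).
        "nb2_alpha": max(0.0, (v - m) / (m * m)),
        "observed_zero_share": counts.count(0) / len(counts),
        # P(Y = 0) for a Poisson with the same mean.
        "poisson_zero_share": math.exp(-m),
    }

# Hypothetical helpful-vote counts, not data from the study.
votes = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 5, 12]
summary = dispersion_summary(votes)
```

A dispersion index far above 1 points toward a negative binomial model, and an observed zero share far above the Poisson zero share points toward a zero-inflated variant.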

Effect of Lead Exposure on the Status of Reticulocyte Count Indices among Workers from Lead Battery Manufacturing Plant

  • Kalahasthi, Ravibabu;Barman, Tapu
    • Toxicological Research / v.32 no.4 / pp.281-287 / 2016
  • Earlier studies conducted on lead-exposed workers have determined the reticulocyte count (RC) (%), but the parameters of Absolute Reticulocyte Count (ARC), Reticulocyte Index (RI), and Reticulocyte Production Index (RPI) were not reported. This study assessed the effect of lead (Pb) exposure on the status of reticulocyte count indices in workers employed in lead battery plants. The present cross-sectional study was carried out on 391 male lead battery workers. The blood lead levels (BLL) were determined by using an Atomic Absorption Spectrophotometer. The RC (%) was estimated by using the supravital staining method. The parameters ARC, RI, and RPI were calculated by using the RC (%) with the red cell indices (RBC count and hematocrit). The levels of RBC count and hematocrit were determined by using an ABX Micros ES-60 hematology analyzer. The levels of reticulocyte count indices, namely RC (%), ARC, RI, and RPI, significantly increased with elevated BLL. The association between BLL and reticulocyte count indices was positive and significant. The results of multiple linear regression analysis showed that the reticulocyte count (${\beta}=0.212$, P < 0.001), ARC (${\beta}=0.217$, P < 0.001), RI (${\beta}=0.194$, P < 0.001), and RPI (${\beta}=0.208$, P < 0.001) were positively associated with BLL. Smoking habits also showed a significant positive association with reticulocyte count indices: RC (%) (${\beta}=0.188$, P < 0.001), ARC (${\beta}=0.174$, P < 0.001), RI (${\beta}=0.200$, P < 0.001), and RPI (${\beta}=0.151$, P < 0.005). The study results revealed that lead exposure may cause reticulocytosis with an increase of reticulocyte count indices.
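The abstract states that ARC, RI, and RPI were derived from RC (%) together with RBC count and hematocrit, but it does not give the formulas. The sketch below uses commonly cited textbook definitions as an assumption: ARC = RC% / 100 × RBC count, RI = RC% × hematocrit / 45, and RPI = RI / maturation factor. The reference hematocrit of 45 and the maturation factor cut-offs are conventions that vary between sources and may differ from those used in the study.

```python
def maturation_factor(hematocrit):
    """Reticulocyte maturation time in days (one common convention, assumed)."""
    if hematocrit >= 40:
        return 1.0
    if hematocrit >= 30:
        return 1.5
    if hematocrit >= 20:
        return 2.0
    return 2.5

def reticulocyte_indices(rc_percent, rbc_million_per_ul, hematocrit,
                         normal_hematocrit=45.0):
    """Return ARC (10^3 cells/uL), RI, and RPI from RC (%), RBC, and hematocrit."""
    # Absolute count: fraction of reticulocytes times the RBC count,
    # converted from 10^6/uL to 10^3/uL.
    arc = rc_percent / 100.0 * rbc_million_per_ul * 1000.0
    # Reticulocyte index: RC (%) corrected for the degree of anemia.
    ri = rc_percent * hematocrit / normal_hematocrit
    # Production index: further corrected for longer maturation time.
    rpi = ri / maturation_factor(hematocrit)
    return {"ARC": arc, "RI": ri, "RPI": rpi}

# Hypothetical values for illustration only.
example = reticulocyte_indices(rc_percent=2.0, rbc_million_per_ul=4.5,
                               hematocrit=36.0)
```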

Developing the Pedestrian Accident Models Using Tobit Model (토빗모형을 이용한 가로구간 보행자 사고모형 개발)

  • Lee, Seung Ju;Kim, Yun Hwan;Park, Byung Ho
    • International Journal of Highway Engineering / v.16 no.3 / pp.101-107 / 2014
  • PURPOSES : This study deals with pedestrian accidents in Cheongju. The goal is to develop a pedestrian accident model. METHODS : To analyze the accidents, count data models, truncated count data models, and Tobit regression models are utilized. The dependent variable is the number of accidents. The independent variables are traffic volume, intersection geometric structure, and transportation facilities. RESULTS : The main results are as follows. First, the Tobit model was judged to be more appropriate than the other models, and the models were found to be statistically significant. Second, the main accident-related variables adopted in the model were traffic volume, pedestrian volume, number of entries/exits, number of crosswalks, and bus stops. CONCLUSIONS : The Tobit model is evaluated to be the optimal model for pedestrian accidents.
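For readers unfamiliar with the Tobit model, the sketch below shows the standard log-likelihood for a response that is left-censored at zero: censored observations contribute the normal probability of falling at or below the limit, and uncensored observations contribute the normal density. This is the generic textbook form, not the specification fitted in the study, and the function and argument names are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(y, X, beta, sigma, lower=0.0):
    """Log-likelihood of a Tobit model with left-censoring at `lower`.

    y: observed responses; X: rows of covariates; beta: coefficients;
    sigma: standard deviation of the latent normal error.
    """
    total = 0.0
    for yi, xi in zip(y, X):
        mu = sum(b * x for b, x in zip(beta, xi))
        if yi <= lower:
            # Censored: probability that the latent value is at or below the limit.
            total += math.log(norm_cdf((lower - mu) / sigma))
        else:
            # Uncensored: density of the observed value.
            total += math.log(norm_pdf((yi - mu) / sigma) / sigma)
    return total
```

In practice the coefficients and sigma would be found by maximizing this function with a numerical optimizer. The sketch only evaluates the likelihood at given parameter values.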

Weighted zero-inflated Poisson mixed model with an application to Medicaid utilization data

  • Lee, Sang Mee;Karrison, Theodore;Nocon, Robert S.;Huang, Elbert
    • Communications for Statistical Applications and Methods / v.25 no.2 / pp.173-184 / 2018
  • In medical or public health research, it is common to encounter clustered or longitudinal count data that exhibit excess zeros. For example, health care utilization data often have a multi-modal distribution with excess zeros as well as a multilevel structure where patients are nested within physicians and hospitals. To analyze this type of data, zero-inflated count models with mixed effects have been developed where a count response variable is assumed to be distributed as a mixture of a Poisson or negative binomial and a distribution with a point mass of zeros that include random effects. However, no study has considered a situation where data are also censored due to the finite nature of the observation period or follow-up. In this paper, we present a weighted version of the zero-inflated Poisson model with random effects accounting for variable individual follow-up times. We suggest two different types of weight function. The performance of the proposed model is evaluated and compared to a standard zero-inflated mixed model through simulation studies. This approach is then applied to Medicaid data analysis.
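As a rough illustration of the mixture described above, the sketch below evaluates a zero-inflated Poisson (ZIP) log-probability and a weighted log-likelihood in which each observation's contribution is multiplied by a weight. The weight functions proposed in the paper are not described in the abstract, so the weights here are a placeholder assumption (for example, the fraction of the full follow-up period that was observed), and random effects are omitted entirely.

```python
import math

def zip_logpmf(y, lam, pi):
    """Log-probability of count y under a zero-inflated Poisson.

    pi is the probability of a structural zero; lam is the Poisson mean.
    """
    if y == 0:
        # A zero arises from the point mass or from the Poisson part.
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    return (math.log(1.0 - pi) - lam + y * math.log(lam)
            - math.lgamma(y + 1))

def weighted_zip_loglik(counts, weights, lam, pi):
    """Weighted ZIP log-likelihood; the weights are a hypothetical input."""
    return sum(w * zip_logpmf(y, lam, pi) for y, w in zip(counts, weights))

# Hypothetical visit counts and follow-up fractions, for illustration only.
visits = [0, 0, 3, 1, 0, 5]
follow_up_fraction = [1.0, 0.5, 1.0, 0.75, 0.25, 1.0]
value = weighted_zip_loglik(visits, follow_up_fraction, lam=2.0, pi=0.3)
```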

A Study on Phone Call Big Data Analytics (전화통화 빅데이터 분석에 관한 연구)

  • Kim, Jeongrae;Jeong, Chanki
    • Journal of Information Technology and Architecture / v.10 no.3 / pp.387-397 / 2013
  • This paper proposes an approach to big data analytics for phone call data. The analytical model for phone call data is composed of the PVPF (Parallel Variable-length Phrase Finding) algorithm for identifying verbal phrases of natural language and the word count algorithm for measuring the usage frequency of keywords. In the proposed model, we identify words using the PVPF algorithm and measure the usage frequency of the identified words using the word count algorithm in MapReduce. The results can be interpreted from various viewpoints. We design and implement the model on HDFS (Hadoop Distributed File System) and verify the proposed approach through a case study of phone call data, extracting useful results through analysis of keyword correlation and usage frequency.
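The word count step follows the usual MapReduce pattern of map, shuffle, and reduce. The sketch below imitates that pattern in a single Python process to show the data flow. It does not use Hadoop, and it substitutes plain whitespace tokenization for the PVPF algorithm, which the abstract does not describe in detail. Function names and the sample transcript lines are made up.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the grouped values to obtain the usage frequency of each word."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return reduce_phase(shuffle_phase(pairs))

# Hypothetical call transcript lines.
counts = word_count(["call me back", "Call back later"])
```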

A Modified Computing Algorithm for Raking Ratio Estimation Subject to Partial Marginal Information

  • Son, Chang Kyoon
    • Communications for Statistical Applications and Methods / v.11 no.2 / pp.419-433 / 2004
  • We suggest a modified computing algorithm for raking ratio estimation under the assumption that the population total is partially known and the sample total is completely known for the survey variable in a contingency table. Through an empirical study, we show that the proposed estimation procedure is useful for estimating the population cell counts in this situation.
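For context, classical raking (iterative proportional fitting) alternately rescales the rows and columns of a sample table until both sets of margins match known population totals. The sketch below implements only that classical version with fully known margins. The paper's modification for partially known marginal information is not described in the abstract and is not reproduced here. The table and targets are invented for illustration.

```python
def rake(table, row_targets, col_targets, max_iter=100, tol=1e-10):
    """Classical raking of a two-way table to given row and column totals."""
    cells = [[float(c) for c in row] for row in table]
    n_rows, n_cols = len(cells), len(cells[0])
    for _ in range(max_iter):
        # Step 1: scale each row so its sum equals the row target.
        for i in range(n_rows):
            s = sum(cells[i])
            if s > 0:
                factor = row_targets[i] / s
                cells[i] = [c * factor for c in cells[i]]
        # Step 2: scale each column so its sum equals the column target.
        for j in range(n_cols):
            s = sum(cells[i][j] for i in range(n_rows))
            if s > 0:
                factor = col_targets[j] / s
                for i in range(n_rows):
                    cells[i][j] *= factor
        # Stop once the rows still match after the column adjustment.
        gap = max(abs(sum(cells[i]) - row_targets[i]) for i in range(n_rows))
        if gap < tol:
            break
    return cells

# Hypothetical sample counts and population margins (both margins sum to 100).
fitted = rake([[10, 20], [30, 40]], row_targets=[40, 60], col_targets=[45, 55])
```

Because each step only multiplies whole rows or columns, the cross-product (odds) ratios of the original table are preserved in the fitted cells.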

Overdispersion in count data - a review (가산자료(count data)의 과산포 검색: 일반화 과정)

  • 김병수;오경주;박철용
    • The Korean Journal of Applied Statistics / v.8 no.2 / pp.147-161 / 1995
  • The primary objective of this paper is to review parametric models and test statistics related to overdispersion of count data. The Poisson or binomial assumption often fails to explain overdispersion. We reviewed real examples of overdispersion in count data that occurred in toxicological or teratological experiments. We also reviewed several models that were suggested for implementing the extra-binomial variation or hyper-Poisson variability, and we noted how these models were generalized and further developed. The approaches that have been suggested for overdispersion fall into two broad categories. One is to develop a parametric model for it, and the other is to assume a particular relationship between the variance and the mean of the response variable and to derive a score test statistic for detecting the overdispersion. Recently, Dean (1992) derived a general score test statistic for detecting overdispersion from the exponential family.
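As an example of the second category, a widely used score-type statistic for Poisson data against the alternative Var = mu + tau * mu^2 is T = sum[(y_i - mu_i)^2 - y_i] / sqrt(2 * sum(mu_i^2)), which is approximately standard normal under the Poisson assumption, with large positive values indicating overdispersion. The sketch below evaluates it for an intercept-only model, where every fitted mean equals the sample mean. It is offered as an illustration of the idea and is not claimed to be the general exponential-family statistic reviewed in the paper. The sample counts are made up.

```python
import math

def overdispersion_score_stat(counts, fitted_means=None):
    """Score-type statistic for Poisson overdispersion (quadratic variance alternative).

    If fitted_means is omitted, an intercept-only model is assumed and the
    sample mean is used as the fitted mean for every observation.
    """
    if fitted_means is None:
        m = sum(counts) / len(counts)
        fitted_means = [m] * len(counts)
    # Squared residual minus y has expectation zero when Var(y) = mean.
    numerator = sum((y - mu) ** 2 - y for y, mu in zip(counts, fitted_means))
    denominator = math.sqrt(2.0 * sum(mu * mu for mu in fitted_means))
    return numerator / denominator

# Hypothetical counts with many zeros and one large value.
stat = overdispersion_score_stat([0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 5, 12])
```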


Threshold-asymmetric volatility models for integer-valued time series

  • Kim, Deok Ryun;Yoon, Jae Eun;Hwang, Sun Young
    • Communications for Statistical Applications and Methods / v.26 no.3 / pp.295-304 / 2019
  • This article deals with threshold-asymmetric volatility models for over-dispersed and zero-inflated time series of count data. We introduce various threshold integer-valued autoregressive conditional heteroscedasticity (ARCH) models that incorporate over-dispersion and zero-inflation via conditional Poisson and negative binomial distributions. The EM algorithm is used to estimate parameters. Cholera data from Kolkata, India, from 2006 to 2011 are analyzed as a real application. To construct the threshold variable, both a time-varying local constant mean and the grand mean are adopted. The data application shows that the threshold model, as an asymmetric version, is useful in modelling count time series volatility.
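To give a feel for what a threshold integer-valued ARCH model does, the sketch below simulates a first-order conditional Poisson series whose autoregressive coefficient switches depending on whether the previous count exceeds a threshold. The exact model forms in the article are not given in the abstract, so this specification, the fixed threshold, and all parameter names and values are assumptions for illustration. Zero-inflation, the negative binomial variant, and EM estimation are left out.

```python
import math
import random

def conditional_mean(prev, omega, alpha_low, alpha_high, threshold):
    """Conditional mean with a coefficient that depends on the regime."""
    alpha = alpha_high if prev > threshold else alpha_low
    return omega + alpha * prev

def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_threshold_inarch(n, omega=1.0, alpha_low=0.3, alpha_high=0.6,
                              threshold=3, seed=0):
    """Simulate n counts from the assumed threshold INARCH(1) specification."""
    rng = random.Random(seed)
    series, prev = [], 0
    for _ in range(n):
        lam = conditional_mean(prev, omega, alpha_low, alpha_high, threshold)
        prev = poisson_draw(rng, lam)
        series.append(prev)
    return series

series = simulate_threshold_inarch(200)
```

The asymmetry lies in the two coefficients: a large previous count raises the next conditional mean more steeply than a small one, so bursts persist longer than quiet periods.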