• Title/Summary/Keyword: binomial statistics

Randomizing Sequences of Finite Length (유한 순서열의 임의화)

  • Huh, Myung-Hoe;Lee, Yong-Goo
    • The Korean Journal of Applied Statistics / v.23 no.1 / pp.189-196 / 2010
  • Physically randomizing a sequence of cards is never an easy task. For instance, the US 1970 draft lottery caused social turmoil because the outcome sequence of 366 birthday numbers showed a significant relationship with the input order (Wikipedia, "Draft Lottery 1969", Retrieved 2009/05/01). We are motivated by Laplace's 1825 book Philosophical Essay on Probabilities, which says "Suppose that the numbers 1, 2, ..., 100 are placed, according to their natural ordering, in an urn, and suppose further that, after having shaken the urn, to shuffle the numbers, one draws one number. It is clear that if the shuffling has been properly done, each number will have the same chance of being drawn. But if we fear that there are small differences between them depending on the order in which the numbers were put into the urn, we can decrease these differences considerably by placing these numbers in a second urn in the order in which they are drawn from the first urn, and then shaking the second urn to shuffle the numbers. These differences, already imperceptible in the second urn, would be diminished more and more by using a third urn, a fourth urn, &c." (translated by Andrew I. Dale, 1995, Springer, pp. 35-36). Laplace foresaw what would happen some 150 years later and even suggested a possible tool for handling the problem, but he omitted the detailed arguments for the solution. In this research note we therefore write a supplement to Laplace in modern terms. We formulate the problem with a lottery box model, to which Markov chain theory can be applied. By applying the Markov chain repeatedly, one expects the uniform distribution on k states as the stationary distribution. Additionally, we show that the probability of an even number of successes in a binomial distribution with $n$ trials and success probability $\theta$ approaches 0.5 as $n$ increases to infinity. Our theory is illustrated with the cases of the truncated geometric distribution and the US 1970 draft lottery.
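
The claim about an even number of successes can be checked numerically. The sketch below is an illustration only, not the authors' code: it sums the binomial probabilities over even counts and compares the result with the standard identity P(even) = (1 + (1 - 2θ)^n) / 2, which tends to 0.5 for any 0 < θ < 1. The function names are hypothetical.

```python
from math import comb


def prob_even_successes(n: int, theta: float) -> float:
    """Sum the Binomial(n, theta) probabilities over even success counts."""
    return sum(
        comb(n, k) * theta**k * (1 - theta) ** (n - k)
        for k in range(0, n + 1, 2)
    )


def prob_even_closed_form(n: int, theta: float) -> float:
    """Standard identity: P(even) = (1 + (1 - 2*theta)**n) / 2."""
    return 0.5 * (1 + (1 - 2 * theta) ** n)


if __name__ == "__main__":
    # The gap to 0.5 shrinks geometrically in n when 0 < theta < 1.
    for n in (1, 5, 20, 100):
        print(n, prob_even_successes(n, 0.3), prob_even_closed_form(n, 0.3))
```

Because the gap to 0.5 is |1 - 2θ|^n / 2, convergence is fast unless θ is close to 0 or 1.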

Estimation methods and interpretation of competing risk regression models (경쟁 위험 회귀 모형의 이해와 추정 방법)

  • Kim, Mijeong
    • The Korean Journal of Applied Statistics / v.29 no.7 / pp.1231-1246 / 2016
  • The cause-specific hazard model (Prentice et al., 1978) and the subdistribution hazard model (Fine and Gray, 1999) are the models most often used for right-censored survival data with competing risks. Other models for survival data with competing risks have since been introduced; however, they have not been widely used because they either lack reliable statistical estimation methods or are overly difficult to compute. We introduce simple and reliable competing risk regression models that have recently been proposed and compare their methodologies. We show how to use SAS and R to analyze data with competing risks. In addition, we analyze survival data with two competing risks using five different models.

Bayesian estimation of the Korea professional baseball players' hitting ability based on the batting average (한국프로야구 선수들의 타율에 기반된 타격 능력의 베이지안 추정)

  • Cho, Yong Ju;Lee, Kwang Ho
    • Journal of the Korean Data and Information Science Society / v.26 no.1 / pp.197-207 / 2015
  • In baseball, the hitting ability of a batter is frequently assessed by batting average, runs batted in, home runs, runs scored, on-base percentage, etc. Recently, more comprehensive indicators such as OPS, ISO, SECA, TA, RC and XR have often been used. However, these measures generally show large deviations since they are calculated from data for a certain period of time, and they are not estimates of a population parameter, either. In this paper, we treat the pure hitting ability of Korean professional baseball players as a parameter that depends on at-bats, and we estimate this parameter using the Bayesian method.
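
As a rough illustration of Bayesian estimation of a hitting probability, the sketch below uses the textbook conjugate Beta prior with a binomial likelihood. This is an assumption made for illustration: the abstract does not state which prior or model the authors use, and the function names and example numbers are hypothetical.

```python
def beta_posterior(hits: int, at_bats: int,
                   prior_a: float = 1.0, prior_b: float = 1.0):
    """Posterior Beta(a, b) parameters after observing hits in at_bats.

    Assumes a Beta(prior_a, prior_b) prior on the hitting probability and a
    binomial likelihood (an illustrative choice, not the paper's model).
    """
    if not 0 <= hits <= at_bats:
        raise ValueError("hits must lie between 0 and at_bats")
    return prior_a + hits, prior_b + (at_bats - hits)


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    # Hypothetical player: 30 hits in 100 at-bats, with a Beta(26, 74) prior.
    a, b = beta_posterior(30, 100, prior_a=26, prior_b=74)
    print(a, b, beta_mean(a, b))
```

The posterior mean is pulled from the raw average toward the prior mean, and the pull weakens as the number of at-bats grows.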

Prediction of the Number of Food Poisoning Occurrences by Microbes (원인균별 식중독 발생 건수 예측)

  • Yeo, In-Kwon
    • The Korean Journal of Applied Statistics / v.26 no.6 / pp.923-932 / 2013
  • This paper proposes a method to predict the number of foodborne disease outbreaks by microbe. The weekly data on food poisoning occurrences by microbe in Korea contain many zero-valued observations and exhibit dependence between outbreaks. To model both phenomena, the total number of food poisonings is predicted by an autoregressive model, and the probabilities of food poisoning occurrences by microbe (given the total number of food poisonings) are estimated by the baseline category logit model. The predicted number of foodborne disease outbreaks for a microbe is obtained by multiplying the predicted total number of outbreaks by the estimated probability of food poisoning for the corresponding microbe. The mean squared error and the mean absolute error are evaluated to compare the performance of the proposed method with that of the zero-inflated model.
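
The final multiplication step can be sketched as follows. This is a minimal illustration under stated assumptions: the predicted total is taken as given (the autoregressive fit is not shown), the linear predictors are hypothetical inputs rather than fitted values, and the function names are not from the paper.

```python
from math import exp


def baseline_category_probs(etas: dict) -> dict:
    """Category probabilities under a baseline-category logit model.

    etas maps each non-baseline category to its linear predictor; the
    baseline category implicitly has linear predictor 0.
    """
    denom = 1.0 + sum(exp(v) for v in etas.values())
    probs = {name: exp(v) / denom for name, v in etas.items()}
    probs["baseline"] = 1.0 / denom
    return probs


def predicted_counts_by_microbe(predicted_total: float, etas: dict) -> dict:
    """Multiply the predicted total by each estimated category probability."""
    probs = baseline_category_probs(etas)
    return {name: predicted_total * p for name, p in probs.items()}


if __name__ == "__main__":
    # Hypothetical linear predictors for two non-baseline microbes.
    print(predicted_counts_by_microbe(6.0, {"microbe_a": 0.4, "microbe_b": -0.2}))
```

The per-microbe predictions sum to the predicted total by construction, because the category probabilities sum to one.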

Class homogeneous tests with correlation (상관관계가 존재하는 등급별 동질성 검정방법)

  • Hong, Chong Sun;Lee, Na Young
    • Journal of the Korean Data and Information Science Society / v.24 no.1 / pp.73-83 / 2013
  • Among class quantitative tests for credit rating systems, the calibration tests examine class homogeneity, that is, the differences between observed and predicted probabilities within each class. For one time period, the binomial test and the chi-square test are included, and for several time periods the normal test and the extended traffic lights test are also included. In this work, we consider real data in which correlation exists among variables, so that these test methods can be applied to credit rating systems as well as to various kinds of class data such as BWT data and FSI data.
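
For orientation, a one-period binomial calibration test for a single rating class can be sketched as below. This is the plain textbook version, which assumes independent outcomes and therefore ignores the correlation the abstract is concerned with; the function name and the example figures are hypothetical.

```python
from math import comb


def binomial_upper_p_value(observed: int, n: int, predicted_prob: float) -> float:
    """One-sided p-value P(X >= observed) for X ~ Binomial(n, predicted_prob).

    A small p-value suggests the predicted probability understates the
    observed rate in this class (assuming independent outcomes).
    """
    if not 0 <= observed <= n:
        raise ValueError("observed must lie between 0 and n")
    return sum(
        comb(n, k) * predicted_prob**k * (1 - predicted_prob) ** (n - k)
        for k in range(observed, n + 1)
    )


if __name__ == "__main__":
    # Hypothetical class: 200 cases, predicted probability 2%, 9 events observed.
    print(binomial_upper_p_value(9, 200, 0.02))
```

Under positive correlation the count has a larger variance than the binomial model implies, so this independent version tends to reject too often.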

Analysis of the Incidence of Macrosomia in Japan by Parental Nationalities at 5-year Intervals From 1995 to 2020

  • Tasuku Okui
    • Journal of Preventive Medicine and Public Health / v.56 no.4 / pp.348-356 / 2023
  • Objectives: We investigated trends in the incidence rate of macrosomia and its association with parental nationalities using Vital Statistics data in Japan. Methods: We used singleton birth data every 5 years from 1995 to 2020. The incidence rate of macrosomia was calculated according to specific attributes (maternal age, infant's sex, parental nationalities, parity, and household occupation) over time (years). In addition, a log-binomial model was used to investigate the relationship between the incidence of macrosomia and the attributes. This study compared Korea, China, the Philippines, Brazil, and other countries with Japan in terms of parental nationalities. "Other countries" indicates countries except for Japan, Korea, China, the Philippines, and Brazil. Results: The study included 6 180 787 births. The rate of macrosomia in Japan decreased from 1.43% in 1995 to 0.88% in 2020, and the decrease was observed across all parental nationalities. The rates for Japanese parents were the lowest values among parental nationalities during the timespan investigated. Multivariate regression analysis showed that mothers from Korea, China, the Philippines, Brazil, and other countries had a significantly higher risk of macrosomia than those from Japan (risk ratio, 1.91, 2.82, 1.59, 1.74, and 1.64, respectively). Furthermore, fathers from China, the Philippines, Brazil, and other countries had a significantly higher risk of macrosomia than those from Japan (risk ratio, 1.66, 1.38, 1.88, and 3.02, respectively). Conclusions: The rate of macrosomia decreased from 1995 to 2020 in Japan for parents of all nationalities, and the risk of macrosomia incidence was associated with parental nationality.

An Analysis on the Gender Differences in the Level of Accident Risk using Generalized Linear and Heckman Methods (일반화선형모형과 헤크먼모형을 활용한 성별 자동차사고 위험도 분석)

  • Kim, DaeHwan;Park, HwaGyu
    • The Korean Journal of Applied Statistics / v.27 no.1 / pp.147-157 / 2014
  • Women's roles have changed substantially in economically developed countries; subsequently, the ratio of female drivers has also increased. In such countries, there has been considerable interest in assessing gender differences in vehicle accident risks and the reasons that explain them. This study investigates the gender differences in vehicle accident risk based on 500,000 drivers randomly selected from a population sample. A Heckman model is used for accident damage and a negative binomial model is used for accident frequency. Empirical results show that male drivers are 8.3% riskier than female drivers in terms of accident damage; however, female drivers are 113% riskier than male drivers in terms of accident frequency. We can implement more practical policies to reduce vehicle accidents if we can understand the reasons for the gender differences.

A study on probability of the Korean board game Yut (윷놀이와 확률)

  • Oh, Chang-Hyuck
    • Journal of the Korean Data and Information Science Society / v.21 no.4 / pp.719-727 / 2010
  • In this study, we consider the traditional Korean board game Yut. We clarify the definition of a yut, the object being cast, by investigating the shapes of various types of yut. We survey previous research on probabilities for the board game Yut. To define the goodness of a yut, we define a probabilistic order for Sawi and determine the order of Sawi according to the probability of each yut. For the probability distribution of Sawi, the Yut binomial distribution is defined and its mean and variance are calculated. We calculate the expected advancing distance of a horse, or mal, for a player in each turn. Two indices are suggested for the goodness of a yut, and probabilities are found for a good yut according to these indices and the probabilistic order of Sawi. Here it is assumed that the four yuts need not all be the same. Suggestions are also given for the standardization of yut in terms of shape and probability.
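
To make the setting concrete, the sketch below computes the distribution of the five basic Sawi outcomes when the four sticks may have different probabilities of landing flat side up. It is an illustration, not the paper's "Yut binomial distribution", whose exact definition the abstract does not give. It assumes the sticks fall independently, uses the conventional mapping from the number of flat sides up to do, gae, geol, yut and mo, and ignores variants such as back-do. The names are hypothetical.

```python
# Conventional mapping: number of flat sides up -> Sawi outcome.
SAWI_BY_FLAT_COUNT = {0: "mo", 1: "do", 2: "gae", 3: "geol", 4: "yut"}


def flat_count_distribution(p_flat):
    """Distribution of the number of sticks landing flat side up.

    p_flat holds one probability per stick; sticks are assumed independent
    but not necessarily identical.
    """
    dist = [1.0]  # with zero sticks, the count is 0 with probability 1
    for p in p_flat:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)   # this stick lands round side up
            new[k + 1] += q * p     # this stick lands flat side up
        dist = new
    return dist


def sawi_probabilities(p_flat):
    """Probability of each Sawi outcome for four sticks."""
    if len(p_flat) != 4:
        raise ValueError("exactly four sticks are required")
    dist = flat_count_distribution(p_flat)
    return {SAWI_BY_FLAT_COUNT[k]: dist[k] for k in range(5)}


if __name__ == "__main__":
    # Hypothetical, non-identical sticks.
    print(sawi_probabilities([0.55, 0.6, 0.6, 0.65]))
```

With identical sticks this reduces to the ordinary binomial distribution with four trials.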

Analysis of counts in the one-way layout (일원배열 가산자료에서의 처리효과 비교)

  • 이선호
    • The Korean Journal of Applied Statistics / v.10 no.1 / pp.105-119 / 1997
  • Barnwal and Paul (1988) derived the likelihood ratio statistic and the $C(\alpha)$ statistic for testing the equality of the means of several groups of count data in the presence of a common dispersion parameter. These tests are generalized to be applicable without the restriction of a common dispersion parameter. The assumed model for the data is also extended from the negative binomial to the double exponential Poisson model. Monte Carlo simulations show the superiority of the $C(\alpha)$ statistic based on the double exponential Poisson family, which has a very simple form and requires estimates of the parameters only under the null hypothesis.


Likelihood Based Confidence Intervals for the Difference of Proportions in Two Doubly Sampled Data with a Common False-Positive Error Rate

  • Lee, Seung-Chun
    • Communications for Statistical Applications and Methods / v.17 no.5 / pp.679-688 / 2010
  • Lee (2010) developed a confidence interval for the difference of binomial proportions in two doubly sampled data sets subject to false-positive errors. The confidence interval seems to be adequate for a general double sampling model subject to false-positive misclassification. However, in many applications, the false-positive error rates could be the same. In this note, the construction of an asymptotic confidence interval is considered when the false-positive error rates are common. The coverage behaviors of nine likelihood-based confidence intervals are examined. It is shown that the confidence interval based on the Rao score with the expected information performs well in terms of coverage probability and expected width.