• Title/Summary/Keyword: categorical data analysis


Hypothesis Testing: Means and Proportions (평균과 비율 비교)

  • Pak, Son-Il;Lee, Young-Won
    • Journal of Veterinary Clinics
    • /
    • v.26 no.5
    • /
    • pp.401-407
    • /
    • 2009
  • In the previous article in this series we introduced the basic concepts of statistical analysis. The present review introduces hypothesis testing for continuous and categorical data for readers of the veterinary science literature. For continuous data, we explain the t-test for comparing a single mean with a hypothesized value, and for comparing two means from two independent samples or from paired samples. For categorical variables, the $\chi^2$ test for association and homogeneity, Fisher's exact test and Yates' continuity correction for small samples, and the test for trend (in which at least one of the variables is ordinal) are described, together with worked examples. The McNemar test for correlated proportions is also discussed. The topics covered may provide a basic understanding of different approaches to analyzing clinical data.
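
As a rough illustration of the categorical tests named in this abstract, the following sketch computes them for a 2x2 table using only the Python standard library. It is not taken from the article; the function names and the example counts are assumptions chosen for illustration, and the p-values rely on the fact that a chi-square variable with 1 degree of freedom has survival function erfc(sqrt(x/2)).

```python
import math


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction, floored at zero.
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, the chi-square survival function is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more probable than the observed table.
    """
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def mcnemar(b, c):
    """McNemar test on the discordant pair counts b and c (uncorrected)."""
    stat = (b - c) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))
```

For the hypothetical table [[10, 20], [20, 10]], `chi2_2x2` gives a statistic of about 6.67 without correction and 5.4 with Yates' correction; for [[3, 1], [1, 3]], `fisher_exact_2x2` returns 34/70. In practice a statistics package would normally be used instead of hand-written code.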

Categorical time series clustering: Case study of Korean pro-baseball data (범주형 시계열 자료의 군집화: 프로야구 자료의 사례 연구)

  • Pak, Ro Jin
    • Journal of the Korean Data and Information Science Society
    • /
    • v.27 no.3
    • /
    • pp.621-627
    • /
    • 2016
  • A given professional baseball team can be consistently weak against one particular opponent. For example, team S, the strongest team in Korea, is relatively weak against team H. In this paper, we cluster the Korean baseball teams based on their records against team S to investigate whether the pattern of team H's record differs from those of the other teams. The technique employed is 'time series clustering', or more specifically 'categorical time series clustering'. Three methods are considered: (i) a distance-based method, (ii) a genetic sequencing method and (iii) a periodogram method. Each method has its own advantages and disadvantages in handling categorical time series, so it is recommended to draw conclusions by considering the results of all three methods together.

Bayesian Analysis for Categorical Data with Missing Traits Under a Multivariate Threshold Animal Model (다형질 Threshold 개체모형에서 Missing 기록을 포함한 이산형 자료에 대한 Bayesian 분석)

  • Lee, Deuk-Hwan
    • Journal of Animal Science and Technology
    • /
    • v.44 no.2
    • /
    • pp.151-164
    • /
    • 2002
  • Genetic variance and covariance components of linear traits and ordered categorical traits, which are usually observed as dichotomous or polychotomous outcomes, were estimated simultaneously in a multivariate threshold animal model, using arbitrary underlying liability scales and Bayesian inference via Gibbs sampling. The multivariate threshold animal model in this study allows any combination of missing traits while assuming correlation among the traits considered. Gibbs sampling, as a hierarchical Bayesian inference method, was used to obtain reliable point estimates, which were taken to be the marginal posterior means of the parameters. The main point of this study is that the underlying values for the categorical traits sampled in the previous round of iteration, together with the observations on the continuous traits, can be used to sample the underlying values for missing categorical and continuous data in the current cycle (see appendix). This study also shows that the underlying variables for missing categorical data should be generated taking the correlated traits into account in order to satisfy the fully conditional posterior distributions of the parameters, although some papers (Wang et al., 1997; VanTassell et al., 1998) reported that only the residual effects of missing traits were generated in the same situation. In the present study, the Gibbs samplers used for fully Bayesian inference on the unknown parameters of interest serve as a methodology that accommodates any combination of linear and categorical traits with missing observations. Moreover, two kinds of constraints that guarantee identifiability of the arbitrary underlying variables are shown, while preserving the fully conditional posterior distributions of those parameters. A numerical example, based on a simulation study, is provided for a threshold animal model that includes maternal and permanent environmental effects on a multiple ordered categorical trait (calving ease), a binary trait (non-return rate), and a normally distributed trait (birth weight).
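
The threshold (liability) idea behind this abstract can be sketched in a few lines: an ordered category is observed according to which interval between thresholds the latent liability falls into, and a Gibbs step for an observed category draws the liability from a normal distribution truncated to that interval. The code below is a minimal single-trait sketch, not the paper's multivariate sampler; the function names, the threshold values in the usage note, and the use of a fixed `mean` and `sd` (where a full sampler would plug in the conditional mean and variance given the other traits and effects) are all assumptions.

```python
import bisect
import random
from statistics import NormalDist


def category_from_liability(liability, thresholds):
    """Return the 0-based ordered category whose interval contains the liability.

    `thresholds` must be sorted ascending; k thresholds define k + 1 categories.
    """
    return bisect.bisect_right(thresholds, liability)


def sample_liability(category, thresholds, mean=0.0, sd=1.0, rng=random):
    """Draw a liability from N(mean, sd^2) truncated to the interval of `category`."""
    dist = NormalDist(mean, sd)
    lower = dist.cdf(thresholds[category - 1]) if category > 0 else 0.0
    upper = dist.cdf(thresholds[category]) if category < len(thresholds) else 1.0
    # Inverse-CDF sampling; clamp away from 0 and 1 so inv_cdf stays defined.
    u = min(max(rng.uniform(lower, upper), 1e-12), 1 - 1e-12)
    return dist.inv_cdf(u)
```

With hypothetical thresholds `[0.0, 1.0]`, a liability of 0.5 maps to category 1, and `sample_liability(1, [0.0, 1.0])` returns a value between 0 and 1.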

Review and discussion of marginalized random effects models (주변화 변량효과모형의 조사 및 고찰)

  • Jeon, Joo Yeong;Lee, Keunbaik
    • Journal of the Korean Data and Information Science Society
    • /
    • v.25 no.6
    • /
    • pp.1263-1272
    • /
    • 2014
  • Longitudinal categorical data commonly arise in the medical, health, and social sciences. In such data, the correlation among repeated outcomes must be taken into account to estimate the effects of covariates accurately. In this paper, we introduce marginalized random effects models, which are used to estimate the population-averaged effects of covariates. We also review how these models have been developed. A real data analysis using marginalized random effects models is presented.

Multidimensional scaling of categorical data using the partition method (분할법을 활용한 범주형자료의 다차원척도법)

  • Shin, Sang Min;Chun, Sun-Kyung;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics
    • /
    • v.31 no.1
    • /
    • pp.67-75
    • /
    • 2018
  • Multidimensional scaling (MDS) is an exploratory analysis of multivariate data that represents the dissimilarity among objects in a low-dimensional geometric space. However, a general MDS map shows only information about objects, without any information about variables. In this study, we used MDS based on the algorithm of Torgerson (Theory and Methods of Scaling, Wiley, 1958) to visualize clusters of objects in categorical data. For this, we convert the given data into a multiple indicator matrix. Additionally, we add information on the levels of each categorical variable to the MDS map by applying the partition method of Shin et al. (Korean Journal of Applied Statistics, 28, 1171-1180, 2015). The proposed MDS map therefore shows the similarity among objects as well as associations among categorical variables.

Optimal Process Condition for Products with Multi-Categorical Ordinal Quality Characteristic (다범주 순서형 품질특성을 갖는 제품의 최적 공정조건 결정에 관한 연구)

  • Kim Sang-Cheol;Yun Won-Young;Chun Young-Rok
    • Journal of Korean Society for Quality Management
    • /
    • v.32 no.3
    • /
    • pp.109-125
    • /
    • 2004
  • This paper deals with an optimal process control problem in the production of hull structural steel plate with a high defective rate. The main quality characteristic (dependent variable) is the internal quality (defect) of the plates, which depends on process parameters (independent variables). The dependent variable takes three ordered categories, and there are 35 independent variables (29 continuous and 6 categorical). First, we determine the main factors and develop a mathematical model relating the predicted probabilities of internal quality to the main factors. Second, we find the optimal process condition for the main factors through analysis of variance (ANOVA) using simulation. We consider three models to obtain the main factors and the optimal process condition: linear, quadratic, and error models.

An Analysis of Categorical Time Series Driven by Clipping GARCH Processes (연속형-GARCH 시계열의 범주형화(Clipping)를 통한 분석)

  • Choi, M.S.;Baek, J.S.;Hwan, S.Y.
    • The Korean Journal of Applied Statistics
    • /
    • v.23 no.4
    • /
    • pp.683-692
    • /
    • 2010
  • This short article is concerned with a categorical time series obtained by clipping a heteroscedastic GARCH process. Estimation methods are discussed for the model parameters appearing both in the original process and in the binary time series resulting from clipping (cf. Zhen and Basawa, 2009). Assuming an AR-GARCH model for the heteroscedastic time series, three data sets from the Korean stock market are analyzed, with applications to calculating certain probabilities associated with the AR-GARCH process.
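
To make the notion of clipping concrete, the sketch below simulates an AR(1) series with GARCH(1,1) errors and converts it to a binary series by thresholding at zero. It is only an illustration under assumed settings: the parameter values, the choice of AR(1) and GARCH(1,1) orders, the zero threshold, and the function names are not taken from the article.

```python
import math
import random


def simulate_ar_garch(n, phi=0.3, omega=0.1, alpha=0.1, beta=0.8, seed=0):
    """Simulate X_t = phi * X_{t-1} + e_t with GARCH(1,1) errors.

    e_t = sqrt(h_t) * z_t, h_t = omega + alpha * e_{t-1}^2 + beta * h_{t-1},
    where z_t is standard normal. Requires alpha + beta < 1.
    """
    rng = random.Random(seed)
    h = omega / (1 - alpha - beta)  # start at the unconditional variance
    x_prev, e_prev = 0.0, 0.0
    series = []
    for _ in range(n):
        h = omega + alpha * e_prev ** 2 + beta * h
        e_prev = math.sqrt(h) * rng.gauss(0.0, 1.0)
        x_prev = phi * x_prev + e_prev
        series.append(x_prev)
    return series


def clip(series, threshold=0.0):
    """Clip a continuous series into a binary one: 1 if above the threshold, else 0."""
    return [1 if x > threshold else 0 for x in series]


binary = clip(simulate_ar_garch(200))
```

The resulting `binary` list is the kind of categorical (here binary) time series the abstract refers to; estimation of the original model parameters from it is the subject of the article and is not attempted here.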

Analysis of categorical data with nonresponses (무응답을 포함하는 범주형 자료의 분석)

  • 박태성;이승연
    • The Korean Journal of Applied Statistics
    • /
    • v.11 no.1
    • /
    • pp.83-95
    • /
    • 1998
  • Statistical models are proposed for analyzing categorical data in the presence of missing observations or nonresponses, which may occur in sample surveys and polls. As an illustration, we analyze real pre-election polling data from the 1948 U.S. presidential election. Dewey had been predicted to win; however, Truman won the actual election.


Computing Algorithm for Genetic Evaluations on Several Linear and Categorical Traits in A Multivariate Threshold Animal Model (범주형 자료를 포함한 다형질 임계개체모형에서 유전능력 추정 알고리즘)

  • Lee, D.H.
    • Journal of Animal Science and Technology
    • /
    • v.46 no.2
    • /
    • pp.137-144
    • /
    • 2004
  • Algorithms for estimating breeding values for several categorical traits using latent variables under the threshold concept were developed and presented. Thresholds for each categorical trait were estimated by Newton's method using the gradient and Hessian matrix. The algorithm was developed as an extension of the bivariate analysis of Quaas (2001). Breeding values for the latent variables of categorical traits and for observations on linear traits were estimated by the preconditioned conjugate gradient (PCG) method, which is known to converge quickly. An example is given using simulated data with two linear traits, a categorical trait with four categories (CE = calving ease), and a dichotomous trait (SB = stillbirth) in a threshold animal mixed model (TAMM). Breeding value estimates from the TAMM were compared with those from a linear animal mixed model (LAMM). Correlations of the breeding value estimates with the parameters were 0.91 to 0.92 for CE and 0.87 to 0.89 for SB in the TAMM, and 0.72 to 0.84 for CE and 0.59 to 0.70 for SB in the LAMM. In conclusion, the PCG method was feasible for estimating breeding values for several categorical traits together with linear traits in the TAMM.
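
For readers unfamiliar with the PCG method mentioned in this abstract, the following is a minimal generic sketch for a symmetric positive definite system A x = b with a Jacobi (diagonal) preconditioner. It is not the authors' implementation: the choice of preconditioner, the dense list-of-lists matrix, the tolerance, and the function name `pcg` are assumptions made for illustration. In a mixed model setting, A and b would stand for the coefficient matrix and right-hand side of the mixed model equations.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Uses conjugate gradient with a Jacobi (diagonal) preconditioner.
    A is a dense list of lists; b is a list.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]

    x = [0.0] * n
    r = list(b)                                   # residual b - A x at x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(p)
        step = rz / dot(p, Ap)
        x = [x[i] + step * p[i] for i in range(n)]
        r = [r[i] - step * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        # New direction is conjugate to the previous ones.
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


solution = pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

For this small hypothetical system the exact solution is x = (1/11, 7/11). Large genetic evaluation systems would typically use sparse storage and avoid forming A explicitly.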