• Title/Summary/Keyword: multivariate categorical data

Integration of Categorical Data using Multivariate Kriging for Spatial Interpolation of Ground Survey Data (현장 조사 자료의 공간 보간을 위한 다변량 크리깅을 이용한 범주형 자료의 통합)

  • Park, No-Wook
    • Spatial Information Research / v.19 no.4 / pp.81-89 / 2011
  • This paper presents a multivariate kriging algorithm that integrates categorical data as secondary information for the spatial interpolation of sparsely sampled ground survey data. Instead of assigning a constant mean value to each category, disaggregated local mean values at the target grid points are first estimated by area-to-point kriging and are then used as the local means in simple kriging with local means. The algorithm is illustrated through a case study on the spatial interpolation of geochemical copper data using geological map data. Cross-validation results indicate that the presented algorithm yields significant improvements in prediction capability of 15% and 25% compared with univariate ordinary kriging and conventional simple kriging with constant mean values, respectively. The multivariate kriging algorithm is therefore expected to be effective for spatial interpolation with categorical data.
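
The second stage described in this abstract, simple kriging with local means, can be sketched as follows. This is only an illustration under stated assumptions: the local means are passed in as plain numbers (in the paper they come from area-to-point kriging, which is not shown), and the exponential covariance model, its parameters, and all function names are hypothetical choices made for the example.

```python
import math

def exp_cov(h, sill=1.0, corr_range=10.0):
    """Exponential covariance of the residuals (assumed model, not from the paper)."""
    return sill * math.exp(-3.0 * h / corr_range)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def sklm_predict(coords, values, local_means, target, target_mean):
    """Krige the residuals (value minus local mean), then add back the
    local mean at the target point."""
    residuals = [z - m for z, m in zip(values, local_means)]
    n = len(coords)
    cov = [[exp_cov(math.dist(coords[i], coords[j])) for j in range(n)]
           for i in range(n)]
    rhs = [exp_cov(math.dist(c, target)) for c in coords]
    weights = solve(cov, rhs)  # simple kriging weights
    return target_mean + sum(w * r for w, r in zip(weights, residuals))

# Hypothetical samples: copper values and the local mean assigned to each sample
coords = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
values = [12.0, 20.0, 15.0]
local_means = [10.0, 18.0, 18.0]
estimate = sklm_predict(coords, values, local_means, (2.0, 2.0), 14.0)
```

Far from every sample the kriging weights shrink toward zero, so the estimate falls back to the local mean at the target point, which is why the quality of the local means matters for sparsely sampled data.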

Bayesian Analysis for Categorical Data with Missing Traits Under a Multivariate Threshold Animal Model (다형질 Threshold 개체모형에서 Missing 기록을 포함한 이산형 자료에 대한 Bayesian 분석)

  • Lee, Deuk-Hwan
    • Journal of Animal Science and Technology / v.44 no.2 / pp.151-164 / 2002
  • Genetic variance and covariance components of linear traits and ordered categorical traits, which are usually observed as dichotomous or polychotomous outcomes, were estimated simultaneously in a multivariate threshold animal model, using arbitrary underlying liability scales and Bayesian inference via Gibbs sampling. The multivariate threshold animal model in this study allows any combination of missing traits while assuming correlation among the traits considered. Gibbs sampling, as a form of hierarchical Bayesian inference, was used to obtain reliable point estimates, taken to be the marginal posterior means of the parameters. The main point of this study is that the underlying values sampled for the categorical traits in the previous round of iteration, together with the observations on the continuous traits, can be used to sample the underlying values for categorical and continuous data with missing records in the current cycle (see appendix). This study also shows that the underlying variables for missing categorical data should be generated by taking the correlated traits into account in order to satisfy the fully conditional posterior distributions of the parameters, although some papers (Wang et al., 1997; VanTassell et al., 1998) reported that only the residual effects of missing traits were generated in the same situation. In the present study, the Gibbs samplers used for fully Bayesian inference on the unknown parameters of interest make it possible to handle any combination of linear and categorical traits with missing observations. Moreover, two kinds of constraints that guarantee identifiability of the arbitrary underlying variables are shown, while preserving the fully conditional posterior distributions of those parameters. A numerical example is provided through a simulation study of a threshold animal model that includes maternal and permanent environmental effects for a multiple ordered categorical trait (calving ease), a binary trait (non-return rate), and a normally distributed trait (birth weight).
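
One step of such a Gibbs sampler, drawing an underlying liability for a single record, might look like the sketch below. It is a simplified univariate illustration, not the paper's sampler: `cond_mean` and `cond_sd` stand for the conditional mean and standard deviation given the correlated traits and current parameter values, which are assumed to have been computed elsewhere, and the function name and thresholds are hypothetical.

```python
import random
from statistics import NormalDist

INF = float("inf")

def sample_liability(category, thresholds, cond_mean, cond_sd=1.0, rng=random):
    """Draw a latent liability from N(cond_mean, cond_sd^2).

    An observed ordered category (0-based) truncates the draw to the interval
    between its two neighbouring thresholds. A missing record (category=None)
    is drawn without truncation, so only the conditional mean, which carries
    the information from the correlated traits, shapes the draw.
    """
    dist = NormalDist(cond_mean, cond_sd)
    if category is None:
        lower, upper = -INF, INF
    else:
        bounds = [-INF] + list(thresholds) + [INF]
        lower, upper = bounds[category], bounds[category + 1]
    p_low = 0.0 if lower == -INF else dist.cdf(lower)
    p_high = 1.0 if upper == INF else dist.cdf(upper)
    # Inverse-CDF sampling inside the allowed probability band
    u = rng.uniform(p_low, p_high)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf needs 0 < u < 1
    return dist.inv_cdf(u)

# Hypothetical three-category trait with thresholds at 0.0 and 1.0
rng = random.Random(0)
observed = sample_liability(1, [0.0, 1.0], cond_mean=0.3, rng=rng)
missing = sample_liability(None, [0.0, 1.0], cond_mean=0.3, rng=rng)
```

The clamp on `u` is a numerical safeguard for this sketch only; for intervals far in the tail of the distribution a dedicated truncated-normal sampler would be more appropriate.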

A multivariate latent class profile analysis for longitudinal data with a latent group variable

  • Lee, Jung Wun;Chung, Hwan
    • Communications for Statistical Applications and Methods / v.27 no.1 / pp.15-35 / 2020
  • In behavioral research, significant attention has been paid to the stage-sequential process for multiple latent class variables. We explore the stage-sequential process of multiple latent class variables using multivariate latent class profile analysis (MLCPA). A latent profile variable, representing the stage-sequential process in MLCPA, is formed by a set of repeatedly measured categorical response variables. This paper proposes an extended MLCPA that explains the association between the latent profile variable and a latent group variable in the form of a two-dimensional contingency table. We applied the extended MLCPA to the National Longitudinal Survey on Youth 1997 (NLSY97) data to investigate the association between the developmental progression of depression and substance use behaviors among adolescents who experienced authoritarian parental styles in their youth.

Comparison of Parameter Estimation Methods in the Analysis of Multivariate Categorical Data with Logit Models

  • Song, Hae-Hiang
    • Journal of the Korean Statistical Society / v.12 no.1 / pp.24-35 / 1983
  • In fitting models to data, selecting the most desirable estimation method and determining the adequacy of the fitted model are the central issues. This paper compares maximum likelihood estimators and minimum logit chi-square estimators, both of which are best asymptotically normal, when logit models are fitted to infant mortality data. The chi-square goodness-of-fit test and the likelihood ratio test are also compared. The analysis of the infant mortality data shows that outlying observations do not necessarily have the same impact on the different goodness-of-fit measures.
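
The two estimators compared in this abstract can be sketched for a logit model with a single predictor and grouped binomial data. The sketch assumes the standard textbook forms: the minimum logit chi-square estimator as weighted least squares on the empirical logits with weights n p (1 - p), and maximum likelihood by Newton-Raphson. The data and function names are hypothetical and unrelated to the paper's infant mortality data.

```python
import math

def min_logit_chisq(x, n, y):
    """Minimum logit chi-square estimate: weighted least squares of the
    empirical logits on x, with weights n * p * (1 - p)."""
    logits, weights = [], []
    for ni, yi in zip(n, y):
        p = yi / ni  # observed proportion, must lie strictly between 0 and 1
        logits.append(math.log(p / (1 - p)))
        weights.append(ni * p * (1 - p))
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    lbar = sum(w * li for w, li in zip(weights, logits)) / sw
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    sxl = sum(w * (xi - xbar) * (li - lbar)
              for w, xi, li in zip(weights, x, logits))
    slope = sxl / sxx
    return lbar - slope * xbar, slope

def ml_logit(x, n, y, iters=25):
    """Maximum likelihood estimate by Newton-Raphson on (intercept, slope)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ni, yi in zip(x, n, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            r = yi - ni * p          # score contribution
            w = ni * p * (1 - p)     # information contribution
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical grouped data: successes y out of n trials at each x
x = [-1.0, 0.0, 1.0]
n = [100, 100, 100]
y = [20, 50, 80]
berkson = min_logit_chisq(x, n, y)
mle = ml_logit(x, n, y)
```

In this constructed example the observed logits lie exactly on a line, so both methods return the same estimates; with real data they generally differ in finite samples, which is the kind of difference the paper examines.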

Multidimensional scaling of categorical data using the partition method (분할법을 활용한 범주형자료의 다차원척도법)

  • Shin, Sang Min;Chun, Sun-Kyung;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.31 no.1 / pp.67-75 / 2018
  • Multidimensional scaling (MDS) is an exploratory analysis of multivariate data that represents the dissimilarity among objects in a low-dimensional geometric space. However, a general MDS map shows only information about the objects, without any information about the variables. In this study, we used MDS based on the algorithm of Torgerson (Theory and Methods of Scaling, Wiley, 1958) to visualize clusters of objects in categorical data. For this, we convert the given data into a multiple indicator matrix. Additionally, we add the levels of each categorical variable to the MDS map by applying the partition method of Shin et al. (Korean Journal of Applied Statistics, 28, 1171-1180, 2015). The proposed MDS map therefore shows the similarity among objects as well as the associations among categorical variables.
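
The data preparation described here can be illustrated with a short sketch that builds a multiple indicator matrix and applies the double-centering step of Torgerson's classical MDS. The sketch assumes squared Euclidean distances between indicator rows as the dissimilarity, which the abstract does not specify, and it stops before the eigendecomposition that yields the map coordinates; the partition method is not shown. All names and data are hypothetical.

```python
def indicator_matrix(rows):
    """Convert categorical records into a 0/1 matrix with one column
    per (variable index, level) pair."""
    n_vars = len(rows[0])
    levels = [sorted({row[j] for row in rows}) for j in range(n_vars)]
    columns = [(j, lev) for j in range(n_vars) for lev in levels[j]]
    matrix = [[1 if row[j] == lev else 0 for j, lev in columns] for row in rows]
    return matrix, columns

def squared_distances(matrix):
    """Squared Euclidean distance between every pair of rows."""
    return [[sum((a - b) ** 2 for a, b in zip(r1, r2)) for r2 in matrix]
            for r1 in matrix]

def double_center(sq_dist):
    """Torgerson's step: B = -1/2 * J D J, computed from row means and the
    grand mean of the symmetric squared-distance matrix D."""
    n = len(sq_dist)
    row_mean = [sum(r) / n for r in sq_dist]
    grand = sum(row_mean) / n
    return [[-0.5 * (sq_dist[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]

# Hypothetical data: three objects measured on two categorical variables
rows = [("a", "x"), ("a", "y"), ("b", "y")]
z, cols = indicator_matrix(rows)
b = double_center(squared_distances(z))
```

The leading eigenvectors of `b`, scaled by the square roots of their eigenvalues, would give the object coordinates on the MDS map.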

Small Sample Characteristics of Generalized Estimating Equations for Categorical Repeated Measurements (범주형 반복측정자료를 위한 일반화 추정방정식의 소표본 특성)

  • 김동욱;김재직
    • The Korean Journal of Applied Statistics / v.15 no.2 / pp.297-310 / 2002
  • Liang and Zeger proposed generalized estimating equations (GEE) for analyzing repeated measurements that are discrete or continuous. The GEE model can be extended to repeated categorical data, and its estimator has an asymptotic multivariate normal distribution in large samples. However, GEE relies on large-sample asymptotic theory. In this paper, we study the properties of GEE estimators for repeated ordinal data in small samples. We generate ordinal repeated measurements for two groups using two methods. Through Monte Carlo simulation studies, we investigate the empirical type I error rates, powers, and relative efficiencies of the GEE estimators, the effect of unequal sample sizes in the two groups, and the performance of variance estimators for polytomous ordinal response variables, especially in small samples.

Computing Algorithm for Genetic Evaluations on Several Linear and Categorical Traits in A Multivariate Threshold Animal Model (범주형 자료를 포함한 다형질 임계개체모형에서 유전능력 추정 알고리즘)

  • Lee, D.H.
    • Journal of Animal Science and Technology / v.46 no.2 / pp.137-144 / 2004
  • Algorithms for estimating breeding values for several categorical traits, using latent variables under the threshold concept, were developed and presented. The thresholds for each categorical trait were estimated by Newton's method using the gradients and the Hessian matrix. The algorithm was developed as an extension of the bivariate analysis provided by Quaas (2001). Breeding values for the latent variables of the categorical traits and for the observations on the linear traits were estimated by the preconditioned conjugate gradient (PCG) method, which is known to converge quickly. An example is given using simulated data with two linear traits, a categorical trait with four categories (CE = calving ease), and a dichotomous trait (SB = stillbirth) in a threshold animal mixed model (TAMM). Breeding value estimates from the TAMM were compared with those from a linear animal mixed model (LAMM). The correlations between the breeding value estimates and the true parameters were 0.91 to 0.92 for CE and 0.87 to 0.89 for SB in the TAMM, and 0.72 to 0.84 for CE and 0.59 to 0.70 for SB in the LAMM. In conclusion, the PCG method is feasible for estimating breeding values for several categorical traits together with linear traits in the TAMM.
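
The PCG method mentioned in this abstract solves a symmetric positive definite system such as the mixed model equations without factorizing the coefficient matrix. The sketch below is a generic version with a diagonal (Jacobi) preconditioner, which is an assumption made for the example since the abstract does not name the preconditioner; the small dense system is purely illustrative.

```python
def pcg(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for a symmetric positive definite matrix a by the
    preconditioned conjugate gradient method with a Jacobi preconditioner."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                   # residual b - a x for x = 0
    m_inv = [1.0 / a[i][i] for i in range(n)]  # inverse of the diagonal
    z = [m_inv[i] * r[i] for i in range(n)]    # preconditioned residual
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        ap = [sum(a[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [m_inv[i] * r[i] for i in range(n)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Hypothetical 2 x 2 system; the exact solution is (1/11, 7/11)
solution = pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

Each iteration needs only a matrix-vector product, so in practice the coefficient matrix is kept in sparse form or never assembled at all, which is what makes the approach attractive for large genetic evaluations.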

Variable selection for latent class analysis using clustering efficiency (잠재변수 모형에서의 군집효율을 이용한 변수선택)

  • Kim, Seongkyung;Seo, Byungtae
    • The Korean Journal of Applied Statistics / v.31 no.6 / pp.721-732 / 2018
  • Latent class analysis (LCA) is an important tool for exploring unseen latent groups in multivariate categorical data. In practice, it is important to select a suitable set of variables, because including too many variables complicates the model and reduces the accuracy of the parameter estimates. Dean and Raftery (Annals of the Institute of Statistical Mathematics, 62, 11-35, 2010) proposed a headlong search algorithm based on Bayesian information criterion (BIC) values to choose meaningful variables for LCA. In this paper, we propose a new variable selection procedure for LCA that uses the posterior probabilities obtained from each fitted model. We propose a new statistic to measure the adequacy of LCA and develop a variable selection procedure based on it. The effectiveness of the proposed method is demonstrated through numerical studies.

Multiple Testing in Genomic Sequences Using Hamming Distance

  • Kang, Moonsu
    • Communications for Statistical Applications and Methods / v.19 no.6 / pp.899-904 / 2012
  • High-dimensional categorical data models with small sample sizes have not been used extensively for genomic sequences that involve count (or discrete) or purely qualitative responses. A basic task is to identify differentially expressed genes (or positions) among a large number of genes. This requires an appropriate test statistic and a corresponding multiple testing procedure, because a multivariate analysis of variance is not feasible in this setting. The family-wise error rate (FWER) is not appropriate for testing thousands of genes simultaneously in a multiple testing procedure; the false discovery rate (FDR) is better suited than the FWER to such problems. Data from the 2002-2003 SARS epidemic show that a conventional FDR procedure combined with the proposed test statistic, which is based on a pseudo-marginal approach with Hamming distance, performs better.
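
Two building blocks named in this abstract can be sketched briefly: the Hamming distance between two aligned sequences, and an FDR-controlling step-up procedure. The sketch assumes that the "conventional FDR procedure" is the Benjamini-Hochberg procedure, which the abstract does not state, and it does not reproduce the paper's pseudo-marginal test statistic. The sequences and p-values are hypothetical.

```python
def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s, t))

def benjamini_hochberg(pvalues, q=0.05):
    """Return the indices of the hypotheses rejected by the Benjamini-Hochberg
    step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is below its threshold
        if pvalues[idx] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])

# Hypothetical per-position p-values
rejected = benjamini_hochberg([0.02, 0.02, 0.03, 0.5], q=0.05)
distance = hamming("ACGT", "AGGA")
```

The step-up rule rejects every hypothesis ranked at or below the largest passing rank, even when some smaller p-values miss their own thresholds, which is why it is less conservative than an FWER correction such as Bonferroni.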