• Title/Summary/Keyword: Categorical data

A Bayesian uncertainty analysis for nonignorable nonresponse in two-way contingency table

  • Woo, Namkyo;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society, v.26 no.6, pp.1547-1555, 2015
  • We study the problem of nonignorable nonresponse in a two-way contingency table in which one or two categories may be missing. We describe a nonignorable nonresponse model for the analysis of two-way categorical tables. One approach to analyzing these data is to construct several tables (one complete and the others incomplete). The incomplete tables contain nonidentifiable parameters. We describe a hierarchical Bayesian model for analyzing two-way categorical data. We use a nonignorable nonresponse model with Bayesian uncertainty analysis, placing priors on the nonidentifiable parameters instead of carrying out a sensitivity analysis for them. To reduce the effects of the nonidentifiable parameters, we project the parameters onto a lower-dimensional space and allow the reduced set of parameters to share a common distribution. We use the griddy Gibbs sampler to fit our models and compute DIC and BPP for model diagnostics. We illustrate our method using NHANES III data to obtain the finite population proportions.
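
The griddy Gibbs sampler named in the abstract above draws from a univariate full conditional by evaluating its unnormalized density on a grid. The sketch below shows only the simplest discrete variant of one such draw, not the authors' implementation. The function name `griddy_draw`, the grid, and the Beta-shaped example density are all assumptions made for illustration.

```python
import math
import random


def griddy_draw(log_density, grid, rng):
    """Draw one value from a grid approximation of an unnormalized density.

    log_density: function returning the log of the unnormalized conditional.
    grid: candidate values at which the density is evaluated.
    rng: a random.Random instance, passed in for reproducibility.
    """
    logs = [log_density(g) for g in grid]
    peak = max(logs)
    # Subtract the maximum before exponentiating to avoid overflow.
    weights = [math.exp(value - peak) for value in logs]
    return rng.choices(grid, weights=weights, k=1)[0]


# Hypothetical conditional for a proportion p: the kernel of a Beta(4, 8) density.
def example_log_density(p):
    return 3.0 * math.log(p) + 7.0 * math.log(1.0 - p)


grid = [i / 100 for i in range(1, 100)]  # open interval (0, 1)
draw = griddy_draw(example_log_density, grid, random.Random(1))
```

In a full sampler this draw would be repeated for each parameter in turn, conditioning on the current values of the others.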

Categorical time series clustering: Case study of Korean pro-baseball data (범주형 시계열 자료의 군집화: 프로야구 자료의 사례 연구)

  • Pak, Ro Jin
    • Journal of the Korean Data and Information Science Society, v.27 no.3, pp.621-627, 2016
  • A professional baseball team can be consistently weak against one particular opponent. For example, team S, the strongest team in Korea, is relatively weak against team H. In this paper, we cluster the Korean baseball teams based on their records against team S to investigate whether the pattern of team H's record differs from those of the other teams. The technique we employ is 'time series clustering', or more specifically 'categorical time series clustering'. Three methods are considered: (i) a distance-based method, (ii) a genetic sequencing method, and (iii) a periodogram method. Each method has its own advantages and disadvantages in handling categorical time series, so we recommend drawing conclusions by considering the results of all three methods together.
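
The abstract above does not say which distance its distance-based method uses. As a minimal sketch, assuming a normalized Hamming distance between win/loss sequences, pairwise dissimilarities could be computed as follows. The team labels and records are invented for illustration and are not data from the paper.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Share of positions at which two equal-length categorical series differ."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical game-by-game results (W = win, L = loss) against one opponent.
records = {
    "A": "WWLWWLWW",
    "B": "WWLWWWWW",
    "H": "LLWLLWLL",
}

# Pairwise distance table; a clustering routine would take this as input.
distances = {
    (s, t): hamming_distance(records[s], records[t])
    for s, t in combinations(sorted(records), 2)
}
```

In this toy example teams A and B are close to each other while H is far from both, which is the kind of separation a clustering step would pick up.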

Comparing Accuracy of Imputation Methods for Categorical Incomplete Data (범주형 자료의 결측치 추정방법 성능 비교)

  • 신형원;손소영
    • The Korean Journal of Applied Statistics, v.15 no.1, pp.33-43, 2002
  • Various estimation methods have been developed for imputing missing categorical data. They include the modal category method, logistic regression, and association rules. In this study, we propose two fusion algorithms, based on a neural network and on a voting scheme, that combine the results of individual imputation methods. A Monte Carlo simulation is used to compare the performance of these methods. Five factors are used to simulate the missing data pattern: (1) input-output function, (2) data size, (3) noise of the input-output function, (4) proportion of missing data, and (5) pattern of missing data. The experimental results indicate the following: when the data size is small and the proportion of missing data is large, the modal category method, association rules, and neural network based fusion perform better than the other methods. However, when the data size is small and the correlation between input and missing output is strong, logistic regression and the neural network based fusion algorithm appear better than the others. When the data size is large with a low proportion of missing data, large noise, and strong correlation between input and missing output, the neural network based fusion algorithm turns out to be the best choice.
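
Two of the ideas named in the abstract above can be sketched briefly: modal category imputation, and fusing several methods' proposals by voting. The abstract does not describe the voting scheme in detail, so a plain majority vote is assumed here. The function names and the toy column are illustrative only.

```python
from collections import Counter


def modal_category(values):
    """Most frequent observed category; missing entries are marked as None."""
    observed = [v for v in values if v is not None]
    return Counter(observed).most_common(1)[0][0]


def impute_modal(values):
    """Replace every missing entry with the modal category of the column."""
    mode = modal_category(values)
    return [mode if v is None else v for v in values]


def vote(predictions):
    """Majority vote over the categories proposed by several imputation methods."""
    return Counter(predictions).most_common(1)[0][0]


column = ["a", "b", None, "a", None, "c"]
filled = impute_modal(column)

# Hypothetical proposals for one missing cell from three different methods.
fused = vote(["a", "b", "a"])
```

Ties in either function are resolved in favor of the category seen first, which is a simplification rather than a documented rule from the paper.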

Investigation of Biases for Variance Components on Multiple Traits with Varying Number of Categories in Threshold Models Using Bayesian Inferences

  • Lee, D.H.
    • Asian-Australasian Journal of Animal Sciences, v.15 no.7, pp.925-931, 2002
  • Gibbs sampling algorithms were implemented for multi-trait threshold animal models with any combination of multiple binary, ordered categorical, and linear traits, and the amount of bias in these models was investigated under two kinds of parameterization and algorithms for generating underlying liabilities. Statistical models that included additive genetic and residual effects as random and contemporary group effects as fixed were fitted to simulated data. The fully conditional posterior means of heritabilities and genetic (residual) correlations were calculated from 1,000 samples, retained at every 10th sample after the first 15,000 samples were discarded as a "burn-in" period. Under the models considered, several combinations of three traits (binary, multiple ordered categories, and continuous) were analyzed. Five replicates were carried out. Posterior mean estimates of heritabilities and genetic (residual) correlations were unbiased when the underlying liabilities for a categorical trait were generated conditionally on the underlying liabilities of the other traits and the threshold estimates were rescaled. Otherwise, when binary traits were parameterized with a threshold of zero and a residual variance of one, heritability estimates were inflated by 7-10%. Genetic correlation estimates were biased upward if positively correlated and downward if negatively correlated when underlying liabilities were generated without accounting for correlated traits in the prior information. In that case, residual correlation estimates were consequently biased strongly downward if positively correlated and upward if negatively correlated. The more categories a categorical trait had, the better the mixing rate was.
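
The sampling schedule in the abstract above (15,000 burn-in samples, then 1,000 samples kept at every 10th draw) can be checked with a short calculation. The sketch assumes thinning starts immediately after burn-in, which the abstract does not state explicitly. Under that reading the chain runs for 25,000 iterations in total.

```python
def retained_iterations(burn_in, thin, n_keep):
    """1-based indices of the iterations kept after burn-in and thinning."""
    return [burn_in + thin * k for k in range(1, n_keep + 1)]


kept = retained_iterations(burn_in=15_000, thin=10, n_keep=1_000)
total_iterations = kept[-1]  # chain length needed to collect all retained samples
```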

Imputation for Binary or Ordered Categorical Traits Based on the Bayesian Threshold Model (베이지안 분계점 모형에 의한 순서 범주형 변수의 대체)

  • Lee Seung-Chun
    • The Korean Journal of Applied Statistics, v.18 no.3, pp.597-606, 2005
  • Nonresponse in sample surveys causes a problem when analyzing datasets in public-use files, where the user has only complete-data methods available and limited information about the reasons for nonresponse. Imputation has recently become a standard approach for handling nonresponse, and various imputation methods have been devised. However, most imputation methods are concerned with continuous traits, while many interesting features in sample surveys are measured on binary or ordered categorical scales. In this note, an imputation method for ignorable nonresponse in binary or ordered categorical traits is considered.

A polychotomous regression model with tensor product splines and direct sums (연속형의 텐서곱과 범주형의 직합을 사용한 다항 로지스틱 회귀모형)

  • Sim, Songyong;Kang, Heemo
    • Journal of the Korean Data and Information Science Society, v.25 no.1, pp.19-26, 2014
  • In this paper, we propose a polychotomous regression model for the case where the independent variables include both categorical and numerical variables. We use direct sums for the categorical independent variables and tensor product splines for the continuous independent variables. We use BIC as the variable selection criterion. We implemented the algorithm and applied it to real data. The model using direct sums and tensor products outperformed the usual multinomial logistic regression model.

Evaluation Method of Quality of Service in Telecommunications Using Logit Model (로짓모형을 이용한 통신 서비스품질 평가방법)

  • Cho, Jae-Gyeun;Ahn, Hae-Sook
    • IE interfaces, v.15 no.2, pp.209-217, 2002
  • Quality of Service (QoS) in telecommunications can be evaluated by analyzing opinion data, which are obtained by surveying respondents and quantify subjective satisfaction with the QoS from the customers' viewpoint. To analyze opinion data, the MOS (mean opinion score) method and the Cumulative Probability Curve method are often used. Both are based on scoring and therefore share an intrinsic deficiency: the scores are assigned arbitrarily. In this paper, we propose a method for analyzing opinion data using logit models, which can handle ordinal categorical data without assigning arbitrary scores to customers' opinions, and we develop an analysis procedure built on the procedures provided by the SAS (Statistical Analysis System) statistical package. With the proposed method, we can estimate the relationship between customer satisfaction and network performance parameters and provide guidelines for network planning. In addition, the proposed method is compared with the Cumulative Probability Curve method with respect to prediction errors.
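
One common logit model for ordinal opinion data is the cumulative logit (proportional odds) model, in which P(Y <= j | x) = 1 / (1 + exp(-(theta_j - beta * x))). The abstract above does not say which logit model the authors fitted, so the sketch below is only an illustration of how category probabilities follow from such a model without assigning scores. The thresholds, the coefficient, the covariate value, and the sign convention are all assumptions.

```python
import math


def category_probabilities(thresholds, beta, x):
    """Category probabilities under a cumulative logit model.

    thresholds: increasing cut points theta_1 < ... < theta_{J-1}.
    beta, x: coefficient and value of a single covariate.
    Returns a list of J probabilities, one per ordered category.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(t - beta * x))) for t in thresholds]
    cumulative.append(1.0)  # P(Y <= J) is always 1
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Hypothetical four-point opinion scale and one network performance covariate.
probs = category_probabilities(thresholds=[-1.0, 0.5, 2.0], beta=0.8, x=1.0)
```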

Modeling of random effects covariance matrix in marginalized random effects models

  • Lee, Keunbaik;Kim, Seolhwa
    • Journal of the Korean Data and Information Science Society, v.27 no.3, pp.815-825, 2016
  • Marginalized random effects models (MREMs) are often used to analyze longitudinal categorical data. The models permit direct estimation of marginal mean parameters and specify the serial correlation of longitudinal categorical data via the random effects. However, the random effects covariance matrix in MREMs is difficult to estimate because the matrix is high-dimensional and must be positive-definite. To address these restrictions, we introduce two approaches to modeling the random effects covariance matrix: partial autocorrelation and the modified Cholesky decomposition. The proposed methods are illustrated with real data from the Korean genomic epidemiology study.
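
The modified Cholesky decomposition handles the positive-definiteness constraint mentioned in the abstract above. In its standard form it writes T Σ Tᵀ = D, where T is unit lower triangular with entries -φ_jk below the diagonal and D is diagonal with positive innovation variances. Any real φ_jk and any positive variances then give a valid covariance matrix. The sketch below rebuilds Σ from such unconstrained inputs; it is not the authors' parameterization, and the function name and numbers are assumptions.

```python
def covariance_from_mcd(phi, innovation_var):
    """Rebuild a covariance matrix from modified Cholesky parameters.

    phi[j][k] (k < j): autoregressive coefficient of component k in the
    regression of component j on its predecessors; any real number is allowed.
    innovation_var[j]: positive variance of the j-th regression residual.
    """
    q = len(innovation_var)
    # Row j of L writes component j as a combination of residuals 0..j.
    # L is the inverse of T, built by forward substitution.
    L = [[0.0] * q for _ in range(q)]
    for j in range(q):
        L[j][j] = 1.0
        for k in range(j):
            for m in range(k + 1):
                L[j][m] += phi[j][k] * L[k][m]
    # Sigma = L D L^T, which is positive-definite whenever all d_j > 0.
    return [
        [sum(L[i][m] * innovation_var[m] * L[j][m] for m in range(q)) for j in range(q)]
        for i in range(q)
    ]


# Hypothetical 3x3 example with unconstrained coefficients.
phi = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.2, -0.3, 0.0]]
sigma = covariance_from_mcd(phi, innovation_var=[1.0, 2.0, 0.5])
```

The appeal of this parameterization is that the optimizer or sampler can work on unconstrained coefficients and log-variances while the implied matrix stays valid.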

Optimal Process Condition for Products with Multi-Categorical Ordinal Quality Characteristic (다범주 순서형 품질특성을 갖는 제품의 최적 공정조건 결정에 관한 연구)

  • Kim Sang-Cheol;Yun Won-Young;Chun Young-Rok
    • Journal of Korean Society for Quality Management, v.32 no.3, pp.109-125, 2004
  • This paper deals with an optimal process control problem in the production of hull structural steel plate with a high defect rate. The main quality characteristic (dependent variable) is the internal quality (defects) of the plates, which depends on process parameters (independent variables). The dependent variable has three ordered categories, and there are 35 independent variables (29 continuous and 6 categorical). In this paper, we first determine the main factors and develop a mathematical model relating the predicted probabilities of internal quality to the main factors. Second, we find the optimal process condition for the main factors through analysis of variance (ANOVA) using simulation. We consider three models to obtain the main factors and the optimal process condition: linear, quadratic, and error models.

An Analysis of Categorical Time Series Driven by Clipping GARCH Processes (연속형-GARCH 시계열의 범주형화(Clipping)를 통한 분석)

  • Choi, M.S.;Baek, J.S.;Hwan, S.Y.
    • The Korean Journal of Applied Statistics, v.23 no.4, pp.683-692, 2010
  • This short article is concerned with a categorical time series obtained by clipping a heteroscedastic GARCH process. Estimation methods are discussed for the model parameters appearing both in the original process and in the binary time series resulting from clipping (cf. Zhen and Basawa, 2009). Assuming an AR-GARCH model for the heteroscedastic time series, three data sets from the Korean stock market are analyzed, with applications to calculating certain probabilities associated with the AR-GARCH process.
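
Clipping, as described in the abstract above, turns a continuous series into a binary one by recording whether each value exceeds a threshold. The sketch below simulates a plain GARCH(1,1) process for simplicity, not the AR-GARCH model of the paper, and then clips it at zero. The parameter values, the seed, and the zero threshold are assumptions chosen for illustration.

```python
import random


def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate n values of a Gaussian GARCH(1,1) process."""
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be below 1 for a finite variance")
    rng = random.Random(seed)
    variance = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    series = []
    for _ in range(n):
        value = (variance ** 0.5) * rng.gauss(0.0, 1.0)
        series.append(value)
        # Conditional variance for the next step depends on the latest shock.
        variance = omega + alpha * value ** 2 + beta * variance
    return series


def clip(series, threshold=0.0):
    """Binary series: 1 where the value exceeds the threshold, else 0."""
    return [1 if value > threshold else 0 for value in series]


x = simulate_garch11(500, omega=0.1, alpha=0.1, beta=0.8, seed=42)
binary = clip(x)
share_above = sum(binary) / len(binary)  # empirical estimate of P(X_t > 0)
```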