• Title/Summary/Keyword: Categorical data

Development of Analysis Method of Ordered Categorical Data for Optimal Parameter Design (순차 범주형 데이타의 최적 모수 설계를 위한 분석법 개발)

  • Jeon, Tae-Jun;Park, Ho-Il;Hong, Nam-Pyo;Choe, Seong-Jo
    • Journal of Korean Institute of Industrial Engineers / v.20 no.1 / pp.27-38 / 1994
  • Accumulation analysis is difficult to apply to ordered categorical data except in smaller-the-better type problems. The purpose of this paper is to develop a statistic and a method that can be easily applied to general types of problems, including nominal-the-best type problems. Experimental data from a contact window process are analyzed, and the new procedure is compared with accumulation analysis.

Sensitivity Analysis for Ordered Categorical Data

  • Cho, Il-Hyun;Park, Taesung
    • Communications for Statistical Applications and Methods / v.6 no.2 / pp.375-382 / 1999
  • Linear-by-linear association models are commonly used to analyze ordered categorical data. To fit these models, appropriate scores need to be chosen. In this paper we perform sensitivity analyses in two-way contingency tables to investigate the effect of the scores on goodness of fit and on tests of significance. In addition, we show that the score yielding the best fit to the data can be selected based on the sensitivity analysis results.

Bayesian Analysis for Categorical Data with Missing Traits Under a Multivariate Threshold Animal Model (다형질 Threshold 개체모형에서 Missing 기록을 포함한 이산형 자료에 대한 Bayesian 분석)

  • Lee, Deuk-Hwan
    • Journal of Animal Science and Technology / v.44 no.2 / pp.151-164 / 2002
  • Genetic variance and covariance components of linear traits and ordered categorical traits, which are usually observed as dichotomous or polychotomous outcomes, were estimated simultaneously in a multivariate threshold animal model that assumes arbitrary underlying liability scales, using Bayesian inference via Gibbs sampling. The multivariate threshold animal model in this study allows any combination of missing traits while assuming correlation among the traits considered. Gibbs sampling, as a form of hierarchical Bayesian inference, was used to obtain reliable point estimates, which were taken to be the marginal posterior means of the parameters. The main point of this study is that the underlying values for the categorical traits sampled in the previous round of iteration, together with the observations on the continuous traits, can be used to sample the underlying values for missing categorical and continuous data in the current cycle (see appendix). This study also shows that the underlying variables for missing categorical data should be generated taking the correlated traits into account in order to satisfy the fully conditional posterior distributions of the parameters, although some papers (Wang et al., 1997; VanTassell et al., 1998) reported generating only the residual effects of missing traits in the same situation. In the present study, the Gibbs samplers used for fully Bayesian inference on the unknown parameters of interest serve as a methodology that accommodates any combination of linear and categorical traits with missing observations. Moreover, two kinds of constraints that guarantee identifiability of the arbitrary underlying variables are shown to preserve the fully conditional posterior distributions of those parameters. A numerical example from a simulation study is provided for a threshold animal model that includes maternal and permanent environmental effects on a multiple ordered categorical trait (calving ease), a binary trait (non-return rate), and a normally distributed trait (birth weight).

Improving Classification Performance for Data with Numeric and Categorical Attributes Using Feature Wrapping (특징 래핑을 통한 숫자형 특징과 범주형 특징이 혼합된 데이터의 클래스 분류 성능 향상 기법)

  • Lee, Jae-Sung;Kim, Dae-Won
    • Journal of KIISE:Software and Applications / v.36 no.12 / pp.1024-1027 / 2009
  • In this letter, we evaluate classification performance on mixed numeric and categorical data in order to compare the efficiency of feature filtering and feature wrapping. Because mixed data consist of both numeric and categorical features, feature selection was applied after discretizing the numeric features in the given data set. We then choose the feature subset that improves classification performance on the preprocessed data set. The experimental results show that the feature wrapping method is more reliable than the feature filtering method in terms of classification accuracy.

Case Studies on the Optimal Parameter Design with Respect to Categorial Characteristics (범주형 품질특성의 최적설계 사례연구)

  • Park, Jong-In;Bae, Suk-Joo;Kim, Man-Soo
    • Journal of Korean Society of Industrial and Systems Engineering / v.32 no.3 / pp.135-141 / 2009
  • A variety of statistical methods are applied at the design and manufacturing stages to model and optimize responses related to the quality of a product or system in terms of control and noise factors. Most of them assume continuous response variables, but assessing the performance of a product or system often involves categorical observations, such as ratings and scores. Although most previous work on categorical data provides sophisticated response models and ensures unbiased outcomes, it requires heavy computation to estimate the model parameters, as well as sufficient replication. In this study, we present some practical approaches for optimal parameter design with an ordered categorical response when few or no replications are available. Two real-life examples are given to illustrate the presented methods.

A Method for Reduction of Categorical Variables Based on a Concept of Pseudo-Correlation Coefficient (유사상관계수의 개념을 도입한 범주형 변수의 축약에 관한 연구)

  • Kwon, Cheol-Shin;Hong, Soon-Wook
    • IE interfaces / v.14 no.1 / pp.79-83 / 2001
  • In this paper, we propose a simple method for reducing categorical variables to a smaller but still significant number, and demonstrate how it can be applied to the variable reduction problem that empirical research often faces during data processing. To this end, we introduce the concept of a pseudo-correlation coefficient, which makes it possible to use factor analysis (FA) as a tool for reducing variables. The main idea is to treat measures of association between categorical variables in the same way as Pearson's correlation coefficient, so as to meet the input requirement of FA. After examining existing measures that could serve as pseudo-correlation coefficients, Cramer's V coefficient was selected as giving the best result. To show the detailed procedure of the proposed method, a demonstration is presented using data from 329 R&D projects conducted in 18 private laboratories in the electric and electronics industry.
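
As a rough illustration of the pseudo-correlation idea in this abstract, the sketch below computes Cramer's V for each pair of categorical variables and arranges the values in a symmetric matrix that could stand in for a correlation matrix. The function names (`cramers_v`, `contingency`, `pseudo_correlation_matrix`) are hypothetical, the factor analysis step is not shown, and nothing here is taken from the paper's own procedure. It assumes every variable has at least two observed categories.

```python
from collections import Counter
from math import sqrt


def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows.

    Assumes every row and column total is positive.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))


def contingency(x, y):
    """Cross-tabulate two equally long sequences of category labels."""
    counts = Counter(zip(x, y))
    xs, ys = sorted(set(x)), sorted(set(y))
    return [[counts[(a, b)] for b in ys] for a in xs]


def pseudo_correlation_matrix(columns):
    """Matrix of pairwise Cramer's V values, with 1.0 on the diagonal."""
    m = len(columns)
    return [
        [1.0 if i == j else cramers_v(contingency(columns[i], columns[j]))
         for j in range(m)]
        for i in range(m)
    ]
```

Unlike Pearson's correlation, Cramer's V lies between 0 and 1 and carries no sign, so a matrix built this way only records the strength of association.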

GOODNESS OF FIT TESTS BASED ON DIVERGENCE MEASURES

  • Pasha, Eynollah;Kokabi, Mohsen;Mohtashami, Gholam Reza
    • Journal of applied mathematics & informatics / v.26 no.1_2 / pp.177-189 / 2008
  • In this paper, we investigate goodness of fit tests based on divergence measures. For categorical data, we obtain the asymptotic distribution of these tests under certain regularity conditions. We also propose a modified test that improves the rate of convergence. In the continuous case, we use our modified entropy estimator [10] for Kullback-Leibler information estimation. A comparative study based on simulation results is also discussed.
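
For readers unfamiliar with divergence-based tests on categorical data, here is a minimal sketch of one standard example: the likelihood-ratio statistic, which equals 2n times the Kullback-Leibler divergence between the observed proportions and the hypothesized ones. The abstract does not say which divergences the paper studies, so this choice and the function names (`kl_divergence`, `g_statistic`) are assumptions made for illustration; the paper's modified test is not reproduced.

```python
from math import log


def kl_divergence(p, q):
    """Kullback-Leibler divergence sum(p_i * log(p_i / q_i)).

    Terms with p_i == 0 contribute zero; every q_i is assumed positive.
    """
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def g_statistic(observed, null_probs):
    """Likelihood-ratio statistic 2 * n * KL(p_hat || p_0) for category counts."""
    n = sum(observed)
    p_hat = [o / n for o in observed]
    return 2 * n * kl_divergence(p_hat, null_probs)
```

With a fully specified null distribution over k categories, this statistic is asymptotically chi-square with k - 1 degrees of freedom under the usual regularity conditions, so `g_statistic([30, 20], [0.5, 0.5])` would be compared with a chi-square distribution on 1 degree of freedom.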

On the Categorical Variable Clustering

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / v.7 no.2 / pp.219-226 / 1996
  • The basic objective of cluster analysis is to discover natural groupings of items or variables. In general, variable clustering has been based on similarity measures between variables with binary characteristics. We propose a variable clustering method for variables that have more than two categories, ordered in some sense. We also consider some measures of association as similarities between variables. A numerical example is included.

A study on the optimal parameter design by analyzing the ordered categorical data (순차 범주형 데이타분석을 위한 최적모수설계에 관한 연구)

  • Jeon, Tae-Jun;Hong, Nam-Pyo;Park, Ho-Il
    • Proceedings of the Korean Operations and Management Science Society Conference / 1992.04b / pp.188-197 / 1992
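  • In applied or developmental research on product development, experimental results are sometimes classified as ordered categorical data, either because of the intrinsic nature of the quality characteristic or for convenience of measurement. This paper examines the problems that Taguchi's existing accumulation analysis has as a method for analyzing ordered categorical data in nominal-the-best type problems, and proposes a target accumulation method based on quality loss to remedy them. Applying the proposed technique to post-etch contact window data made it easy to determine the optimal levels of the factors.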
