• Title/Summary/Keyword: Categorical Information

A Bayesian uncertainty analysis for nonignorable nonresponse in two-way contingency table

  • Woo, Namkyo;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1547-1555 / 2015
  • We study nonignorable nonresponse in a two-way contingency table in which one or two categories may be missing. We describe a nonignorable nonresponse model for the analysis of a two-way categorical table. One approach to analyzing these data is to construct several tables (one complete and the others incomplete). The incomplete tables contain nonidentifiable parameters. We describe a hierarchical Bayesian model for analyzing two-way categorical data. Instead of a sensitivity analysis for the nonidentifiable parameters, we use a nonignorable nonresponse model with a Bayesian uncertainty analysis that places priors on those parameters. To reduce the effects of the nonidentifiable parameters, we project the parameters onto a lower-dimensional space and allow the reduced set of parameters to share a common distribution. We use the griddy Gibbs sampler to fit our models and compute DIC and BPP for model diagnostics. We illustrate our method using NHANES III data to obtain the finite population proportions.

Review and discussion of marginalized random effects models (주변화 변량효과모형의 조사 및 고찰)

  • Jeon, Joo Yeong;Lee, Keunbaik
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1263-1272 / 2014
  • Longitudinal categorical data commonly arise in the medical, health, and social sciences. In these data, the correlation among repeated outcomes must be taken into account to estimate the effects of covariates accurately. In this paper, we introduce marginalized random effects models, which are used to estimate the population-averaged effects of covariates. We also review how these models have been developed. A real data analysis is presented using the marginalized random effects models.

Improving Classification Performance for Data with Numeric and Categorical Attributes Using Feature Wrapping (특징 래핑을 통한 숫자형 특징과 범주형 특징이 혼합된 데이터의 클래스 분류 성능 향상 기법)

  • Lee, Jae-Sung;Kim, Dae-Won
    • Journal of KIISE:Software and Applications / v.36 no.12 / pp.1024-1027 / 2009
  • In this letter, we evaluate classification performance on mixed numeric and categorical data in order to compare the efficiency of feature filtering and feature wrapping. Because the mixed data are composed of numeric and categorical features, the feature selection method was applied after discretizing the numeric features in the given data set. In this study, we choose the feature subset that improves the classification performance on the preprocessed data set. The experimental results show that the feature wrapping method is more reliable than the feature filtering method in terms of classification accuracy.

General properties and phylogenetic utilities of nuclear ribosomal DNA and mitochondrial DNA commonly used in molecular systematics

  • Hwang, Ui-Wook;Kim, Won
    • Parasites, Hosts and Diseases / v.37 no.4 / pp.215-228 / 1999
  • Choosing one or more appropriate molecular markers or gene regions to resolve a particular systematic question among organisms at a certain categorical level is still a very difficult process. The primary goal of this review, therefore, is to provide theoretical information for choosing one or more molecular markers or gene regions by illustrating the general properties and phylogenetic utilities of nuclear ribosomal DNA (rDNA) and mitochondrial DNA (mtDNA), which have been most commonly used in phylogenetic research. Highly conserved molecular markers and/or gene regions are useful for investigating phylogenetic relationships at higher categorical levels (deep branches of evolutionary history). On the other hand, hypervariable molecular markers and/or gene regions are useful for elucidating phylogenetic relationships at lower categorical levels (recently diverged branches). In summary, different selective forces have led to the evolution of various molecular markers or gene regions with varying degrees of sequence conservation. Thus, appropriate molecular markers or gene regions should be chosen with even greater caution to deduce true phylogenetic relationships over a broad taxonomic spectrum.

Parallel k-Modes Algorithm for Spark Framework (스파크 프레임워크를 위한 병렬적 k-Modes 알고리즘)

  • Chung, Jaehwa
    • KIPS Transactions on Software and Data Engineering / v.6 no.10 / pp.487-492 / 2017
  • Clustering is a technique used to measure similarities between data in big data analysis and data mining. Among the various clustering methods, the k-Modes algorithm is a representative choice for categorical data. To increase the performance of iteration-centric tasks such as k-Modes, the distributed and concurrent framework Spark has received great attention recently because it overcomes the limitations of Hadoop. Spark provides an environment that can process large amounts of data in main memory using abstract objects called RDDs. Spark provides MLlib, a dedicated library for machine learning, but MLlib includes only k-means, which can process only continuous data, so categorical data cannot be processed. In this paper, we design an RDD for the k-Modes algorithm for categorical data clustering in the Spark environment and implement an algorithm that operates effectively. Experiments show that the performance of the proposed algorithm scales linearly in the Spark environment.
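
To make the k-Modes idea in the abstract above concrete, here is a minimal single-machine sketch in plain Python. It is not the paper's Spark/RDD implementation; it only illustrates the textbook k-Modes loop (Hamming dissimilarity, per-attribute mode update). The function names, the `init` parameter, and the toy data are assumptions introduced for illustration.

```python
from collections import Counter
import random


def hamming(a, b):
    """Number of attribute positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value of each attribute across the given records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, max_iter=20, seed=0, init=None):
    """Cluster categorical records; `init` (hypothetical) fixes the starting modes."""
    rng = random.Random(seed)
    modes = list(init) if init is not None else rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins the nearest mode.
        new_labels = [
            min(range(k), key=lambda j: hamming(row, modes[j])) for row in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each cluster's mode attribute by attribute.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster is empty
                modes[j] = column_modes(members)
    return labels, modes


data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "flat"),
    ("blue", "huge", "square"),
]
labels, modes = k_modes(data, k=2, init=[data[0], data[3]])
print(labels, modes)
```

In a distributed setting, the assignment step is the part that parallelizes naturally over partitions of the data, while the mode update requires aggregating per-cluster value counts; how the paper maps these steps onto RDD operations is not stated in the abstract.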

Poll System using E-mails

  • Kim, Yon Hyong;Oh, Min Gweon
    • Communications for Statistical Applications and Methods / v.8 no.3 / pp.767-775 / 2001
  • In this paper, we propose a poll system using e-mail. This system is expected to increase the response rate because the questionnaire is included in the body of the e-mail. In particular, the system automatically produces a general report of the results of the categorical data analysis.

Nonlinear Canonical Correlation Analysis for Paralysis Disease Data

  • Shin, Yang-Kyu
    • Journal of the Korean Data and Information Science Society / v.15 no.3 / pp.515-521 / 2004
  • Categorical data are common in oriental medical research. Nonlinear canonical correlation analysis does not assume an interval level of measurement. In this paper, we apply nonlinear canonical correlation analysis to quantify paralysis disease data and to explain how similar the sets of variables are to one another.

Evaluation Method of Quality of Service in Telecommunications Using Logit Model (로짓모형을 이용한 통신 서비스품질 평가방법)

  • Cho, Jae-Gyeun;Ahn, Hae-Sook
    • IE interfaces / v.15 no.2 / pp.209-217 / 2002
  • Quality of Service (QoS) in telecommunications can be evaluated by analyzing opinion data, which come from surveying respondents and quantify subjective satisfaction with the QoS from the customers' viewpoint. To analyze the opinion data, the MOS (mean opinion score) method and the Cumulative Probability Curve method are often used. These methods are based on scoring and therefore have an intrinsic deficiency due to the assignment of arbitrary scores. In this paper, we propose a method for analyzing the opinion data using logit models, which can analyze ordinal categorical data without assigning arbitrary scores to customers' opinions, and we develop an analysis procedure that makes use of the procedures provided by the SAS (Statistical Analysis System) statistical package. With the proposed method, we can estimate the relationship between customer satisfaction and network performance parameters and provide guidelines for network planning. In addition, the proposed method is compared with the Cumulative Probability Curve method with respect to prediction errors.
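
As a rough illustration of how a logit model can handle ordinal opinion categories without assigning scores, the sketch below uses a cumulative (proportional-odds) logit model. The abstract does not say which logit model the paper uses, so this choice is an assumption, as are the function name, the threshold values, and the coefficient; none of the numbers come from the paper.

```python
import math


def cumulative_logit_probs(x, thresholds, beta):
    """Category probabilities under P(Y <= j | x) = sigmoid(threshold_j - beta * x).

    `thresholds` must be increasing; there is one more category than thresholds.
    """
    cum = [1.0 / (1.0 + math.exp(-(a - beta * x))) for a in thresholds]
    cum.append(1.0)  # the last cumulative probability is always 1
    # Differences of consecutive cumulative probabilities give each category.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


# Hypothetical example: four opinion categories (e.g. bad, poor, fair, good)
# and one network performance covariate x; all values are made up.
probs = cumulative_logit_probs(x=0.5, thresholds=[-1.0, 0.0, 1.0], beta=0.8)
print(probs)
```

Only the ordering of the categories enters the model, through the ordered thresholds; no numeric score is attached to an opinion level, which is the deficiency of score-based methods that the abstract points to.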

Graphical Methods for Hierarchical Log-Linear Models

  • Hong, Chong-Sun;Lee, Ui-Ki
    • Communications for Statistical Applications and Methods / v.13 no.3 / pp.755-764 / 2006
  • Most graphical methods for categorical data can describe the structure of the data and represent a measure of association among categorical variables. Among them, the polyhedron plot represents sequential relationships among hierarchical log-linear models for a multidimensional contingency table. This kind of plot can be used to describe the differences among sequential models. In this paper, we suggest graphical methods, containing all the information, that reflect the relationships among all log-linear models in a certain hierarchical structure. We use the ideas of a correlation diagram.

Initial Value Selection in Applying an EM Algorithm for Recursive Models of Categorical Variables

  • Jeong, Mi-Sook;Kim, Sung-Ho;Jeong, Kwang-Mo
    • Journal of the Korean Statistical Society / v.27 no.1 / pp.25-55 / 1998
  • Maximum likelihood estimates (MLEs) for recursive models of categorical variables are discussed under an EM framework. Since MLEs by EM often depend on the choice of the initial values for MLEs, we explore reasonable rules for selecting the initial values for EM. Simulation results strongly support the proposed rules.