• Title/Summary/Keyword: categorical data analysis

Empirical Analysis on Rao-Scott First Order Adjustment for Two Population Homogeneity test Based on Stratified Three-Stage Cluster Sampling with PPS

  • Heo, Sunyeong
    • Journal of Integrative Natural Science
    • /
    • v.7 no.3
    • /
    • pp.208-213
    • /
    • 2014
  • Nationwide and large-scale sample surveys generally use complex sample designs. The traditional Pearson chi-square test is not appropriate for categorical data from complex samples. Rao and Scott suggested an adjustment to the Pearson chi-square test that uses the average of the eigenvalues of the design matrix of cell probabilities. This study compares the efficiency of the Rao-Scott first-order adjusted test with that of the Wald test for homogeneity between two populations, using data from the 2009 Gyeongnam regional education offices' customer satisfaction survey (2009 GREOCSS). The 2009 GREOCSS data were collected by stratified three-stage cluster sampling with probability proportional to size. The empirical results show that, for the 2009 GREOCSS data, the Rao-Scott adjusted test statistic, which uses only the variances of cell probabilities, is very close to the Wald test statistic, which uses the covariance matrix of cell probabilities. However, the Rao-Scott first-order adjusted test statistic should be used in place of the Wald test with caution, because its efficiency decreases as the relative variance of the eigenvalues of the design matrix of cell probabilities increases, especially when the number of degrees of freedom is small.
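
The first-order adjustment described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes the eigenvalues of the design matrix have already been estimated, and the function name `rao_scott_first_order` and the sample numbers are hypothetical. "Relative variance" is taken here to mean the squared coefficient of variation of the eigenvalues, which is also an assumption.

```python
def rao_scott_first_order(pearson_chi2, eigenvalues):
    """Divide a Pearson chi-square statistic by the mean eigenvalue.

    Returns (adjusted statistic, mean eigenvalue, relative variance).
    The adjusted statistic is referred to a chi-square distribution
    with the original degrees of freedom.
    """
    k = len(eigenvalues)
    if k == 0:
        raise ValueError("at least one eigenvalue is required")
    mean_eig = sum(eigenvalues) / k
    if mean_eig <= 0:
        raise ValueError("mean eigenvalue must be positive")
    adjusted = pearson_chi2 / mean_eig
    # Assumed definition: squared coefficient of variation of the eigenvalues.
    rel_var = sum((d - mean_eig) ** 2 for d in eigenvalues) / (k * mean_eig ** 2)
    return adjusted, mean_eig, rel_var


# Hypothetical inputs, for illustration only.
adjusted, mean_eig, rel_var = rao_scott_first_order(12.0, [1.5, 2.0, 2.5])
print(adjusted, mean_eig, rel_var)
```

A large relative variance signals that the eigenvalues differ widely, which is the situation in which the abstract advises caution with the first-order adjustment.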

More Efficient k-Modes Clustering Algorithm

  • Kim, Dae-Won;Chae, Yi-Geun
    • Journal of the Korean Data and Information Science Society
    • /
    • v.16 no.3
    • /
    • pp.549-556
    • /
    • 2005
  • Hard-type centroids in conventional clustering algorithms such as the k-modes algorithm cannot preserve the uncertainty inherent in data sets for as long as possible before the actual clustering decisions are made. Therefore, we propose the k-populations algorithm to extend clustering ability and to respect the characteristics of the data. In various experiments, the k-populations algorithm was found to give markedly better clustering results.
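
For context, the conventional k-modes procedure that this abstract takes as its baseline can be sketched as below. This is a minimal sketch of standard k-modes with hard modes and Hamming distance, not the proposed k-populations algorithm. The function names, the fixed initial modes, and the toy records are assumptions made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, initial_modes, max_iter=100):
    """Hard-assignment k-modes: alternate assignment and mode update."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assign each record to the nearest mode (ties go to the lowest index).
        new_labels = [
            min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change
        labels = new_labels
        # Update each mode to the most frequent category per attribute.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels, modes


# Toy data, for illustration only.
records = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "z"), ("b", "z"), ("c", "z")]
labels, modes = k_modes(records, initial_modes=[("a", "x"), ("b", "z")])
print(labels, modes)
```

Each mode keeps only a single category per attribute, which is the "hard-type centroid" limitation the abstract refers to.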

Poll System using E-mails

  • Kim, Yon Hyong;Oh, Min Gweon
    • Communications for Statistical Applications and Methods
    • /
    • v.8 no.3
    • /
    • pp.767-775
    • /
    • 2001
  • In this paper we propose a poll system using e-mail. The system is expected to increase the response rate because the questionnaire is included in the body of the e-mail. In particular, the system automatically produces a summary report based on the results of a categorical data analysis.

Categorical Data Analysis by Using Spatial Scan Statistics and Echelon Analysis

  • Mun, Seung-Ho;Sin, Jae-Gyeong
    • Korean Data and Information Science Society: Conference Proceedings
    • /
    • 2004.04a
    • /
    • pp.183-194
    • /
    • 2004
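  • This study deals with categorical data analysis using spatial scan statistics and echelon analysis. First, the hierarchical structure of a given contingency table is determined using an echelon dendrogram, and hotspot candidates are detected from it. Next, spatial scan statistics based on the likelihood ratio are computed for regions that show significantly high or low values. Finally, hotspots are detected on the basis of these statistics.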

Use of the Quantitatively Transformed Field Soil Structure Description of the US National Pedon Characterization Database to Improve Soil Pedotransfer Function

  • Yoon, Sung-Won;Gimenez, Daniel;Nemes, Attila;Chun, Hyen-Chung;Zhang, Yong-Seon;Sonn, Yeon-Kyu;Kang, Seong-Soo;Kim, Myung-Sook;Kim, Yoo-Hak;Ha, Sang-Keun
    • Korean Journal of Soil Science and Fertilizer
    • /
    • v.44 no.5
    • /
    • pp.944-958
    • /
    • 2011
  • Soil hydraulic properties such as hydraulic conductivity or water retention, which are costly to measure, can be estimated indirectly by a soil pedotransfer function (PTF) using easily obtainable soil data. The field soil structure description, which is routinely recorded, could also be used as a PTF input to reduce uncertainty. The purposes of this study were to incorporate qualitative morphological soil structure descriptions and a soil structural index into PTFs and to evaluate their contribution to the prediction of soil hydraulic properties. We transformed categorical morphological descriptions of soil structure into quantitative values using categorical principal component analysis (CATPCA). This approach was tested with a large data set from the US National Pedon Characterization database with the aid of a categorical regression tree analysis. Six different PTFs were used to predict the saturated hydraulic conductivity, and the results were averaged to quantify the uncertainty. The quantified morphological description was then used in a multiple linear regression approach to predict the averaged ensemble saturated conductivity. The selected stepwise regression model, with only the transformed morphological variables and the structural index as predictors, predicted $K_{sat}$ with $r^2$ = 0.48 (p = 0.018), indicating the feasibility of the CATPCA approach. In the regression tree analysis, soil structure index and soil texture turned out to be important factors in the prediction of the hydraulic properties. Among the structural descriptions, size class turned out to be an important grouping parameter in the regression tree. Bulk density, clay content, W33 and structural index explained the clusters selected by a two-step clustering technique, implying that the morphologically described soil structural features are closely related to soil physical as well as hydraulic properties. Although this study provided a relatively new method that relates soil structure description to soil structure index, the same approach should be tested using a data set containing actual measurements of hydraulic properties. More insight into the power of the soil structure index to predict hydraulic properties would be gained by considering measured saturated hydraulic conductivity and soil water retention.

Chi-squared Tests for Homogeneity based on Complex Sample Survey Data Subject to Misclassification Error

  • Heo, Sunyeong
    • Communications for Statistical Applications and Methods
    • /
    • v.9 no.3
    • /
    • pp.853-864
    • /
    • 2002
  • In the analysis of categorical data subject to misclassification errors, the observed cell proportions are adjusted by misclassification probabilities, and the variance estimates are adjusted accordingly. In this case, it is important to determine the extent to which misclassification probabilities are homogeneous within a population. This paper considers methods to evaluate the power of chi-squared tests for homogeneity with complex survey data subject to misclassification errors. Two cases are considered: adjustment with homogeneous misclassification probabilities and adjustment with heterogeneous misclassification probabilities. A logistic regression method is considered for estimating the misclassification probabilities.

Monitoring social networks based on transformation into categorical data

  • Lee, Joo Weon;Lee, Jaeheon
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.4
    • /
    • pp.487-498
    • /
    • 2022
  • Social network analysis (SNA) techniques have recently been developed to monitor and detect abnormal behaviors in social networks. Control charts, a useful tool for process monitoring, are also useful for network monitoring. In this paper, the degree and closeness centrality measures, which have global and local perspectives, respectively, are applied to an exponentially weighted moving average (EWMA) chart and a multinomial cumulative sum (CUSUM) chart for monitoring undirected weighted networks. In general, an EWMA chart monitors only one variable in a single chart, whereas a multinomial CUSUM chart can monitor, in a single chart, a categorical variable into which several variables have been transformed through classification rules. To monitor degree centrality and closeness centrality simultaneously, we categorize them based on the average of each measure and then apply them to the multinomial CUSUM chart. In this case, the global and local attributes of the network can be monitored simultaneously with a single chart. We also evaluate the performance of the proposed procedure through a simulation study.
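
The single-variable EWMA chart mentioned in this abstract can be sketched as below. This is a minimal sketch of the textbook EWMA recursion with time-varying control limits, applied to a generic monitored statistic such as a centrality measure. It is not the paper's procedure; the function name, the parameter defaults (`lam`, `L`) and the toy series are assumptions.

```python
import math


def ewma_chart(values, mu0, sigma, lam=0.2, L=3.0):
    """Return one (z, lower, upper, signal) tuple per observation.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = mu0.
    Limits: mu0 +/- L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam)^(2t))).
    """
    z = mu0
    rows = []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1 - lam) * z
        half_width = L * sigma * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        )
        rows.append((z, mu0 - half_width, mu0 + half_width, abs(z - mu0) > half_width))
    return rows


# Toy series with a mean shift at the fourth observation (illustration only).
for row in ewma_chart([0, 0, 0, 5, 5, 5], mu0=0.0, sigma=1.0, lam=0.5):
    print(row)
```

Because each chart of this kind tracks one statistic, monitoring two centrality measures would need two charts, which is the motivation the abstract gives for categorizing them and using a single multinomial CUSUM chart.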

Implementing an Analysis System for Housing Business Based on Seoul Apartment Price Data (주택 사업 분석 시스템 구축 : 서울지역 아파트 가격 데이터를 중심으로)

  • 김태훈;이희석;김재윤;전진오;이은식
    • The Journal of Information Technology and Database
    • /
    • v.6 no.2
    • /
    • pp.115-130
    • /
    • 1999
  • Because of the IMF, the price structure of the housing market varies with market price policy rather than with a low or high price policy. The objective of this study is to develop a system for analyzing the housing market and its demand. The analysis system consists of four major categories: macro index analysis, market decision analysis, housing market analysis, and consumer analysis. We model each category using a variety of techniques such as generalized linear models, categorical analysis, bubble analysis, drill-down analysis, price sensitivity meter analysis, optimum price index analysis, profit index measurement analysis, correspondence analysis, conjoint analysis, and multidimensional scaling analysis. Seoul apartment data are analyzed to demonstrate the practical usefulness of the system.

A generalized model for categorical data from epidemiological studies (질병의 범주적 자료에 대한 통계적 분석모형)

  • 최재성
    • The Korean Journal of Applied Statistics
    • /
    • v.9 no.1
    • /
    • pp.1-15
    • /
    • 1996
  • This paper discusses the effect of the infection rate of a certain disease on the immunity rate achieved by protective inoculation. A sequence of dependence models for the infection rate is derived by defining conditionally nested binary random variables for the analysis of polytomous data with a hierarchical response scale. Maximum likelihood estimates based on the marginal log-likelihood function are obtained numerically using the simplex method of Nelder and Mead (1965).

Analysis of Public System's Quality and User Behavior Using PLS-MGA Methodology : An Institutional Perspective (PLS-MGA 방법론을 활용한 제도론적 관점에서의 공공제도 품질과 사용자 행태의 분석)

  • Lee, Jae Yul;Hwang, Seung-June
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.40 no.2
    • /
    • pp.78-91
    • /
    • 2017
  • In this study, we conducted a comparative study of users' perceptions of and behavior toward public system services (PSS) using institutionalism theory and the MGA (multi-group analysis) methodology. In particular, this study focuses on how institutional isomorphism applies to public system services and how MGA can be implemented correctly in a variance-based SEM (structural equation model) such as PLS (partial least squares). A data set of 496 valid responses was collected from public system users, and empirical research was conducted using three segmented models categorized by public proximity theory (public firms = 113, government contractors = 210, private contractors = 173). For rigorous group comparisons, each model was estimated with the same indicators and approaches. PLS-SEM was used to test the research hypotheses, followed by parametric and non-parametric PLS-MGA procedures to test categorical moderation effects. This study applied novel procedures for testing composite measurement invariance prior to the multi-group comparisons. The following main results and implications are drawn: 1) Partial measurement invariance was established. Multi-group analysis can be done with decomposed models, although the data cannot be pooled into one integrated model. 2) Multi-group analysis using various approaches showed that proximity to the public sphere moderated some hypothesized paths from quality dimensions to user satisfaction, which means that categorical moderating effects were partially supported. 3) Careful attention should be given to the selection of statistical test methods and to the interpretation of multi-group analysis results, taking into account the different outcomes of the PLS-MGA test methods and the low statistical power of the moderating effect. It is necessary to use various methods, such as comparing the difference in the significance of path coefficients with the significance of the difference in path coefficients between groups. 4) Substantial differences in the perceptions and behaviors of PSS users existed according to proximity to the public sphere, including in the significance of path coefficients, mediation, and categorical moderation effects. 5) The paper also provides detailed analysis and implications from a new institutional perspective. By using a novel and appropriate methodology for group comparisons, this study should be useful for researchers interested in comparative studies that employ institutionalism theory and the PLS-SEM multi-group analysis technique.