• Title/Abstract/Keywords: Categorical data analysis

Search Result 176, Processing Time 0.019 seconds

Model selection method for categorical data with non-response (무응답을 가지고 있는 범주형 자료에 대한 모형 선택 방법)

  • Yoon, Yong-Hwa;Choi, Bo-Seung
    • Journal of the Korean Data and Information Science Society / v.23 no.4 / pp.627-641 / 2012
  • We consider model estimation and model selection methods for multi-way contingency table data with non-response or missing values. We also consider a hierarchical Bayesian model to handle the boundary solution problem that can arise in maximum likelihood estimation under a non-ignorable non-response model, and we address model selection to find the best model for the data. We used Bayes factors for model selection under the Bayesian approach. We applied the proposed method to a pre-election survey for the 2004 Korean National Assembly race. The results favored the non-ignorable non-response model, and the voting-intention variable was the most suitable.
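
To make the role of a Bayes factor concrete, the following is a minimal sketch on a toy problem, not the hierarchical non-response model of the paper. It compares two hypothetical models for a vector of category counts: `M1` fixes all cell probabilities to be equal, and `M2` places a uniform Dirichlet prior on them. The function names and the example counts are assumptions made for illustration.

```python
import math

def log_marginal_equal_probs(counts):
    """Log marginal likelihood under M1: every category has probability 1/K.

    The multinomial coefficient is omitted because it cancels in the ratio.
    """
    k = len(counts)
    return sum(counts) * math.log(1.0 / k)

def log_marginal_uniform_dirichlet(counts):
    """Log marginal likelihood under M2: probabilities ~ Dirichlet(1, ..., 1).

    Closed form: Gamma(K) / Gamma(N + K) * prod_k Gamma(n_k + 1),
    again without the multinomial coefficient.
    """
    k = len(counts)
    n = sum(counts)
    return (math.lgamma(k) - math.lgamma(n + k)
            + sum(math.lgamma(c + 1) for c in counts))

def bayes_factor_m2_vs_m1(counts):
    """Bayes factor B21 = p(counts | M2) / p(counts | M1)."""
    return math.exp(log_marginal_uniform_dirichlet(counts)
                    - log_marginal_equal_probs(counts))

# Hypothetical counts: strongly unbalanced data should favor M2 (B21 > 1).
balanced_bf = bayes_factor_m2_vs_m1([1, 1])
unbalanced_bf = bayes_factor_m2_vs_m1([18, 2])
```

A value of `B21` above 1 indicates that the data support `M2` over `M1`; model selection by Bayes factors picks the model with the larger marginal likelihood.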

Effect of complex sample design on Pearson test statistic for homogeneity (복합표본자료에서 동질성검정을 위한 피어슨 검정통계량의 효과)

  • Heo, Sun-Yeong;Chung, Young-Ae
    • Journal of the Korean Data and Information Science Society / v.23 no.4 / pp.757-764 / 2012
  • This research compares test statistics for homogeneity when the data are collected under a complex sample design. Survey data from a complex sample design do not satisfy the independence condition required by the standard Pearson multinomial-based chi-squared test. Today, many data sets are collected under complex sample designs, yet tests for categorical data are still conducted with the standard Pearson chi-squared test. In this study, we compared the performance of three test statistics for homogeneity between two populations, using data from the 2009 customer satisfaction evaluation survey on the services of the Gyeongsangnam-do regional offices of education: the standard Pearson test, the unbiased Wald test, and the Pearson-type test with survey-based point estimates. Through empirical analyses, we first showed that the standard Pearson test greatly inflates the value of the test statistic, so its results are not reliable. Second, comparing the Wald test and the Pearson-type test, we found that the test results are affected by the number of categories and by the mean and standard deviation of the eigenvalues of the design matrix.
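
For reference, the standard Pearson statistic that the abstract uses as its baseline can be sketched as follows. This is only the textbook multinomial-based version; the design-based Wald and Pearson-type statistics studied in the paper are not reproduced here. The function name and the example counts are hypothetical.

```python
def pearson_homogeneity_statistic(counts):
    """Standard Pearson chi-squared statistic for an r x c table of counts.

    Rows are populations and columns are response categories. The formula
    assumes independent multinomial sampling within each row, which is the
    assumption a complex sample design violates.
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(counts) - 1) * (len(counts[0]) - 1)
    return statistic, degrees_of_freedom

# Hypothetical counts: two populations, three satisfaction categories.
table = [[30, 50, 20],
         [20, 50, 30]]
stat, dof = pearson_homogeneity_statistic(table)
```

Under simple random sampling, `stat` would be compared with a chi-squared distribution on `dof` degrees of freedom; under a complex design that reference distribution is no longer valid, which is the problem the abstract describes.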

Methods for Genetic Parameter Estimations of Carcass Weight, Longissimus Muscle Area and Marbling Score in Korean Cattle (한우의 도체중, 배장근단면적 및 근내지방도의 유전모수 추정방법)

  • Lee, D.H.
    • Journal of Animal Science and Technology / v.46 no.4 / pp.509-516 / 2004
  • This study investigates the amount of bias in estimates of heritability and genetic correlation according to data structure for marbling scores in Korean cattle. A breeding population with 5 generations was simulated by way of selection for carcass weight, Longissimus muscle area, and latent values of marbling score, and random mating. Latent values of marbling score were categorized into five categories by thresholds of 0, 1, 2, and 3 SD (DS1) or into seven categories by thresholds of -2, -1, 0, 1, 2, and 3 SD (DS2). Variance components and genetic parameters (heritabilities and genetic correlations) were estimated by restricted maximum likelihood (REML) on multivariate linear mixed animal models and by Gibbs sampling algorithms on multivariate threshold mixed animal models in DS1 and DS2. The simulation was performed for 10 replicates, and averages and empirical standard deviations were calculated. Using REML, heritabilities of marbling score were underestimated as 0.315 and 0.462 on DS1 and DS2, respectively, compared with the parameter (0.500). In contrast, using Gibbs sampling in the multivariate threshold animal models, these estimates did not differ significantly from the parameter. Residual correlations of marbling score with other traits were lower than the parameters when using the REML algorithm under the assumption of linearity and a normal distribution. This would be due to loss of information and, therefore, reduced variation in marbling score. In conclusion, genetic variation in marbling would be well defined if the liability concept were adopted for marbling score and a threshold mixed model were implemented for genetic parameter estimation in Korean cattle.
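
The categorization step described in the abstract can be sketched as below. This is an illustration only: the threshold values come from the abstract, but the function name, the handling of values that fall exactly on a threshold, and the standard-normal draws are assumptions, and the sketch does not reproduce the paper's breeding simulation.

```python
import bisect
import random

# Thresholds in SD units of the latent variable, as given in the abstract.
DS1_THRESHOLDS = [0.0, 1.0, 2.0, 3.0]               # four cuts, five categories
DS2_THRESHOLDS = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]   # six cuts, seven categories

def categorize(latent_value, thresholds):
    """Return the 1-based ordinal score for a latent value.

    A value equal to a threshold is placed in the higher category
    (an assumption; the abstract does not specify boundary handling).
    """
    return bisect.bisect_right(thresholds, latent_value) + 1

# Hypothetical latent values drawn from a standard normal distribution.
random.seed(0)
latent_values = [random.gauss(0.0, 1.0) for _ in range(1000)]
ds1_scores = [categorize(v, DS1_THRESHOLDS) for v in latent_values]
ds2_scores = [categorize(v, DS2_THRESHOLDS) for v in latent_values]
```

With the DS1 cuts, roughly half of standard-normal values fall in the lowest category, which illustrates the loss of information the abstract attributes to treating the scores as linear and normal.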

Processes of Voluntary Services Delivered by Korean Undergraduates: An Approach Based on the Grounded Theory (대학생의 자발적 봉사활동에 대한 질적 연구: 근거이론을 중심으로)

  • Hu, Sungho;Jung, Taeyun
    • Korean Journal of Culture and Social Issue / v.17 no.3 / pp.287-304 / 2011
  • The purpose of this study is to understand the phases and paradigms related to voluntary services offered by undergraduates and the processes by which those services are carried out. To this end, interviews with 23 undergraduates (10 men, 13 women) were conducted from August 2008 to April 2009, and the interview data were analyzed on the basis of grounded theory. The main analysis procedure consisted of three codings (open, axial, and selective). Open coding produced 119 concepts, 41 subcategories, and 16 categories. Axial coding was then conducted to organize the basic framework of generic relationships among psychological motivation, social context, personal perception, practical action, psychological response, and psychological consequence. The core essence is "Volunteer types are categorized into a simple practice type, a self-serving type, and a community type." Finally, undergraduate volunteers were explained in terms of these three types (simple practice, self-serving, and community) on the basis of the paradigms. The results are discussed in terms of further research and limitations.

The Transform of Multidimensional Categorical Data and its Applications (다차원 범주형 자료의 변환과 그의 응용)

  • Ahn, Ju-Sun
    • The Korean Journal of Applied Statistics / v.20 no.3 / pp.585-595 / 2007
  • The squared Euclidean distance between values transformed by the P-matrix of Ahn et al. (2003) is proportional to the squared Euclidean distance between the cell relative frequencies of two contingency tables. We propose a method that uses the PP-values for the analysis of modern poems and questionnaire data.
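
The quantity on the right-hand side of that proportionality, the squared Euclidean distance between cell relative frequencies, can be sketched as follows. The P-matrix transform itself is not defined in this listing, so it is not implemented; the function names and the example tables are hypothetical.

```python
def relative_frequencies(table):
    """Flatten a contingency table into cell relative frequencies."""
    total = sum(sum(row) for row in table)
    return [cell / total for row in table for cell in row]

def squared_euclidean_distance(table_a, table_b):
    """Squared Euclidean distance between the cell relative frequencies
    of two contingency tables with the same layout."""
    freq_a = relative_frequencies(table_a)
    freq_b = relative_frequencies(table_b)
    if len(freq_a) != len(freq_b):
        raise ValueError("tables must have the same number of cells")
    return sum((a - b) ** 2 for a, b in zip(freq_a, freq_b))

# Hypothetical 2 x 2 tables with different totals.
table_a = [[10, 20], [30, 40]]
table_b = [[25, 25], [25, 25]]
distance = squared_euclidean_distance(table_a, table_b)
```

Because the comparison uses relative frequencies, tables with different sample sizes can be compared directly.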

A Study on Causes of Industrial Accident Cases by a Categorical Analysis (범주형 분석에 의한 산업재해사례 요인의 고찰)

  • 지경택;송영호;정국삼
    • Proceedings of the Korean Institute of Industrial Safety Conference / 1998.11a / pp.199-204 / 1998
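  • Korea's industrial accident statistics are used to identify the distribution of industrial accidents by scale and cause and the characteristics of the workers involved, and they serve as basic data for establishing industrial accident prevention policy and operating guidelines for industrial accident compensation insurance. However, the current method of compiling these statistics covers only accidents recognized as occupational among the medical care applications submitted by injured workers at workplaces enrolled in industrial accident insurance. (remainder omitted)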

LAD Estimators for Categorical Data Analysis (범주형 자료 분석을 위한 LAD 추정량)

  • 최현집
    • The Korean Journal of Applied Statistics / v.16 no.1 / pp.55-69 / 2003
  • In this article, we propose weighted LAD (least absolute deviations) estimators for multi-dimensional contingency tables and derive a method for computing them. To illustrate the robustness of the estimators, simulation results are presented for several models, including log-linear models and models for ordinal variables in multi-dimensional contingency tables. Examples are also given.

Error cause analysis of Pearson test statistics for k-population homogeneity test (k-모집단 동질성검정에서 피어슨검정의 오차성분 분석에 관한 연구)

  • Heo, Sunyeong
    • Journal of the Korean Data and Information Science Society / v.24 no.4 / pp.815-824 / 2013
  • The traditional Pearson chi-squared test is not appropriate for data collected under a complex sample design. Applied to complex-sample categorical data, it may give wrong test results, and the error may arise not only from biased variance estimators but also from biased point estimators of cell proportions. In this study, a design-based consistent Wald test statistic was derived for the k-population homogeneity test, and the traditional Pearson chi-squared test statistic was partitioned into three parts according to the cause of error: the error due to the bias of the variance estimator, the error due to the bias of the cell proportion estimator, and the unseparated error due to both biases together. An empirical analysis was conducted of the size of each error component relative to the Pearson chi-squared test statistic. The second-year data from the fourth Korean National Health and Nutrition Examination Survey (KNHANES IV-2) were used for the analysis. The empirical results show that the error from the bias of the variance estimator was larger than the error from the bias of the cell proportion estimator, although the degree differed from variable to variable.

A generalized logit model with mixed effects for categorical data (다가자료에 대한 혼합효과모형)

  • 최재성
    • The Korean Journal of Applied Statistics / v.15 no.1 / pp.129-137 / 2002
  • This paper suggests a generalized logit model with mixed effects for analysing frequency data in multi-way contingency tables. In this model, the nominal response variable is assumed to be polychotomous. When some factors are fixed but treated as ordinal and others are random, the paper shows how to use baseline-category logits to incorporate the mixed effects of those factors into the model. A numerical algorithm was used to estimate the model parameters from the marginal log-likelihood.
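
As a reminder of what baseline-category logits are, the sketch below converts a set of logits into category probabilities. It covers only the link between logits and probabilities; the mixed-effects structure and the marginal likelihood estimation of the paper are not implemented. The function name and the example values are hypothetical.

```python
import math

def baseline_category_probabilities(logits):
    """Convert baseline-category logits into category probabilities.

    logits[j] is log(P(Y = j) / P(Y = baseline)) for each non-baseline
    category j. The returned list holds the non-baseline probabilities
    in the same order, followed by the baseline probability last.
    """
    exp_terms = [math.exp(eta) for eta in logits]
    denominator = 1.0 + sum(exp_terms)
    return [term / denominator for term in exp_terms] + [1.0 / denominator]

# Hypothetical three-category response: the first category is twice as
# likely as the baseline and the second is equally likely.
probabilities = baseline_category_probabilities([math.log(2.0), math.log(1.0)])
```

In a full model each logit would be a linear predictor that includes the fixed and random effects, so the probabilities vary with the factor levels.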