• Title/Summary/Keyword: categorical data analysis

A Study on Industrial Accident Cases by an Application of Correlation Analysis (상관분석을 응용한 산업재해 사례요인의 고찰)

  • 정국삼;홍광수
    • Journal of the Korean Society of Safety / v.14 no.1 / pp.141-149 / 1999
  • At present, industrial accident statistics serve as the basic data for accident-prevention policy and for planning the application of industrial accident insurance. These statistics are not sufficient for effective safety management, however, because they only express itemized distributions and frequencies over all cases. This study carried out a correlation analysis of the individual causes by defining the investigated items as accident parameters. To analyze why industrial accidents occur, the correlation between unsafe acts and unsafe conditions and their related causes was examined. To estimate accident severity, the correlation and independence between these causes and the direct causes were analyzed, with days of hospitalization defined as the dependent parameter. In addition, the study expressed numerically how the dependent parameter responds to the level of the independent parameter by presenting a predictive model, fitted by logit analysis, that relates categorical dependent and independent parameters.
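As a rough illustration of the logit step mentioned in this abstract, the sketch below fits a logit model with a single categorical predictor in closed form. It is not the authors' model: the cause labels, the binary outcome (a hypothetical "long hospital stay" indicator), the counts, and the function names `fit_logit_by_level` and `predict_probability` are all assumptions made for illustration.

```python
import math
from collections import Counter


def fit_logit_by_level(records, baseline):
    """Fit a saturated logit model with one categorical predictor.

    records: iterable of (level, outcome) pairs, outcome in {0, 1}.
    Returns (intercept, coefficients), both on the log-odds scale.
    Every level needs at least one event and one non-event.
    """
    events = Counter()
    totals = Counter()
    for level, outcome in records:
        totals[level] += 1
        events[level] += outcome

    def log_odds(level):
        e, n = events[level], totals[level]
        return math.log(e / (n - e))

    intercept = log_odds(baseline)
    coefs = {lv: log_odds(lv) - intercept for lv in totals if lv != baseline}
    return intercept, coefs


def predict_probability(intercept, coefs, level):
    """Predicted probability of the outcome for a given level."""
    eta = intercept + coefs.get(level, 0.0)
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical data: 1 = long hospital stay, 0 = short stay.
records = (
    [("unsafe_condition", 1)] * 10 + [("unsafe_condition", 0)] * 40
    + [("unsafe_act", 1)] * 25 + [("unsafe_act", 0)] * 25
)
intercept, coefs = fit_logit_by_level(records, baseline="unsafe_condition")
print(coefs["unsafe_act"])
print(predict_probability(intercept, coefs, "unsafe_act"))
```

With these invented counts the coefficient for `unsafe_act` is the log of the odds ratio between the two levels (odds of 25/25 against 10/40, so log 4), which is the sense in which a logit model expresses the effect of a categorical level numerically.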

Examining Categorical Transition and Query Reformulation Patterns in Image Search Process (이미지 검색 과정에 나타난 질의 전환 및 재구성 패턴에 관한 연구)

  • Chung, Eun-Kyung;Yoon, Jung-Won
    • Journal of the Korean Society for Information Management / v.27 no.2 / pp.37-60 / 2010
  • The purpose of this study is to investigate image search query reformulation patterns in relation to image attribute categories. A total of 592 sessions and 2,445 queries from the Excite Web search engine log data were analyzed using Batley's visual information types and a query reformulation scheme of two facets and seven sub-facets. The results are organized in two parts: query reformulation and categorical transition. The most dominant query categories are specific and general/nameable, and this tendency persists across search stages. In terms of reformulation patterns, Parallel movement is the most dominant, although there are slight differences depending on the initial or preceding query category. In examining categorical transitions, it was found that 60-80% of search queries were reformulated within the same category of image attributes. These findings may be applied to the practice and implementation of image retrieval systems, in particular to assisting users' query term selection and to effective thesaurus development.
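The categorical-transition figure in this abstract can be thought of as a tally over consecutive query pairs within a session. The sketch below is only an illustration of that bookkeeping under assumed inputs: the sessions are invented, each query is assumed to be already coded with one category label, and the function names are hypothetical.

```python
from collections import Counter


def transition_counts(sessions):
    """Count (previous category, current category) pairs within each session."""
    counts = Counter()
    for categories in sessions:
        for prev, cur in zip(categories, categories[1:]):
            counts[(prev, cur)] += 1
    return counts


def same_category_share(counts):
    """Share of reformulations that stay within the same category."""
    total = sum(counts.values())
    same = sum(n for (prev, cur), n in counts.items() if prev == cur)
    return same / total


# Hypothetical coded sessions (one category label per query, in order).
sessions = [
    ["specific", "specific", "general/nameable"],
    ["general/nameable", "general/nameable"],
]
counts = transition_counts(sessions)
print(same_category_share(counts))
```

Transitions are counted only inside a session, so the last query of one session is never paired with the first query of the next.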

Estimation of Log-Odds Ratios for Incomplete $2{\times}2$ Tables with Covariates using FEFI

  • Kang, Shin-Soo;Bae, Je-Min
    • Journal of the Korean Data and Information Science Society / v.18 no.1 / pp.185-194 / 2007
  • Covariate information can be used to perform fully efficient fractional imputation (FEFI). A new method, FEFI with logistic regression, is proposed to construct complete contingency tables. The jackknife method is used to obtain standard errors of the log-odds ratio from the table completed by the new method. Simulation results show that, when the covariates carry more information about the categorical variables, the new method provides more efficient estimates of the log-odds ratio than either multiple imputation (MI) based on data augmentation or complete-case analysis.
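To make the quantities in this abstract concrete, the sketch below computes a log-odds ratio and a delete-one jackknife standard error for a complete 2×2 table of integer counts. It deliberately leaves out the imputation step: FEFI and the paper's jackknife for fractionally imputed tables are not reproduced here, and the table values and function names are assumptions.

```python
import math


def log_odds_ratio(table):
    """Log-odds ratio of a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return math.log((a * d) / (b * c))


def jackknife_se(table):
    """Delete-one jackknife standard error of the log-odds ratio.

    Removing any one observation from cell (i, j) gives the same replicate,
    so each cell contributes one estimate weighted by its count.
    Every cell count must be at least 2 so no replicate has an empty cell.
    """
    n = sum(sum(row) for row in table)
    replicates = []  # (weight, estimate) pairs
    for i in range(2):
        for j in range(2):
            reduced = [list(row) for row in table]
            reduced[i][j] -= 1
            replicates.append((table[i][j], log_odds_ratio(reduced)))
    mean = sum(w * est for w, est in replicates) / n
    variance = (n - 1) / n * sum(w * (est - mean) ** 2 for w, est in replicates)
    return math.sqrt(variance)


# Hypothetical complete table.
table = [[20, 10], [10, 20]]
print(log_odds_ratio(table))
print(jackknife_se(table))
```

For this table the jackknife value lands close to the familiar large-sample standard error sqrt(1/a + 1/b + 1/c + 1/d), which is a quick sanity check on the implementation.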

Input Variable Importance in Supervised Learning Models

  • Huh, Myung-Hoe;Lee, Yong Goo
    • Communications for Statistical Applications and Methods / v.10 no.1 / pp.239-246 / 2003
  • Statisticians and data miners are often asked to assess the importance of input variables in a given supervised learning model. For this purpose, one may rely on separate ad hoc measures that depend on the model type, such as linear regression, neural networks, or trees. Consequently, input variable importance measures lack conceptual consistency, so they cannot be used directly to compare different types of models, as is often done in data mining. In this short communication, we propose a unified approach to measuring the importance of input variables. Our method uses sensitivity analysis, which perturbs the values of the input variables and monitors the change in the output. The scope is limited to models for continuous output, although it is not difficult to extend the method to supervised learning models for categorical outcomes.
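The perturb-and-monitor idea in this abstract can be sketched in a model-agnostic way. The version below is one possible reading, not the authors' measure: the perturbation (shifting one input by a multiple of its standard deviation) and the summary (root mean squared change in output) are assumptions, as are the function name `sensitivity_importance` and the toy model.

```python
import statistics


def sensitivity_importance(model, rows, scale=1.0):
    """Score each input by how much the model output moves when it is perturbed.

    model: callable taking a list of input values and returning a number.
    rows:  list of input rows (lists of numbers).
    """
    n_vars = len(rows[0])
    baseline = [model(row) for row in rows]
    scores = []
    for j in range(n_vars):
        # Assumed perturbation: shift variable j by `scale` standard deviations.
        step = scale * statistics.pstdev([row[j] for row in rows])
        squared_changes = []
        for row, y0 in zip(rows, baseline):
            shifted = list(row)
            shifted[j] += step
            squared_changes.append((model(shifted) - y0) ** 2)
        scores.append((sum(squared_changes) / len(squared_changes)) ** 0.5)
    return scores


def toy_model(x):
    """Hypothetical fitted model with continuous output."""
    return 3.0 * x[0] + 0.5 * x[1]


rows = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
scores = sensitivity_importance(toy_model, rows)
print(scores)
```

Because the function only calls `model`, the same scores can be computed for a regression, a tree, or a neural network, which is the kind of cross-model comparability the abstract argues for.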

Qualitative Research Method in Mathematics Education (수학교육에서 질적(Qualitative) 연구 방법)

  • 이중권
    • The Mathematical Education / v.42 no.2 / pp.111-119 / 2003
  • This research discusses the general concept of qualitative research methods in mathematics education. It provides a classification of research methods in mathematics education and describes research trends in the field. It addresses how research design facilitates formulating a research problem, selecting a research design, choosing who and what to study, deciding how to approach participants, selecting means of collecting data, choosing how to analyze data, and interpreting data and applying the analysis. The study addresses the issues involved in choosing relevant populations and in selecting and sampling qualitative data. It describes how populations are conceptualized and distinguishes between probability sampling and criterion-based selection. It discusses not only data arrangement, such as cross-sectional and categorical indexing and non-cross-sectional data organization, but also diagrams, flow charts, matrices, cognitive maps, and family trees that facilitate data analysis.

A Study on Materialism of University Students (대학생의 물질주의 가치관에 대한 연구)

  • Song, Soon;Shin, Hyoun-Shill
    • Korean Journal of Human Ecology / v.11 no.3 / pp.223-235 / 2002
  • The purpose of this study was to examine influences on the materialism of university students. Data were collected from 331 university students and analyzed with the SPSS package. The analyses included basic descriptive categorical analysis (frequencies, means, percentages) as well as t-tests, one-way ANOVA, and multiple regression. The major findings are as follows: (1) A significant difference in the materialism of university students was found by socio-economic variables such as the amount of pocket money. (2) A significant difference in materialism was found by self-esteem more than by life satisfaction. (3) A significant difference in materialism was found by parents' materialism and competitive achievement pressure. (4) According to the multiple regression analysis, the materialism of university students was influenced, in order, by self-esteem, parents' materialism, and competitive achievement pressure.

Variable Selection for Multi-Purpose Multivariate Data Analysis (다목적 다변량 자료분석을 위한 변수선택)

  • Huh, Myung-Hoe;Lim, Yong-Bin;Lee, Yong-Goo
    • The Korean Journal of Applied Statistics / v.21 no.1 / pp.141-149 / 2008
  • Multivariate data with a very large number of variables are now analyzed frequently. In such data sets, virtually duplicated variables may exist side by side even though they are conceptually distinguishable. Duplicated variables may cause problems such as distortion of the principal axes in principal component analysis and factor analysis, and distortion of the distances between observations, which are the input for cluster analysis. In supervised learning or regression analysis, duplicated explanatory variables also often make fitted models unstable. Since real data analyses often serve multiple purposes, it is necessary to reduce the number of variables to a parsimonious level. The aim of this paper is to propose a practical algorithm for selecting a subset of variables from a given set of p input variables, using as the criterion the minimum trace of the partial variances of the unselected variables left unexplained by the selected variables. The usefulness of the proposed method is demonstrated in visualizing the relationship between selected and unselected variables, in building a predictive model with a very large number of independent variables, and in reducing the number of variables and purging/merging categories in categorical data.
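The selection criterion in this abstract, the trace of the partial variances of the unselected variables given the selected ones, can be illustrated with a greedy forward search. Whether the paper's algorithm is greedy is not stated in the abstract, so the search strategy below is an assumption, as are the function name `select_variables` and the toy correlation matrix.

```python
def select_variables(cov, n_select, tol=1e-12):
    """Greedy forward selection minimizing the trace of partial variances.

    cov: covariance (or correlation) matrix as a list of lists.
    Returns (selected indices, trace of the partial variances that remain).
    """
    p = len(cov)
    resid = [row[:] for row in cov]  # partial covariance given the selected set
    unselected = list(range(p))
    selected = []

    def explained(k):
        # Reduction in the unselected trace if variable k is selected next.
        if resid[k][k] <= tol:  # already fully explained, nothing to gain
            return 0.0
        return sum(resid[j][k] ** 2 for j in unselected) / resid[k][k]

    for _ in range(n_select):
        best = max(unselected, key=explained)
        pivot = resid[best][best]
        if pivot <= tol:
            break  # nothing left to explain
        column = [resid[j][best] for j in range(p)]
        # Partial out the chosen variable from every remaining variable.
        for i in range(p):
            for j in range(p):
                resid[i][j] -= column[i] * column[j] / pivot
        selected.append(best)
        unselected.remove(best)

    remaining_trace = sum(resid[j][j] for j in unselected)
    return selected, remaining_trace


# Hypothetical correlation matrix: variables 0 and 1 are near-duplicates.
corr = [
    [1.00, 0.99, 0.00],
    [0.99, 1.00, 0.00],
    [0.00, 0.00, 1.00],
]
selected, leftover = select_variables(corr, n_select=2)
print(selected, leftover)
```

In this toy case the search keeps one of the two near-duplicate variables plus the independent one, and the leftover trace is just the small part of the dropped duplicate that its twin cannot explain.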

Genetic parameters for marbling and body score in Anglonubian goats using Bayesian inference via threshold and linear models

  • Figueiredo Filho, Luiz Antonio Silva;Sarmento, Jose Lindenberg Rocha;Campelo, Jose Elivalto Guimaraes;de Oliveira Almeida, Marcos Jacob;de Sousa, Antonio Junior;da Silva Santos, Natanael Pereira;da Silva Costa, Marcio;Torres, Tatiana Saraiva;Sena, Luciano Silva
    • Asian-Australasian Journal of Animal Sciences / v.31 no.9 / pp.1407-1414 / 2018
  • Objective: The aim of this study was to estimate (co)variance components and genetic parameters for categorical carcass traits using Bayesian inference via mixed linear and threshold animal models in Anglonubian goats. Methods: Data were obtained from Anglonubian goats reared in the Brazilian Mid-North region. The traits under study were body condition score, marbling in the rib eye, ribeye area, fat thickness of the sternum, hip height, leg perimeter, and body weight. The numerator relationship matrix contained information from 793 animals. Single- and two-trait analyses were performed to estimate (co)variance components and genetic parameters via linear and threshold animal models. For estimation of genetic parameters, chains with 2 and 4 million cycles were tested. A 1,000,000-cycle initial burn-in was used, with values taken every 250 cycles, for a total of 4,000 samples. Convergence was monitored by the Geweke criterion and the Monte Carlo chain error. Results: The threshold model fits categorical data best, since it is more efficient at detecting genetic variability. In the two-trait analysis, the additional information and the correlations between traits increased the estimated (co)variance components and heritability compared with the single-trait analysis. Heritability estimates for the study traits were of low to moderate magnitude. Conclusion: Direct selection on continuously distributed traits such as sternal fat thickness and hip height allows indirect selection for marbling of the ribeye.
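The sampling scheme quoted in this abstract comes down to simple arithmetic on chain length, burn-in, and thinning interval. The sketch below shows that bookkeeping; the function name `retained_cycles` and the convention that the first retained draw is the cycle immediately after burn-in are assumptions. Reading the quoted 4,000 samples as belonging to the 2-million-cycle chain is an inference from the numbers, not something the abstract states.

```python
def retained_cycles(chain_length, burn_in, thin):
    """Cycle indices kept after discarding burn-in and keeping every `thin`-th draw."""
    return range(burn_in, chain_length, thin)


kept = retained_cycles(chain_length=2_000_000, burn_in=1_000_000, thin=250)
print(len(kept))  # 4000
```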

Test of Homogeneity Based on Complex Survey Data: Discussion Based on Power of Test

  • Heo, Sun-Yeong;Yi, Su-Cheol
    • Journal of the Korean Data and Information Science Society / v.16 no.3 / pp.609-620 / 2005
  • In secondary analysis of categorical data, situations often arise in which the estimated cell variances are available but the full variance matrix is not. In this case researchers are often inclined to use Pearson-type test statistics for homogeneity. However, for a complex sample the observed cell proportions are not multinomially distributed, and the Pearson-type test statistic is generally not asymptotically chi-square distributed. This paper evaluates the power of the Wald test, the Pearson-type test, and the first-order corrected Pearson-type test for homogeneity. The resulting power curves indicate that, as the misspecification effect increases, the inflation of the significance level and the loss of power of the Pearson-type test become more severe.
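For orientation, the sketch below computes a Pearson-type homogeneity statistic from estimated proportions and then applies a first-order correction by dividing by a mean design effect. This is a simplified illustration: the mean design effect is supplied by the caller rather than estimated from cell variances, whether this matches the paper's exact estimator is an assumption, and the proportions, sample sizes, and function names are hypothetical.

```python
def pearson_homogeneity(props, sizes):
    """Pearson-type statistic for homogeneity of several populations.

    props: one list of estimated category proportions per population.
    sizes: sample size of each population.
    Returns (statistic, degrees of freedom).
    """
    n_total = sum(sizes)
    n_cats = len(props[0])
    pooled = [
        sum(n * p[k] for p, n in zip(props, sizes)) / n_total
        for k in range(n_cats)
    ]
    stat = sum(
        n * (p[k] - pooled[k]) ** 2 / pooled[k]
        for p, n in zip(props, sizes)
        for k in range(n_cats)
    )
    df = (len(props) - 1) * (n_cats - 1)
    return stat, df


def first_order_corrected(stat, mean_design_effect):
    """Deflate the Pearson-type statistic by an assumed mean design effect."""
    return stat / mean_design_effect


# Hypothetical estimates from two populations and three categories.
props = [[0.5, 0.3, 0.2], [0.4, 0.3, 0.3]]
sizes = [200, 200]
stat, df = pearson_homogeneity(props, sizes)
print(stat, df)
print(first_order_corrected(stat, mean_design_effect=2.0))
```

A design effect above 1 shrinks the statistic, which reflects the abstract's point that the uncorrected Pearson-type test overstates significance under a complex sampling design.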

Negative binomial loglinear mixed models with general random effects covariance matrix

  • Sung, Youkyung;Lee, Keunbaik
    • Communications for Statistical Applications and Methods / v.25 no.1 / pp.61-70 / 2018
  • Modeling of the random effects covariance matrix in generalized linear mixed models (GLMMs) is an issue in analysis of longitudinal categorical data because the covariance matrix can be high-dimensional and its estimate must satisfy positive-definiteness. To satisfy these constraints, we consider the autoregressive and moving average Cholesky decomposition (ARMACD) to model the covariance matrix. The ARMACD creates a more flexible decomposition of the covariance matrix that provides generalized autoregressive parameters, generalized moving average parameters, and innovation variances. In this paper, we analyze longitudinal count data with overdispersion using GLMMs. We propose negative binomial loglinear mixed models to analyze longitudinal count data and we also present modeling of the random effects covariance matrix using the ARMACD. Epilepsy data are analyzed using our proposed model.
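The reason a Cholesky-type parameterization handles the positive-definiteness constraint can be shown in a few lines. The sketch below assumes the form T Σ Tᵀ = L D Lᵀ, with T unit lower triangular built from autoregressive parameters, L unit lower triangular built from moving average parameters, and D a diagonal matrix of positive innovation variances. That form is an assumption about the ARMACD, not taken from the abstract, and the function name and parameter values are hypothetical.

```python
def arma_cholesky_covariance(phi, ell, innovation_var):
    """Build a covariance matrix from unconstrained AR and MA parameters.

    phi, ell: q x q nested lists; only entries below the diagonal are read
              (autoregressive and moving average parameters, respectively).
    innovation_var: q positive innovation variances.
    Returns sigma = A D A^T with A = T^{-1} L.
    """
    q = len(innovation_var)
    a = []
    for t in range(q):
        # Row t of L: MA parameters below the diagonal, 1 on the diagonal.
        row = [ell[t][k] if k < t else (1.0 if k == t else 0.0) for k in range(q)]
        # Forward substitution: A[t] = L[t] + sum over j < t of phi[t][j] * A[j].
        for j in range(t):
            for k in range(q):
                row[k] += phi[t][j] * a[j][k]
        a.append(row)
    return [
        [sum(a[s][k] * innovation_var[k] * a[t][k] for k in range(q)) for t in range(q)]
        for s in range(q)
    ]


# Hypothetical parameter values for a 2 x 2 random effects covariance matrix.
phi = [[0.0, 0.0], [0.5, 0.0]]
ell = [[0.0, 0.0], [0.3, 0.0]]
sigma = arma_cholesky_covariance(phi, ell, innovation_var=[1.0, 2.0])
print(sigma)
```

Because A is unit lower triangular, the determinant of the result is the product of the innovation variances, so any real-valued `phi` and `ell` together with positive variances give a positive-definite matrix without further constraints.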