• Title/Abstract/Keywords: categorical data analysis

195 search results, processing time 0.025 s

A Study on Industrial Accident Cases by an Application of Correlation Analysis

  • 정국삼;홍광수
    • Journal of the Korean Society of Safety / Vol. 14, No. 1 / pp. 141-149 / 1999
  • At present, industrial accident statistics serve as the basic data for accident-prevention policy and for planning the application of industrial accident insurance. However, these statistics are insufficient for effective safety management because they only express the itemized distribution and frequency over all cases. This study attempted a correlation analysis of the individual causes by defining the investigation items as accident parameters. A correlation analysis between unsafe acts and conditions and their related causes was performed to analyze the causes of industrial accidents. To estimate accident severity, the correlation and independence between these causes and the direct causes were analyzed, with days of hospitalization defined as the dependent parameter. In addition, this study numerically expressed the effect of the level of the independent parameter on the dependent parameter by presenting, through Logit analysis, a predictive model between dependent and independent parameters that are categorical.

Examining Categorical Transition and Query Reformulation Patterns in Image Search Process

  • 정은경;윤정원
    • Journal of the Korean Society for Information Management / Vol. 27, No. 2 / pp. 37-60 / 2010
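  • This study explored query reformulation patterns in relation to image attribute categories. To this end, log data from the Excite web search engine were used, and a total of 592 sessions and 2,445 queries were analyzed. The data were analyzed using Batley's classification of information types together with the facets and sub-facets identified in previous studies. The results are presented in two parts. The first concerns the queries themselves: the most frequent categories were specific and nameable, and this tendency persisted across the various stages of information searching. The second concerns query reformulation patterns: parallel movement was the most frequent pattern, and this tendency differed slightly depending on the category of the initial or immediately preceding query. The category transition analysis showed that, at a high rate (60%-80%), search queries tended to remain in the same category. These results are expected to support users in selecting queries and to help in building effective thesauri when designing and implementing image retrieval systems.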
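
The category transition analysis described in this abstract can be illustrated with a minimal sketch. It assumes each session is already coded as a list of category labels; the sessions, the function names `transition_counts` and `stay_rate`, and the resulting rates are hypothetical and are not taken from the study's data.

```python
from collections import Counter

def transition_counts(sessions):
    """Count (previous category, next category) pairs over consecutive queries."""
    counts = Counter()
    for categories in sessions:
        for prev, nxt in zip(categories, categories[1:]):
            counts[(prev, nxt)] += 1
    return counts

def stay_rate(counts, category):
    """Share of transitions out of `category` that remain in `category`."""
    total = sum(n for (prev, _), n in counts.items() if prev == category)
    return counts[(category, category)] / total if total else 0.0

# Hypothetical sessions: each list holds the category of successive queries.
sessions = [
    ["specific", "specific", "nameable"],
    ["nameable", "nameable", "nameable", "specific"],
    ["specific", "specific"],
]
counts = transition_counts(sessions)
specific_stay = stay_rate(counts, "specific")
nameable_stay = stay_rate(counts, "nameable")
```

A stay rate close to 1 for a category would correspond to the reported tendency of queries to remain in the same category.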

Estimation of Log-Odds Ratios for Incomplete $2{\times}2$ Tables with Covariates using FEFI

  • Kang, Shin-Soo;Bae, Je-Min
    • Journal of the Korean Data and Information Science Society / Vol. 18, No. 1 / pp. 185-194 / 2007
  • Covariate information is available for fully efficient fractional imputation (FEFI). A new method, FEFI with logistic regression, is proposed to construct complete contingency tables. The jackknife method is used to obtain standard errors of the log-odds ratio from the table completed by the new method. Simulation results reveal that, when the covariates carry more information about the categorical variables, the new method provides more efficient estimates of the log-odds ratio than either multiple imputation (MI) based on data augmentation or complete case analysis.
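
As a rough illustration of the two quantities named in this abstract, the sketch below computes a log-odds ratio and a delete-one jackknife standard error for an already completed 2x2 table. It does not implement FEFI or the paper's own jackknife scheme; the table counts and the function names are assumptions made for the example.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log-odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return math.log(a * d) - math.log(b * c)

def jackknife_se(a, b, c, d):
    """Delete-one jackknife standard error of the log-odds ratio.

    Dropping any single observation from a given cell yields the same
    estimate, so each cell contributes one leave-one-out value weighted
    by its count.
    """
    n = a + b + c + d
    leave_one_out = [
        (a, log_odds_ratio(a - 1, b, c, d)),
        (b, log_odds_ratio(a, b - 1, c, d)),
        (c, log_odds_ratio(a, b, c - 1, d)),
        (d, log_odds_ratio(a, b, c, d - 1)),
    ]
    mean = sum(w * t for w, t in leave_one_out) / n
    var = (n - 1) / n * sum(w * (t - mean) ** 2 for w, t in leave_one_out)
    return math.sqrt(var)

# Hypothetical completed table; every count must be at least 2.
table = (10, 20, 30, 40)
estimate = log_odds_ratio(*table)
se = jackknife_se(*table)
```

For moderately large cells the jackknife value should be close to the familiar large-sample standard error, the square root of the sum of reciprocal cell counts.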

Input Variable Importance in Supervised Learning Models

  • Huh, Myung-Hoe;Lee, Yong Goo
    • Communications for Statistical Applications and Methods / Vol. 10, No. 1 / pp. 239-246 / 2003
  • Statisticians, or data miners, are often asked to assess the importance of input variables in a given supervised learning model. For this purpose, one may rely on separate ad hoc measures that depend on the type of model, such as linear regressions, neural networks, or trees. Consequently, input variable importance measures lack conceptual consistency, so they cannot be used directly to compare different types of models, which is often done in data mining processes. In this short communication, we propose a unified approach to measuring the importance of input variables. Our method uses sensitivity analysis, which begins by perturbing the values of the input variables and monitors the change in output. The research scope is limited to models for continuous output, although it is not difficult to extend the method to supervised learning models for categorical outcomes.
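
The perturb-and-monitor idea in this abstract can be sketched in a model-agnostic way. The abstract does not specify the perturbation scheme, so the example below assumes one possible choice: shuffling a single input column and recording the mean squared change in the model output. The model, the data, and the function name `perturbation_importance` are hypothetical.

```python
import random

def perturbation_importance(model, rows, seed=0):
    """Mean squared change in model output when one input column is shuffled."""
    rng = random.Random(seed)
    baseline = [model(row) for row in rows]
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        rng.shuffle(column)  # perturb input j only
        changed = 0.0
        for row, value, base in zip(rows, column, baseline):
            perturbed = list(row)
            perturbed[j] = value
            changed += (model(perturbed) - base) ** 2
        scores.append(changed / len(rows))
    return scores

# Hypothetical fitted model with continuous output; the third input is unused.
def model(x):
    return 3.0 * x[0] + 0.5 * x[1]

data_rng = random.Random(1)
rows = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
scores = perturbation_importance(model, rows)
```

Because the score only needs a callable that maps inputs to an output, the same function could be applied to a regression, a neural network, or a tree, which is the kind of cross-model comparison the abstract has in mind.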

Qualitative Research Method in Mathematics Education

  • 이중권
    • Journal of the Korean Society of Mathematical Education Series A: The Mathematical Education / Vol. 42, No. 2 / pp. 111-119 / 2003
  • This research discussed the general concept of qualitative research methods in mathematics education. It provided a classification of research methods in mathematics education and described research trends in the field. It addressed how research design facilitates formulating a research problem, selecting a research design, choosing who and what to study, deciding how to approach participants, selecting means to collect data, choosing how to analyze data, and interpreting data and applying the analysis. This study addressed the issues involved in choosing relevant populations and in selecting and sampling qualitative data. It described how populations are conceptualized and distinguished between probability sampling and criterion-based selection. It discussed not only data arrangement, such as cross-sectional and categorical indexing and non-cross-sectional data organization, but also diagrams, flow charts, matrices, cognitive maps, and family trees that facilitate data analysis.

A Study on Materialism of University Students

  • 송순;신현실
    • Korean Journal of Human Ecology / Vol. 11, No. 3 / pp. 223-235 / 2002
  • The purpose of this study was to examine the influences on the materialism of university students. The data were collected from 331 university students and analyzed using the SPSS package. The methods of analysis included basic descriptive categorical analysis (frequencies, means, percentages) as well as t-tests, one-way ANOVA, and multiple regression. The major findings were as follows: (1) A significant difference was found in the materialism of university students by socio-economic variables such as the amount of pocket money. (2) A significant difference was found in the materialism of university students by self-esteem more than by life satisfaction. (3) A significant difference was found in the materialism of university students by parents' materialism and competitive achievement pressure. (4) According to the multiple regression analysis, the materialism of university students was influenced, in order, by self-esteem, parents' materialism, and competitive achievement pressure.

Variable Selection for Multi-Purpose Multivariate Data Analysis

  • 허명회;임용빈;이용구
    • Korean Journal of Applied Statistics / Vol. 21, No. 1 / pp. 141-149 / 2008
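  • A recent trend in multivariate data analysis is that, besides the growing number of observations n, cases with a large number of variables p are becoming more common. Among the p variables $X_1$, $X_2$, $\ldots$, $X_p$ obtained for each of the n observations, there may be variables that can be distinguished by name or concept but are in practice nearly redundant, and including all of them in an analysis can cause several problems. For example, redundant variables can distort the determination of the principal axes in principal component analysis or factor analysis, and the computation of distances between observations in clustering. Moreover, in supervised learning with a designated target variable, redundancy among the explanatory variables undermines the stability of the fitted model. In real data analysis, a single data set is explored with several techniques and many models are derived from it, so the variable set needs to be as parsimonious as possible. The purpose of this study is to present a method that replaces a given variable set with a smaller one by selecting the necessary variables among $X_1$, $X_2$, $\ldots$, $X_p$ and removing the unnecessary ones. By applying the proposed method to several numerical examples, we discuss the visualization of the relationship between selected and removed variables, its usefulness in regression models, and its use in categorical data analysis.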

Genetic parameters for marbling and body score in Anglonubian goats using Bayesian inference via threshold and linear models

  • Figueiredo Filho, Luiz Antonio Silva;Sarmento, Jose Lindenberg Rocha;Campelo, Jose Elivalto Guimaraes;de Oliveira Almeida, Marcos Jacob;de Sousa, Antonio Junior;da Silva Santos, Natanael Pereira;da Silva Costa, Marcio;Torres, Tatiana Saraiva;Sena, Luciano Silva
    • Asian-Australasian Journal of Animal Sciences / Vol. 31, No. 9 / pp. 1407-1414 / 2018
  • Objective: The aim of this study was to estimate (co)variance components and genetic parameters for categorical carcass traits using Bayesian inference via mixed linear and threshold animal models in Anglonubian goats. Methods: Data were obtained from Anglonubian goats reared in the Brazilian Mid-North region. The traits studied were body condition score, marbling in the ribeye, ribeye area, fat thickness of the sternum, hip height, leg perimeter, and body weight. The numerator relationship matrix contained information from 793 animals. Single- and two-trait analyses were performed to estimate (co)variance components and genetic parameters via linear and threshold animal models. For estimation of genetic parameters, chains with 2 and 4 million cycles were tested. A 1,000,000-cycle initial burn-in was used, with values taken every 250 cycles, for a total of 4,000 samples. Convergence was monitored by the Geweke criterion and the Monte Carlo error of the chain. Results: The threshold model fits categorical data best because it detects genetic variability more efficiently. In two-trait analysis, the additional information and the correlations between traits increased the estimated values of the (co)variance components and heritability in comparison to single-trait analysis. Heritability estimates for the traits studied were of low to moderate magnitude. Conclusion: Direct selection on continuously distributed traits such as sternal fat thickness and hip height allows indirect selection for marbling of the ribeye.
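
The sampling settings quoted in this abstract can be checked with simple arithmetic. The sketch below assumes the usual convention that burn-in cycles are discarded first and every `thin`-th cycle is kept afterwards; the function name `retained_samples` is illustrative. Under that convention, the reported 4,000 samples match the 2-million-cycle chain, while the figure for the 4-million-cycle chain is an inference from the same settings and is not stated in the abstract.

```python
def retained_samples(chain_length, burn_in, thin):
    """Number of posterior samples kept after burn-in and thinning."""
    if chain_length < burn_in:
        raise ValueError("burn-in exceeds chain length")
    return (chain_length - burn_in) // thin

# Settings from the abstract: 1,000,000-cycle burn-in, one value every 250 cycles.
short_chain = retained_samples(2_000_000, 1_000_000, 250)  # 4000
long_chain = retained_samples(4_000_000, 1_000_000, 250)   # 12000 (inferred)
```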

Test of Homogeneity Based on Complex Survey Data: Discussion Based on Power of Test

  • Heo, Sun-Yeong;Yi, Su-Cheol
    • Journal of the Korean Data and Information Science Society / Vol. 16, No. 3 / pp. 609-620 / 2005
  • In secondary analysis of categorical data, situations often arise in which the estimated cell variances are available but the full variance matrix is not. In this case, researchers are often inclined to use Pearson-type test statistics for homogeneity. However, for a complex sample, the observed cell proportions are not multinomially distributed, and the Pearson-type test statistic is generally not asymptotically chi-square distributed. This paper evaluates the power of the Wald test, the Pearson-type test, and the first-order corrected Pearson-type test for homogeneity. The resulting power curves indicate that, as the misspecification effect increases, the inflation of the significance level and the loss of power of the Pearson-type test become more severe.
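
To make the Pearson-type statistic discussed in this abstract concrete, the sketch below computes it for a small table of counts and then applies a first-order correction. The abstract does not give the correction formula, so the example assumes the common approach of dividing the statistic by an estimated mean design effect; the table, the design-effect value of 2.0, and the function names are hypothetical.

```python
def pearson_statistic(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def first_order_corrected(stat, mean_design_effect):
    """Divide the Pearson statistic by an assumed mean design effect."""
    return stat / mean_design_effect

# Hypothetical counts for two populations (rows) and two categories (columns).
table = [[10, 20], [20, 10]]
x2 = pearson_statistic(table)
x2_corrected = first_order_corrected(x2, mean_design_effect=2.0)
```

With a design effect above 1, the corrected statistic is smaller than the uncorrected one, which is how such a correction counteracts the inflated significance level mentioned in the abstract.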

Negative binomial loglinear mixed models with general random effects covariance matrix

  • Sung, Youkyung;Lee, Keunbaik
    • Communications for Statistical Applications and Methods / Vol. 25, No. 1 / pp. 61-70 / 2018
  • Modeling of the random effects covariance matrix in generalized linear mixed models (GLMMs) is an issue in analysis of longitudinal categorical data because the covariance matrix can be high-dimensional and its estimate must satisfy positive-definiteness. To satisfy these constraints, we consider the autoregressive and moving average Cholesky decomposition (ARMACD) to model the covariance matrix. The ARMACD creates a more flexible decomposition of the covariance matrix that provides generalized autoregressive parameters, generalized moving average parameters, and innovation variances. In this paper, we analyze longitudinal count data with overdispersion using GLMMs. We propose negative binomial loglinear mixed models to analyze longitudinal count data and we also present modeling of the random effects covariance matrix using the ARMACD. Epilepsy data are analyzed using our proposed model.