• Title/Summary/Keyword: categorical data (범주형 자료)

Bayesian Analysis of Korean Alcohol Consumption Data Using a Zero-Inflated Ordered Probit Model (영 과잉 순서적 프로빗 모형을 이용한 한국인의 음주자료에 대한 베이지안 분석)

  • Oh, Man-Suk;Oh, Hyun-Tak;Park, Se-Mi
    • The Korean Journal of Applied Statistics
    • /
    • v.25 no.2
    • /
    • pp.363-376
    • /
    • 2012
  • Excessive zeros are often observed in ordinal categorical response variables. An ordinary ordered probit model is not appropriate for zero-inflated data, especially when the zero observations arise from several different sources. In this paper, we apply a two-stage zero-inflated ordered probit (ZIOP) model, which incorporates the zero-inflated nature of the data, propose a Bayesian analysis of the ZIOP model, and apply the method to alcohol consumption data collected by the National Bureau of Statistics, Korea. In the first stage of the ZIOP model, a probit model divides the non-drinkers into genuine non-drinkers, who do not drink because of personal beliefs or permanent health problems, and potential drinkers, who did not drink at the time of the survey but have the potential to become drinkers. In the second stage, an ordered probit model is applied to drinkers, who consist of zero-consumption potential drinkers and positive-consumption drinkers. The results show that about 30% of non-drinkers are genuine non-drinkers, and hence that the Korean alcohol consumption data are zero-inflated. A study of the marginal effect of each explanatory variable shows that certain explanatory variables affect the genuine non-drinkers and the potential drinkers in opposite directions, which may not be detected by an ordered probit model.
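
The two-stage structure described in this abstract can be sketched as follows. This is a minimal illustration of the usual ZIOP category probabilities, not the authors' Bayesian sampler; the names `ziop_probs`, `xb`, `zg`, and `cutpoints` are hypothetical, and the linear predictors are taken as given rather than estimated.

```python
from math import erf, sqrt


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def ziop_probs(xb, zg, cutpoints):
    """Category probabilities 0..J under a zero-inflated ordered probit.

    xb        -- stage-1 linear predictor (probit for being a potential drinker)
    zg        -- stage-2 linear predictor (ordered probit for consumption level)
    cutpoints -- increasing thresholds; the first one separates level 0 from 1
    """
    p_potential = norm_cdf(xb)
    # Ordered-probit probabilities among potential drinkers.
    cdf = [norm_cdf(c - zg) for c in cutpoints] + [1.0]
    ordered = [cdf[0]] + [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]
    probs = [p_potential * q for q in ordered]
    # Genuine non-drinkers contribute extra mass at zero.
    probs[0] += 1.0 - p_potential
    return probs


def genuine_share_of_zeros(xb, zg, cutpoints):
    """Share of zero observations attributed to genuine non-drinkers."""
    return (1.0 - norm_cdf(xb)) / ziop_probs(xb, zg, cutpoints)[0]
```

Under these assumptions, `genuine_share_of_zeros` is the kind of quantity behind a statement such as "about 30% of non-drinkers are genuine non-drinkers", although the paper's figure comes from its own fitted model.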

The Marginal Model for Categorical Data Analysis of $3\times3$ Cross-Trials ($3\times3$ 교차실험을 범주형 자료 분석을 위한 주변확률모형)

  • 안주선
    • The Korean Journal of Applied Statistics
    • /
    • v.14 no.1
    • /
    • pp.25-37
    • /
    • 2001
  • A marginal model is proposed for the analysis of data with $c\ (\geq 3)$ categories from $3\times3$ cross-over trials with three periods and three treatments. The model can serve as a counterpart to Kenward and Jones's joint probability model and generalizes the univariate marginal logit model of Balagtas et al. for treatment effects in $3\times3$ cross-over trials with binary response variables [Kenward and Jones (1991), Balagtas et al. (1995)]. The model equations for the marginal probabilities are constructed with three types of link functions. Methods for constructing the link function matrices and the model matrices are given, and parameter estimation is discussed. The proposed model is applied to the data of Kenward and Jones.

A generalized logit model with mixed effects for categorical data (다가자료에 대한 혼합효과모형)

  • 최재성
    • The Korean Journal of Applied Statistics
    • /
    • v.15 no.1
    • /
    • pp.129-137
    • /
    • 2002
  • This paper suggests a generalized logit model with mixed effects for analysing frequency data in a multi-way contingency table. In this model the nominal response variable is assumed to be polytomous. When some factors are fixed but considered ordinal and others are random, the paper shows how to use baseline-category logits to incorporate the mixed effects of those factors into the model. A numerical algorithm is used to estimate the model parameters from the marginal log-likelihood.
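
As a small illustration of baseline-category logits, the sketch below converts linear predictors into category probabilities. It assumes each predictor `eta_j` already combines the fixed and random effects for category j relative to the baseline; the function name and that decomposition are hypothetical, and the marginal-likelihood estimation step is not shown.

```python
from math import exp


def baseline_category_probs(linear_predictors):
    """Probabilities for a nominal response from baseline-category logits.

    linear_predictors -- eta_j = log(pi_j / pi_0) for each non-baseline
                         category j; the baseline category has eta_0 = 0.
    Returns [pi_0, pi_1, ..., pi_J].
    """
    etas = [0.0] + list(linear_predictors)
    shift = max(etas)  # subtract the maximum for numerical stability
    weights = [exp(eta - shift) for eta in etas]
    total = sum(weights)
    return [w / total for w in weights]
```

Taking `log(pi_j / pi_0)` of the returned probabilities recovers the predictors, which is the defining property of this parameterization.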

Methods for Genetic Parameter Estimations of Carcass Weight, Longissimus Muscle Area and Marbling Score in Korean Cattle (한우의 도체중, 배장근단면적 및 근내지방도의 유전모수 추정방법)

  • Lee, D.H.
    • Journal of Animal Science and Technology
    • /
    • v.46 no.4
    • /
    • pp.509-516
    • /
    • 2004
  • This study investigates the amount of bias in estimates of heritability and genetic correlation according to the data structure of marbling scores in Korean cattle. A breeding population with 5 generations was simulated by way of selection for carcass weight, Longissimus muscle area, and latent values of marbling score, and random mating. Latent values of marbling score were categorized into five classes by thresholds of 0, 1, 2, and 3 SD (DS1) or into seven classes by thresholds of -2, -1, 0, 1, 2, and 3 SD (DS2). Variance components and genetic parameters (heritabilities and genetic correlations) were estimated by restricted maximum likelihood (REML) on multivariate linear mixed animal models and by Gibbs sampling on multivariate threshold mixed animal models in DS1 and DS2. The simulation was run for 10 replicates, and averages and empirical standard deviations were calculated. Using REML, heritabilities of marbling score were underestimated at 0.315 and 0.462 for DS1 and DS2, respectively, compared with the parameter (0.500). In contrast, with Gibbs sampling in the multivariate threshold animal models, the estimates did not differ significantly from the parameter. Residual correlations of marbling score with the other traits were lower than the parameters when REML was used under the assumption of linearity and normality. This would be due to loss of information and, therefore, reduced variation in marbling score. In conclusion, genetic variation in marbling would be well defined if the liability concept were adopted for marbling score and a threshold mixed model were used for genetic parameter estimation in Korean cattle.
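
The categorization step described above can be sketched as follows, assuming the latent value is expressed in SD units. The 1-based scoring and the handling of values that fall exactly on a threshold are assumptions of this sketch, not details given in the abstract.

```python
from bisect import bisect_right

# Thresholds in SD units, as listed in the abstract.
DS1_THRESHOLDS = [0.0, 1.0, 2.0, 3.0]               # five categories
DS2_THRESHOLDS = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]   # seven categories


def categorize(latent_sd, thresholds):
    """Map a latent value (in SD units) to an ordered score starting at 1.

    The score is one plus the number of thresholds at or below the value,
    so a value equal to a threshold falls in the upper category (assumed).
    """
    return bisect_right(thresholds, latent_sd) + 1
```

Four thresholds yield five scores and six thresholds yield seven, which matches the DS1 and DS2 designs. Collapsing a continuous liability into a few scores is the loss of information that the abstract links to underestimated heritability under the linear model.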

A DB for facial expression and its user-interface (얼굴표정 DB 및 사용자 인터페이스 개발)

  • 한재현;문수종;김진관;김영아;홍상욱;심연숙;반세범;변혜란;오경자
    • Proceedings of the Korean Society for Emotion and Sensibility Conference
    • /
    • 1999.11a
    • /
    • pp.373-378
    • /
    • 1999
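  • A large-scale facial expression DB was built to provide basic data for research on faces and facial expressions and to serve as a guideline for designing actual expressions. The DB stores about 1,500 images of natural and varied expressions from 24 actors, collected by several methods. For each expression, internal-state ratings from multiple raters are included, taking both categorical and dimensional models of internal states into account, and the rating agreement rate is recorded for each photograph as a reference for users of the data. For use in expression recognition and synthesis systems, the DB also stores the coordinates of 39 vertices based on the MPEG-4 FAP, measured when each expression was fitted to a standard Korean face shape model, together with contextual information on how the expression was elicited. A user interface was developed so that users can easily search the image data in the entire DB using the limited information they have.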

Variable Selection for Multi-Purpose Multivariate Data Analysis (다목적 다변량 자료분석을 위한 변수선택)

  • Huh, Myung-Hoe;Lim, Yong-Bin;Lee, Yong-Goo
    • The Korean Journal of Applied Statistics
    • /
    • v.21 no.1
    • /
    • pp.141-149
    • /
    • 2008
  • Multivariate data with a very large number of variables are now analyzed frequently. In such data sets, virtually duplicated variables may exist simultaneously even though they are conceptually distinguishable. Duplicated variables may cause problems such as distortion of the principal axes in principal component analysis and factor analysis and distortion of the distances between observations, i.e. the input for cluster analysis. In supervised learning or regression analysis, duplicated explanatory variables also often cause instability of the fitted models. Since real data analyses often have multiple purposes, it is necessary to reduce the number of variables to a parsimonious level. The aim of this paper is to propose a practical algorithm for selecting a subset of variables from a given set of p input variables, using the criterion of minimum trace of the partial variances of the unselected variables unexplained by the selected variables. The usefulness of the proposed method is demonstrated in visualizing the relationship between selected and unselected variables, in building a predictive model with a very large number of independent variables, and in reducing the number of variables and purging/merging categories in categorical data.

Analysis of Large Tables (대규모 분할표 분석)

  • Choi, Hyun-Jip
    • The Korean Journal of Applied Statistics
    • /
    • v.18 no.2
    • /
    • pp.395-410
    • /
    • 2005
  • For the analysis of large tables formed by many categorical variables, we suggest a method to group the variables into several disjoint groups in which the variables are completely associated within the groups. We use a simple function of Kullback-Leibler divergence as a similarity measure to find the groups. Since the groups are complete hierarchical sets, we can identify the association structure of the large tables by the marginal log-linear models. Examples are introduced to illustrate the suggested method.
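
The abstract does not state which function of the Kullback-Leibler divergence is used as the similarity measure. As one plausible building block, the sketch below computes the KL divergence between the observed joint distribution of two categorical variables and the product of their margins; the function name and the choice of the natural logarithm are assumptions.

```python
from math import log


def kl_from_independence(table):
    """KL divergence of a two-way table's joint distribution from independence.

    table -- list of rows of non-negative counts (all rows the same length).
    Returns 0 when the variables are exactly independent in the table and
    grows as the association between them strengthens.
    """
    total = sum(sum(row) for row in table)
    row_p = [sum(row) / total for row in table]
    col_p = [sum(row[j] for row in table) / total for j in range(len(table[0]))]
    kl = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute nothing
                p = count / total
                kl += p * log(p / (row_p[i] * col_p[j]))
    return kl
```

Computing this quantity for every pair of variables gives a pairwise association matrix from which groups of associated variables could be formed.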

LAD Estimators for Categorical Data Analysis (범주형 자료 분석을 위한 LAD 추정량)

  • 최현집
    • The Korean Journal of Applied Statistics
    • /
    • v.16 no.1
    • /
    • pp.55-69
    • /
    • 2003
  • In this article, we propose weighted LAD (least absolute deviations) estimators for multi-dimensional contingency tables and derive a method for computing them. To illustrate the robustness of the estimators, simulation results are presented for several models, including log-linear models and models for ordinal variables in multi-dimensional contingency tables. Examples are also given.

A Study of the Factors in Industrial Accident Cases Using Correlation Analysis (상관분석을 응용한 산업재해사례 요인의 고찰)

  • 홍광수;정국삼
    • Proceedings of the Korean Institute of Industrial Safety Conference
    • /
    • 1997.11a
    • /
    • pp.331-336
    • /
    • 1997
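  • This study takes industrial accident cases as its subject and uses statistical methods to identify the major factors in accident occurrence: correlation analysis by accident factor, assessment of the degree of influence, and analysis of the effect on other accident factors when one factor is controlled. First, the contents of industrial accident statistics were analyzed to identify accident-related variables, and correlation coefficients were examined through a qualitative correlation analysis between the accident types caused by unsafe acts and unsafe conditions and the other variables. Second, to determine whether nominal-scale categorical variables are related to one another, chi-square tests were carried out to test the independence of the other variables from the dependent variable, days of hospitalization, and to compare the degree of association where variables were judged to be related. Third, where variables have a definite relationship, a logit model was applied to express and compare, in the form of a regression equation, the effect of each category of a variable on the response (dependent) variable. (abridged)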
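
The second step, the chi-square test of independence between categorical variables, can be sketched as follows. This minimal example computes Pearson's statistic and its degrees of freedom for a two-way table of counts; the function name is hypothetical, the table is assumed to have no empty row or column, and no data from the study are used.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table.

    table -- list of rows of observed counts (all rows the same length),
             with every row total and column total positive.
    """
    total = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(len(table[0]))]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_tot[i] * col_tot[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof
```

The statistic is compared with the chi-square critical value for the returned degrees of freedom (3.841 at the 5% level for 1 degree of freedom); a larger value suggests the two variables are not independent.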

Prediction and Applicability of Snow Damage Using Random Forest (랜덤포레스트를 이용한 대설피해액 예측 및 적용성 검토)

  • Lee, Hyeong Joo;Chung, Gun Hui
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2019.05a
    • /
    • pp.399-399
    • /
    • 2019
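  • Localized heavy snowfall and cold waves have recently become more frequent because of abnormal weather worldwide. On January 8, 2018, a once-in-100-years cold wave in the United States reportedly drove the wind chill down to minus 69 degrees, and on February 8 of the same year heavy snow and a cold wave on Jeju Island paralyzed traffic in Korea, so interest in heavy snow damage, a representative winter natural disaster, is growing. Many studies on predicting and reducing heavy snow damage are therefore under way, but accurate prediction is difficult because historical damage records by city, county, and district (si, gun, gu) are scarce and damaged areas are far from observation stations. In this study, data on variables that affect heavy snow damage were collected, the damage amount was divided into categories, and random forest was used to predict which category a case falls into and to examine the applicability of the approach. Prediction accuracy was not high, owing to the lack of historical damage data, differences between past and present damage environments, and changes in the design standards for plastic greenhouses, where heavy snow damage occurs most often. If accurate weather data for damaged areas are secured and the data used as variables are updated, the accuracy of these results is expected to improve and approximate prediction of the scale of heavy snow damage should become possible.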
