• Title/Summary/Keyword: 응용 범주


Analysis of categorical data with nonresponses (무응답을 포함하는 범주형 자료의 분석)

  • 박태성;이승연
    • The Korean Journal of Applied Statistics / v.11 no.1 / pp.83-95 / 1998
  • Statistical models are proposed for analyzing categorical data in the presence of missing observations or nonresponses, which can occur in sample surveys and polls. As an illustration, we analyze real pre-election polling data from the 1948 US presidential election. The polls had predicted that Dewey would win; however, Truman won the actual election.


A Study of Factors in Industrial Accident Cases Using Correlation Analysis (상관분석을 응용한 산업재해사례 요인의 고찰)

  • 홍광수;정국삼
    • Proceedings of the Korean Institute of Industrial Safety Conference / 1997.11a / pp.331-336 / 1997
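  • Taking industrial accident cases as its subject, this study uses statistical methods to examine how the various factors behind accident occurrence are related: correlation analysis by accident factor, assessment of the degree of influence, and analysis of how controlling one accident factor affects the others, with the aim of identifying the key factors in accident occurrence. First, we analyze the contents of industrial accident statistics to identify accident-related variables, and examine the correlation coefficients obtained from a qualitative correlation analysis between accident types caused by unsafe acts and unsafe conditions and the other variables. Second, to determine whether nominal-scale categorical variables are related to one another, we perform chi-square tests of whether the other variables are independent of the number of days hospitalized, treated as the dependent variable, and, where variables are judged to be associated, compare the degree of association. Third, where variables have a systematic relationship, we apply a logit model to express and compare, in regression form, the effect of each category of a variable on the response (dependent) variable. (Abridged)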
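The chi-square test of independence mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis: the 2x2 table of counts, the row and column meanings, and the function name `chi_square_independence` are all hypothetical, chosen only to show how the statistic is computed from a contingency table.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table: list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts (not from the paper):
# rows = accident cause (unsafe act, unsafe condition),
# columns = hospital stay (short, long).
table = [[30, 10],
         [20, 40]]
stat, dof = chi_square_independence(table)
print(f"chi-square = {stat:.3f}, dof = {dof}")
```

With one degree of freedom, the statistic would be compared against the 5% critical value of about 3.84; a larger value suggests the two categorical variables are not independent.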


Improvements of K-modes Algorithm and ROCK Algorithm (K-모드 알고리즘과 ROCK 알고리즘의 개선)

  • 김보화;김규성
    • The Korean Journal of Applied Statistics / v.15 no.2 / pp.381-393 / 2002
  • The K-modes algorithm and the ROCK (RObust Clustering using linKs) algorithm are useful clustering methods for large categorical data. In this paper, we investigate these algorithms and propose improved versions that correct their weaknesses. A simulation study shows that the proposed algorithms can improve clustering performance.

Sublimation of Multimedia Games' Holding Power: An Application of Vijñapti-mātratā and Behavior Modification Methods (멀티미디어 게임 흡인요소 악영향 순화: 유식학 및 행동수정 방법 응용)

  • Son, In-Sook;Cho, Yun-Gyeong;Bae, Jae-Hak
    • Proceedings of the Korea Information Processing Society Conference / 2003.05b / pp.967-970 / 2003
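  • This paper identifies the connection between the practice stages of Vijñapti-mātratā (유식학) and behavior modification methods, using the holding-power factors of multimedia games as the link. To relate the practice stages to the holding-power factors, we took the 11 wholesome mental factors (선심소) as base categories and searched Roget's Thesaurus for cross-reference information, which yielded detailed categories for the wholesome mental factors. We then searched Roget's Thesaurus in the same way for the classification of game holding-power factors and confirmed the correspondence between practice stages and game holding-power factors. The behavior modification methods corresponding to each holding-power factor were identified through a literature review. This allowed us to consider ways of mitigating the harmful effects of game holding-power factors by applying, in a complementary manner, behavior modification methods from Western psychological theory together with the practice methods of Vijñapti-mātratā, a Buddhist psychology.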


Sublimation of Multimedia Games Holding Power: An Application of Vijñapti-mātratā (멀티미디어 게임 흡인요소의 순화: 유식학 응용)

  • Cho, Yun-Gyeong;Son, In-Sook;Bae, Jae-Hak
    • Proceedings of the Korea Information Processing Society Conference / 2002.11c / pp.2451-2454 / 2002
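  • This paper identifies the connection between the wholesome mental factors (선심소) of Vijñapti-mātratā (유식학) and the holding-power factors of multimedia games. Taking the 11 wholesome mental factors as base categories, we searched Roget's Thesaurus for cross-reference information and obtained detailed categories for them. By comparing these with a classification table of game holding-power factors, we confirmed the correspondence between wholesome mental factors and game holding-power factors. This lays a foundation for understanding game holding-power factors from the standpoint of Vijñapti-mātratā theory, and for devising ways to use its practice methods to mitigate the influence of the negative holding-power factors of games.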


A Bayesian Threshold Model for Ordered Categorical Traits (순서범주형자료 분석을 위한 베이지안 분계점 모형)

  • Choi Byangsu;Lee Seung-Chun
    • The Korean Journal of Applied Statistics / v.18 no.1 / pp.173-182 / 2005
  • A Bayesian threshold model is considered for analyzing binary or ordered categorical traits. A Gibbs sampler for making full Bayesian inferences about the category probabilities as well as the regression coefficients is described. The model can be regarded as an alternative to the ordered logit regression model. Numerical examples are given to demonstrate the efficiency of the model.

A generalized logit model with mixed effects for categorical data (다가자료에 대한 혼합효과모형)

  • 최재성
    • The Korean Journal of Applied Statistics / v.15 no.1 / pp.129-137 / 2002
  • This paper suggests a generalized logit model with mixed effects for analysing frequency data in a multi-contingency table. In this model, the nominal response variable is assumed to be polychotomous. When some factors are fixed but considered ordinal and others are random, this paper shows how to use baseline-category logits to incorporate the mixed effects of those factors into the model. A numerical algorithm is used to estimate the model parameters from the marginal log-likelihood.

Generation and Selection of Nominal Virtual Examples for Improving the Classifier Performance (분류기 성능 향상을 위한 범주 속성 가상예제의 생성과 선별)

  • Lee, Yu-Jung;Kang, Byoung-Ho;Kang, Jae-Ho;Ryu, Kwang-Ryel
    • Journal of KIISE:Software and Applications / v.33 no.12 / pp.1052-1061 / 2006
  • This paper presents a method of using virtual examples to improve classification accuracy for data with nominal attributes. Most previous research on virtual examples has focused on data with numeric attributes and has used domain-specific knowledge to generate useful virtual examples for a particular target learning algorithm. Instead of using domain-specific knowledge, our method samples virtual examples from a naive Bayesian network constructed from the given training set. A sampled example is considered useful if adding it to the training set increases the network's conditional likelihood. A set of useful virtual examples can be collected by repeating this process of sampling followed by evaluation. Experiments show that the virtual examples collected this way can help various learning algorithms derive classifiers of improved accuracy.

Predicting the number of disease occurrence using recurrent neural network (순환신경망을 이용한 질병발생건수 예측)

  • Lee, Seunghyeon;Yeo, In-Kwon
    • The Korean Journal of Applied Statistics / v.33 no.5 / pp.627-637 / 2020
  • In this paper, medical data on 1.24 million elderly patients (HIRA-APS-2014-0053) provided by the Health Insurance Review and Assessment Service, together with weather data, are analyzed with a generalized estimating equation (GEE) model and a long short-term memory (LSTM) based recurrent neural network (RNN) model to predict the number of disease occurrences. To this end, we take each patient's residence to be the area of the medical institution that served them, and merge the local weather data with the medical data. Disease occurrence status during a week is divided into three categories (occurrence of the disease of interest, occurrence of another disease, no occurrence). The category probabilities are estimated by the GEE model and the RNN model. The number of cases in each category is predicted by summing the category probabilities. The comparison shows that the predictions of the RNN model are more accurate than those of the GEE model.
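The final step described in this abstract, predicting case counts by summing category probabilities, can be sketched as below. The probabilities are made-up values standing in for GEE or RNN outputs, and the names `CATEGORIES` and `expected_counts` are assumptions introduced for illustration, not taken from the paper.

```python
CATEGORIES = ("disease_of_interest", "other_disease", "no_occurrence")


def expected_counts(prob_rows):
    """Sum per-patient category probabilities into expected weekly case counts."""
    totals = [0.0] * len(CATEGORIES)
    for row in prob_rows:
        # Each patient's three probabilities should form a distribution.
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of probabilities must sum to 1")
        for k, p in enumerate(row):
            totals[k] += p
    return dict(zip(CATEGORIES, totals))


# Hypothetical model outputs for four patients in one week.
weekly_probs = [
    (0.10, 0.30, 0.60),
    (0.05, 0.20, 0.75),
    (0.25, 0.25, 0.50),
    (0.10, 0.25, 0.65),
]
counts = expected_counts(weekly_probs)
print(counts)
```

Because each row sums to one, the predicted counts across the three categories add up to the number of patients.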

Selecting the optimal threshold based on impurity index in imbalanced classification (불균형 자료에서 불순도 지수를 활용한 분류 임계값 선택)

  • Jang, Shuin;Yeo, In-Kwon
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.711-721 / 2021
  • In this paper, we propose a method of adjusting thresholds using impurity indices in classification analysis of imbalanced data. Suppose that, for imbalanced binomial data, the minority category is Positive and the majority category is Negative. When categories are assigned using the commonly used threshold of 0.5, specificity tends to be high on imbalanced data while sensitivity is relatively low. Increasing sensitivity matters when correctly classifying objects in the minority category is relatively important. We explore how to increase sensitivity by adjusting the threshold. Existing studies have adjusted thresholds based on measures such as G-Mean and F1-score; in this paper, we propose a method that selects the optimal threshold using the chi-square statistic of CHAID, the Gini index of CART, and the entropy of C4.5. We also describe how to obtain a unique value when multiple optimal thresholds are found. An empirical analysis uses classification performance metrics to show the improvement over the results based on 0.5.
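One way to picture impurity-based threshold selection is the sketch below, which scans candidate cut-offs and keeps the one with the lowest weighted Gini impurity. It is an illustration under stated assumptions, not the paper's procedure: the scores and labels are invented, only the Gini index is shown (not the chi-square statistic or entropy), and resolving ties by taking the smallest cut-off is a placeholder choice, since the abstract does not say how the authors obtain a unique value.

```python
def gini(pos, neg):
    """Gini impurity of a group with `pos` Positive and `neg` Negative cases."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def best_threshold_by_gini(scores, labels):
    """Return (threshold, impurity) minimising the weighted Gini impurity.

    scores: predicted probabilities of the Positive (minority) class.
    labels: true classes, 1 for Positive and 0 for Negative.
    Cases with score >= threshold are classified as Positive.
    """
    n = len(scores)
    total_pos = sum(labels)
    best_t, best_impurity = None, float("inf")
    for t in sorted(set(scores)):
        n_hi = sum(1 for s in scores if s >= t)
        pos_hi = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        n_lo = n - n_hi
        pos_lo = total_pos - pos_hi
        impurity = (n_hi * gini(pos_hi, n_hi - pos_hi)
                    + n_lo * gini(pos_lo, n_lo - pos_lo)) / n
        # Strict '<' keeps the smallest cut-off among ties (an assumption).
        if impurity < best_impurity:
            best_t, best_impurity = t, impurity
    return best_t, best_impurity


# Hypothetical imbalanced data: 3 Positive cases out of 10.
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.60]
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
threshold, impurity = best_threshold_by_gini(scores, labels)
print(threshold, round(impurity, 3))
```

On this toy data the selected cut-off falls below 0.5, so all three Positive cases are captured, whereas a 0.5 cut-off would capture only one of them.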