• Title/Summary/Keyword: categorical data analysis

Associations Between Compliance With Non-pharmaceutical Interventions and Social-distancing Policies in Korea During the COVID-19 Pandemic

  • Hwang, Yu Seong;Jo, Heui Sug
    • Journal of Preventive Medicine and Public Health / v.54 no.4 / pp.230-237 / 2021
  • Objectives: This study explored changes in individuals' behavior in response to social distancing (SD) levels and the "no gatherings of more than 5 people" (NGM5) rule in Korea during the coronavirus disease 2019 (COVID-19) pandemic. Methods: Using survey data from the COVID-19 Behavior Tracker, exploratory factor analysis extracted 3 preventive factors: maintaining personal hygiene, avoiding going out, and avoiding meeting people. Each factor was used as a dependent variable. The chi-square test was used to compare distributions between categorical variables, and binary logistic regression was performed to identify factors associated with high compliance with measures to prevent transmission. Results: Male sex was significantly associated with lower compliance on all 3 factors. Younger age groups were associated with lower compliance with maintaining personal hygiene and avoiding meeting people. Employment status was significantly associated with avoiding going out and avoiding meeting people. Residence in the capital area was significantly associated with higher compliance with maintaining personal hygiene and avoiding going out. Increasing SD levels were associated with compliance on all 3 factors: maintaining personal hygiene, avoiding going out, and avoiding meeting people. The NGM5 policy was not significantly associated with compliance. Conclusions: SD levels, gender, age, employment status, and region had explanatory power for compliance with non-pharmaceutical interventions (NPIs). Strengthening social campaigns to inspire voluntary compliance with NPIs is recommended, especially campaigns focused on men, younger people, full-time workers, and residents of the capital area. At the same time, SD measures should be segmented into substrategies with detailed guidance at each level.
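The chi-square comparison in the Methods can be sketched as a Pearson test of independence on a contingency table (for example, compliance level by gender). This is a minimal stdlib-only sketch, not the authors' code; the table of counts is hypothetical, and the p-value uses the closed-form chi-square upper tail that holds for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y**i / i!
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from the df = 1 tail, erfc(sqrt(y)), and add the recurrence terms.
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total

def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical counts: rows = men / women, columns = high / low compliance.
stat, df, p = chi_square_independence([[120, 80], [90, 110]])
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")
```

The study's binary logistic regression would then model high versus low compliance on several covariates at once; the chi-square test only checks one pair of categorical variables at a time.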

Estimation of Occurrence Probability of Socioeconomic Damage Caused by Meteorological Drought Using Categorical Data Analysis (범주형 자료 분석을 활용한 사회경제적 가뭄 피해 발생확률 산정 : 충청북도의 적용사례를 중심으로)

  • Yu, Ji Soo;Yoo, Jiyoung;Kim, Min-ji;Kim, Tae-Woong
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.348-348 / 2021
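  • The ultimate goal of drought research is to deepen understanding of the mechanisms of drought occurrence and to improve prediction techniques so that proactive responses become possible. Drought indices used in drought analysis are generally treated as continuous variables when building probability models; however, drought-state and drought-damage data are ordinal and discrete variables, so categorical data analysis techniques are more appropriate. This study therefore used two categorical data analysis methods, the log-linear model and the logistic regression model, to identify the relationship between meteorological drought and the occurrence of damage. Collecting drought damage information for damage prediction is very difficult, because drought can cause many types of damage and stakeholders in different sectors perceive drought damage differently. In this study, drought damage records for Chungcheongbuk-do were collected from the National Drought Information Portal (drought.go.kr). Over the 30 years from 1991 to 2020, a total of 272 drought damage events were recorded in 34 of the 238 eup/myeon/dong administrative districts. Based on the Standardized Precipitation Index (SPI), the average annual number of drought occurrences per district was about 8.44, and the year with the most droughts was 2001 (an average of 18.7 occurrences). A meteorological drought caused by a precipitation deficit takes several weeks to several months to develop into a hydrological drought that causes socioeconomic damage. To capture this relationship, the occurrence of drought damage was set as the variable to be predicted and the drought state before the damage as the explanatory variable, and the probability of drought damage following meteorological drought was estimated. The results showed that the probability of drought damage was about 2.5 times higher when consecutive drought states preceded the damage than when only the drought state at the time of the damage was considered.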
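The logistic regression step described above can be sketched with a single binary explanatory variable (1 = consecutive drought months preceded the month, 0 = otherwise) and a binary outcome (damage reported or not). This is an illustrative maximum-likelihood fit by Newton-Raphson under those assumptions, not the authors' model; the counts are hypothetical, and the log-linear model the study also used is not shown.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson (maximum likelihood)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p              # gradient of the log-likelihood
            g1 += (y - p) * x
            w = p * (1.0 - p)        # Fisher information weights
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} g for the 2x2 information matrix I.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def predict(b0, b1, x):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical records: x = 1 if a run of drought months preceded, y = 1 if damage was reported.
xs = [1] * 40 + [0] * 100
ys = [1] * 12 + [0] * 28 + [1] * 10 + [0] * 90
b0, b1 = fit_logistic(xs, ys)
ratio = predict(b0, b1, 1) / predict(b0, b1, 0)
print(f"P(damage | preceding drought) = {predict(b0, b1, 1):.3f}, ratio = {ratio:.2f}")
```

With one binary predictor the fitted probabilities reproduce the two group proportions, so the ratio is simply P(damage | preceding drought) / P(damage | no preceding drought), the same kind of comparison as the reported 2.5-fold increase.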

Applicability Evaluation of a Mixed Model for the Analysis of Repeated Inventory Data : A Case Study on Quercus variabilis Stands in Gangwon Region (반복측정자료 분석을 위한 혼합모형의 적용성 검토: 강원지역 굴참나무 임분을 대상으로)

  • Pyo, Jungkee;Lee, Sangtae;Seo, Kyungwon;Lee, Kyungjae
    • Journal of Korean Society of Forest Science / v.104 no.1 / pp.111-116 / 2015
  • The purpose of this study was to evaluate a mixed model of the dbh-height relationship that includes a random effect. Data were obtained from a Quercus variabilis survey site in the Gangwon region, and the same site was remeasured three years later. The mixed model included fixed effects for the dbh-height relationship of Quercus variabilis and a random effect representing the correlation between survey periods. To evaluate the model with the random effect, the variance-covariance matrix and the residual of the repeated data were estimated, and the Akaike information criterion (AIC) was used to compare models. The estimated variance-covariance and residual values were -0.0291 and 0.1007, respectively. The model with the random effect had a lower AIC (-215.5) than the model with fixed effects only (-154.4). This indicates that when a random effect associated with categorical data is included in the fitting process, the model can be calibrated to fit repeated measurements at the same site. Therefore, this approach could be useful for developing models from repeated measurements.
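The AIC comparison above follows the standard definition, where k is the number of estimated parameters and L̂ is the maximized likelihood; the lower value is preferred. The worked difference uses only the two AIC values reported in the abstract.

```latex
\begin{aligned}
\mathrm{AIC} &= 2k - 2\ln\hat{L} \\
\Delta\mathrm{AIC} &= \mathrm{AIC}_{\text{fixed}} - \mathrm{AIC}_{\text{mixed}} = -154.4 - (-215.5) = 61.1
\end{aligned}
```

A difference of this size, well above the commonly cited threshold of about 10, is usually read as strong support for the model with the random effect.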

Classification Tree Analysis to Assess Contributing Factors Influencing Biosecurity Level on Farrow-to-Finish Pig Farms in Korea (분류 트리 기법을 이용한 국내 일괄사육 양돈장의 차단방역 수준에 영향을 미치는 기여 요인 평가)

  • Kim, Kyu-Wook;Pak, Son-Il
    • Journal of Veterinary Clinics / v.33 no.2 / pp.107-112 / 2016
  • The objective of this study was to identify potential factors associated with the biosecurity level of farrow-to-finish pig farms and to develop a classification tree model to explore how these factors relate to each other. To this end, the authors analyzed data (n = 193) extracted from a cross-sectional study of 344 farrow-to-finish farms, conducted between March and September 2014, that aimed to explore swine disease status at the farm level. Standardized questionnaires covering basic demographic data and management practices were completed at each farm during on-site visits by trained veterinarians. The Chi-squared Automatic Interaction Detection (CHAID) algorithm was applied to build a classification tree with biosecurity level as the dependent variable. Misclassification risk was used to evaluate the predictive fit of the model. Categorical multivariate input data (40 variables) were used to construct the tree, and the target variable was biosecurity level dichotomized into low versus high. In general, biosecurity was low in the majority of farms studied, mainly because of limited implementation of basic on-farm biosecurity measures aimed at controlling the potential introduction and transmission of swine diseases. The CHAID model showed the relative importance of the significant predictors of biosecurity level: keeping medical records of treatment and vaccination, using dedicated clothing to enter the farm, installing a fence around the farm perimeter, and periodically monitoring the herd under a written biosecurity plan. The misclassification risk estimate of the prediction model was 0.145 (standard error 0.025), indicating that 85.5% of the cases could be classified correctly using the decision rules of the current tree.
Although the CHAID approach can provide detailed information and insight into interactions among factors associated with biosecurity level, future studies should further evaluate potential bias introduced during data collection. In addition, the findings still need to be validated on an external dataset with a larger sample size to improve the external validity of the current model.
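The CHAID procedure described above grows the tree by testing each categorical predictor against the target with a chi-square test and splitting on the most significant one. The sketch shows only that split-selection step for binary (yes/no) predictors; full CHAID also merges similar categories of multi-level predictors and applies a Bonferroni adjustment, both omitted here. The farm records and variable names are hypothetical. It also computes the misclassification risk with a binomial standard error, sqrt(r(1 - r)/n), which is an assumption about how the reported standard error was obtained.

```python
import math
from collections import Counter

def chi_square_binary(xs, ys):
    """Pearson chi-square statistic and p-value (1 df) for two 0/1 variables."""
    n = len(xs)
    cells = Counter(zip(xs, ys))
    x_totals, y_totals = Counter(xs), Counter(ys)
    stat = 0.0
    for a in (0, 1):
        for b in (0, 1):
            expected = x_totals[a] * y_totals[b] / n
            if expected > 0:
                stat += (cells[(a, b)] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

def best_split(predictors, target):
    """CHAID-style step: pick the predictor with the smallest chi-square p-value."""
    scored = [(name, *chi_square_binary(values, target)) for name, values in predictors.items()]
    return min(scored, key=lambda item: item[2])

def risk_and_se(n_misclassified, n):
    """Misclassification risk r and its binomial standard error sqrt(r(1 - r)/n)."""
    r = n_misclassified / n
    return r, math.sqrt(r * (1 - r) / n)

# Hypothetical farm records: 1 = practice in place / high biosecurity.
target = [1, 1, 1, 1, 0, 0, 0, 0]
predictors = {
    "medical_records": [1, 1, 1, 1, 0, 0, 0, 0],
    "perimeter_fence": [1, 0, 1, 0, 1, 0, 1, 0],
}
print(best_split(predictors, target))
print(risk_and_se(28, 193))
```

With r = 0.145 and n = 193 (about 28 misclassified farms, inferred from the reported rate), the formula gives roughly 0.025, consistent with the reported standard error.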

A Review of Multivariate Analysis Studies Applied for Plant Morphology in Korea (국내 식물 형태 연구에 사용된 다변량분석 논문에 대한 재고)

  • Chang, Kae Sun;Oh, Hana;Kim, Hui;Lee, Heung Soo;Chang, Chin-Sung
    • Journal of Korean Society of Forest Science / v.98 no.3 / pp.215-224 / 2009
  • This study reviewed the role of traditional morphometrics in plant morphological studies, based on 54 studies published from 1997 to 2008 in three major Korean journals and others, such as the Journal of Korean Forestry Society, Korean Journal of Plant Taxonomy, Korean Journal of Breeding, Korean Journal of Apiculture, Journal of Life Science, and Korean Journal of Plant Resources. The two most commonly used data analysis techniques, cluster analysis (CA) and principal components analysis (PCA), were discussed together with other statistical tests. A common problem with PCA lies in the method's underlying assumptions, such as random sampling and a multivariate normal distribution of the data. PCA is intended mainly for continuous data and is not efficient for data that are not well summarized by variances or covariances. Likewise, CA is most appropriate for categorical rather than continuous data. In addition, CA produces clusters whether or not natural groupings exist, and its results depend on both the similarity measure chosen and the clustering algorithm used. Further problems with PCA and CA arise when qualitative and quantitative data are analyzed with a limited number of variables and/or too few samples. Some of these problems may be avoided if at least 20 variables and at least 40-50 samples are used in morphometric analyses, but we do not consider these methods all-powerful tools for data analysts. Instead, we believe that reasonable applications, combined with a focus on the objectives and limitations of each procedure, would be a step forward.
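Two of the cluster-analysis caveats above, that results depend on the similarity measure and that clusters are produced even when no natural groups exist, can be demonstrated with a small single-linkage agglomerative clustering. This is an illustrative sketch on hypothetical 2-D points, not a method taken from the reviewed studies.

```python
import math

def euclidean(p, q):
    return math.dist(p, q)

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def single_linkage(points, k, dist):
    """Agglomerative clustering (single linkage): merge the two closest clusters until k remain.

    Returns the clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair of clusters found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(points[a], points[b]) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

points = [(0, 0), (3, 0), (10, 10), (12, 12)]
print(single_linkage(points, 3, euclidean))  # merges the diagonal pair (2, 3)
print(single_linkage(points, 3, manhattan))  # merges the horizontal pair (0, 1)
# Evenly spaced points with no natural groups are still split into k clusters.
print(single_linkage([(float(x), 0.0) for x in range(6)], 2, euclidean))
```

Switching from Euclidean to Manhattan distance changes which pair is merged first, and the evenly spaced points are still partitioned into two clusters.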

Development of the Accident Prediction Model for Enlisted Men through an Integrated Approach to Datamining and Textmining (데이터 마이닝과 텍스트 마이닝의 통합적 접근을 통한 병사 사고예측 모델 개발)

  • Yoon, Seungjin;Kim, Suhwan;Shin, Kyungshik
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.1-17 / 2015
  • In this paper, we report our observations on a prediction model for the military based on enlisted men's internal data (cumulative records) and external data (SNS data). This work is significant for the military's efforts to supervise enlisted men. Despite their efforts, many commanders have failed to prevent accidents involving their subordinates. One of officers' important duties is to look after their subordinates and prevent unexpected accidents. However, accidents are hard to prevent, so a proper method must be found. Our motivation for this paper is to make it possible to predict accidents using enlisted men's internal and external data. The biggest issue facing the military is accidents by enlisted men related to maladjustment and the relaxation of military discipline. The core method of preventing accidents by soldiers is to identify problems and manage them quickly. Commanders predict accidents by interviewing their soldiers and observing their surroundings, which requires considerable time and effort, and the results differ significantly depending on the commanders' capabilities. In this paper, we seek to predict accidents with objective data that can easily be obtained. Recently available records of enlisted men, as well as SNS communication between commanders and soldiers, make it possible to predict and prevent accidents. This paper applies data mining to identify soldiers' interests and predict accidents using internal and external (SNS) data. We propose combining topic analysis with a decision tree method. The study is conducted in two steps. First, topic analysis is conducted on the SNS data of enlisted men. Second, the decision tree method is used to analyze the internal data together with the results of the first analysis. The dependent variable for these analyses is the presence of any accident.
In order to analyze their SNS data, we require tools such as text mining and topic analysis. We used SAS Enterprise Miner 12.1, which provides a text miner module. Our approach for finding their interests is composed of three main phases: collection, topic analysis, and conversion of the topic analysis results into scores for use as independent variables. In the first phase, we collect enlisted men's SNS data by commander's ID. After the unstructured SNS data are gathered, the topic analysis phase extracts issues from them. For simplicity, 5 topics (vacation, friends, stress, training, and sports) are extracted from 20,000 articles. In the third phase, we quantify these 5 topics as personal scores. We then add these scores to the independent variables, which are composed of 15 internal data sets. Next, we build two decision trees. The first tree uses the internal data only. The second tree uses the external data (SNS) as well as the internal data. We then compare the misclassification results of the two trees in SAS E-miner. The first model's misclassification rate is 12.1%, whereas the second model's is 7.8%; that is, the proposed method predicts accidents with an accuracy of approximately 92%. The gap between the two models is 4.3 percentage points. Finally, we use the McNemar test to check whether the difference between them is meaningful; the difference is statistically significant (p-value: 0.0003). This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to data from a small number of enlisted men. Second, various independent variables in the decision tree model are used as categorical variables instead of continuous variables, which causes a loss of information. Despite extensive efforts to provide prediction models for the military, commanders' predictions are accurate only when they have sufficient data about their subordinates.
Our proposed methodology can support decision-making in the military. This study is expected to contribute to the prevention of accidents in the military through scientific analysis and proper management of enlisted men.
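The McNemar test mentioned above compares two classifiers on the same cases using only the discordant ones (cases one model gets right and the other gets wrong). A minimal sketch follows; the predictions are hypothetical, and this is not the SAS Enterprise Miner implementation.

```python
import math

def mcnemar(y_true, pred_a, pred_b, continuity=True):
    """McNemar's test for two classifiers evaluated on the same cases.

    Only discordant cases matter: only_a = A right and B wrong, only_b = the reverse.
    Returns (only_a, only_b, statistic, p-value) using the chi-square approximation with 1 df.
    """
    only_a = only_b = 0
    for truth, pa, pb in zip(y_true, pred_a, pred_b):
        a_right, b_right = pa == truth, pb == truth
        if a_right and not b_right:
            only_a += 1
        elif b_right and not a_right:
            only_b += 1
    if only_a + only_b == 0:
        return only_a, only_b, 0.0, 1.0
    diff = abs(only_a - only_b) - (1 if continuity else 0)  # Edwards' continuity correction
    stat = max(diff, 0) ** 2 / (only_a + only_b)
    return only_a, only_b, stat, math.erfc(math.sqrt(stat / 2))

# Hypothetical: 40 discordant cases; model A (internal data only) is right on 10 of them,
# model B (internal + SNS data) on the other 30.
y_true = [1] * 40
model_a = [1] * 10 + [0] * 30
model_b = [0] * 10 + [1] * 30
print(mcnemar(y_true, model_a, model_b))
```

Cases that both models classify correctly, or both misclassify, do not affect the statistic, which is why the test suits paired comparisons of two trees on one dataset.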

Improvement and Validation of Convective Rainfall Rate Retrieved from Visible and Infrared Image Bands of the COMS Satellite (COMS 위성의 가시 및 적외 영상 채널로부터 복원된 대류운의 강우강도 향상과 검증)

  • Moon, Yun Seob;Lee, Kangyeol
    • Journal of the Korean earth science society / v.37 no.7 / pp.420-433 / 2016
  • The purpose of this study is to improve the calibration matrices of 2-D and 3-D convective rainfall rates (CRR) using the brightness temperature of the infrared 10.8 μm channel (IR), the brightness temperature difference between the infrared 10.8 μm and water vapor 6.7 μm channels (IR-WV), and the normalized reflectance of the visible channel (VIS) from the COMS satellite, together with rainfall rates from weather radar for 75 rainy days from April 22, 2011 to October 22, 2011 in Korea. In particular, the weather radar rainfall rates are used to validate the new 2-D and 3-D CRR calibration matrices, tailored to the Korean peninsula, for 24 rainy days in 2011. The 2-D and 3-D calibration matrices give the basic and maximum CRR values (mm/h) by multiplying the rain probability matrix by the mean and maximum rainfall rate matrices, respectively. The rain probability matrix is calculated from the numbers of rainy and non-rainy pixels in the associated 2-D (IR, IR-WV) and 3-D (IR, IR-WV, VIS) matrices; the mean rainfall rate matrix is calculated by dividing the accumulated rainfall rate by the number of rainy pixels; and the maximum rainfall rate matrix is calculated from the product of the maximum rain rate for the calibration period and the number of rain occurrences. Finally, the new 2-D and 3-D CRR calibration matrices are obtained experimentally from regression analysis of both the basic and maximum rainfall rate matrices. As a result, compared with the existing matrices, the new ones enlarge the area with rainfall rates above 10 mm/h and show CRR in lower class ranges of the matrices spanned by IR brightness temperature and IR-WV brightness temperature difference. Accuracy and categorical statistics are computed for the CRR events that occurred during the study period.
The mean error (ME), mean absolute error (MAE), and root mean square error (RMSE) of the new 2-D and 3-D CRR calibrations were smaller than those of the existing ones; the false alarm ratio decreased, the probability of detection increased slightly, and the critical success index improved. To account for strong rainfall rates in events such as thunderstorms and typhoons, a moisture correction factor is applied. This factor is defined as the product of the total precipitable water and the relative humidity (PW × RH, a mean value between the surface and the 500 hPa level), obtained from a numerical model or from COMS retrieval data. In this study, when the IR cloud-top brightness temperature is lower than 210 K and the relative humidity is greater than 40%, the moisture correction factor is empirically scaled from 1.0 to 2.0 based on PW × RH values. Consequently, when this factor is applied to the new 2-D and 3-D CRR calibrations, the ME, MAE, and RMSE are smaller than those of the new calibrations without the correction.
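The calibration and verification steps described above can be sketched for a single matrix cell and a set of matched radar/satellite rain rates. The rain thresholds (> 0 mm/h for a rainy pixel, 1.0 mm/h for the categorical scores) and all function names are assumptions for illustration; the study's regression step and moisture correction factor are not reproduced.

```python
import math

def calibration_cell(radar_rates):
    """Basic CRR for one matrix cell from the radar rain rates (mm/h) of the pixels in that cell.

    Assumption: a pixel counts as rainy when its radar rate is > 0.
    """
    rainy = [r for r in radar_rates if r > 0]
    rain_probability = len(rainy) / len(radar_rates)       # rainy / (rainy + non-rainy)
    mean_rate = sum(rainy) / len(rainy) if rainy else 0.0   # accumulated rate / rainy pixels
    return rain_probability, mean_rate, rain_probability * mean_rate

def categorical_scores(observed, estimated, threshold=1.0):
    """POD, FAR and CSI for rain/no-rain at a given threshold (mm/h)."""
    hits = misses = false_alarms = 0
    for o, e in zip(observed, estimated):
        obs_rain, est_rain = o >= threshold, e >= threshold
        if obs_rain and est_rain:
            hits += 1
        elif obs_rain:
            misses += 1
        elif est_rain:
            false_alarms += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    pod = ratio(hits, hits + misses)
    far = ratio(false_alarms, hits + false_alarms)
    csi = ratio(hits, hits + misses + false_alarms)
    return pod, far, csi

def continuous_scores(observed, estimated):
    """ME, MAE and RMSE of estimated minus observed rain rates."""
    errors = [e - o for o, e in zip(observed, estimated)]
    n = len(errors)
    me = sum(errors) / n
    mae = sum(abs(x) for x in errors) / n
    rmse = math.sqrt(sum(x * x for x in errors) / n)
    return me, mae, rmse

# Hypothetical matched pixels (mm/h): radar reference vs. satellite CRR.
radar = [5.0, 0.0, 3.0, 0.0]
crr = [4.0, 2.0, 0.0, 0.0]
print(calibration_cell([0.0, 0.0, 2.0, 6.0]))
print(categorical_scores(radar, crr))
print(continuous_scores(radar, crr))
```

POD (probability of detection), FAR (false alarm ratio), and CSI (critical success index) come from the hit/miss/false-alarm counts, while ME, MAE, and RMSE summarize the continuous errors.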

Comparative Analysis of Spontaneous Infectious Spondylitis : Pyogenic versus Tuberculous

  • Lee, Yangwon;Kim, Bum-Joon;Kim, Se-Hoon;Lee, Seung-Hwan;Kim, Won-Hyung;Jin, Sung-Won
    • Journal of Korean Neurosurgical Society / v.61 no.1 / pp.81-88 / 2018
  • Objective : Spondylitis is often chemotherapy resistant and requires long-term treatment. Without adequate chemotherapy, the outcome can be fatal or result in severe neurologic damage. Therefore, differentiating the etiology of spondylitis is very important, particularly in spontaneous cases. As the prevalence of tuberculosis in Korea has decreased in recent years, updated clinical research on spondylitis is warranted. Methods : From April 2010 to March 2016, data from spondylitis patients were collected retrospectively. In total, 69 patients (51 with pyogenic spondylitis and 18 with tuberculous spondylitis) were included. Clinical data, laboratory findings including erythrocyte sedimentation rate (ESR) and C-reactive protein (CRP) level, Cobb angles at the initial and final follow-up, and radiologic features on magnetic resonance imaging (MRI) were evaluated. To test differences between the pyogenic and tuberculous groups, numerical data were compared using Student's t-test and the Mann-Whitney U test, and categorical data were compared using the chi-square test and Fisher's exact test. Results : The patients' mean age was 60.0 years, with a slight male predominance (56.5%). Mean age and sex did not differ between the two groups. The pyogenic group had a relatively higher proportion of immunocompromised patients. The peak CRP value was higher in the pyogenic group than in the tuberculous group (14.08 mg/dL vs. 8.50 mg/dL, p=0.009), whereas the ESR did not differ significantly between the groups (81.5 mm/h vs. 75.6 mm/h, p=0.442). Radiologically, disc space sparing and vertebral body collapse differed between the groups. Compared with the pyogenic group, the tuberculous group more commonly showed a preserved disc on contrast-enhanced MRI (50% vs. 23.5%, p=0.044) and more commonly showed vertebral body collapse (66.6% vs. 15.7%, p<0.001).
The mean length of hospitalization was longer in the pyogenic group (56.5 days vs. 41.2 days, p=0.001). Four deaths occurred, all in the pyogenic group. The most commonly isolated microorganism in the pyogenic group was Staphylococcus aureus (S. aureus): methicillin-susceptible S. aureus in 8 cases and methicillin-resistant S. aureus (MRSA) in 4 cases. Conclusion : The clinical and radiological manifestations of spontaneous spondylitis differ based on the causative organism. Pyogenic spondylitis patients tend to have a higher CRP level and a more severe clinical course, whereas tuberculous spondylitis patients more frequently present with vertebral body destruction with disc sparing. MRSA is increasingly found in community-acquired spondylitis cases.
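Fisher's exact test, used above for categorical comparisons, is often preferred over the chi-square test when some expected cell counts are small, as can happen with a group of 18 patients. A minimal stdlib sketch of the two-sided test for a 2×2 table follows; the example counts are hypothetical and not taken from the study.

```python
import math

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    With row and column totals fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums every table at most as probable as the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables with equal probability are not lost to rounding.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_observed * (1 + 1e-7))

# Hypothetical counts: rows = group 1 / group 2, columns = finding present / absent.
print(fisher_exact_two_sided(8, 2, 1, 5))
```

The p-value is exact for fixed margins, so it does not rely on the large-sample approximation behind the chi-square test.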

Establishment of Bank Channel Strategy using Correspondence Analysis : Based on the Customer's Choice Factors of Bank Channel (대응분석을 이용한 은행 채널전략 수립연구 : 고객의 은행채널 선택요인을 바탕으로)

  • Park, Un Hak;Park, Young Bae
    • Journal of Korea Society of Industrial Information Systems / v.28 no.6 / pp.151-171 / 2023
  • For the efficient establishment of a channel strategy for banks, this study aims to propose a channel model by classifying channels into types and carrying out a correspondence analysis per type. A survey of bankers was conducted, and the resulting categorical data were visualized in a positioning map. As a result, first, 12 banking channels were classified into 4 types based on business processing subjects and places, which were then further grouped into the categories of full-banking and self-banking. Second, a correspondence analysis according to the classified types was carried out, and it was found that the branch-type is suitable for product description and customer management, while the banking-type is suitable for efficient business processing without time and space constraints. Furthermore, the analysis also showed that the machine-type and banking-type are inappropriate for customer management, and the mobility-type demonstrates low operational effectiveness due to a lack of awareness. The aforementioned findings suggest the need for a hybrid convergence channel that reflects the characteristics of banking tasks and fills in the gaps between the different channels. Third, a channel model was derived by adding a common area to the 2×2 model consisting of the business processing subjects and places. Therefore, this study is meaningful in that it examines the diversification of channels and factors in the division of roles by channel type based on customers' banking channel selection factors, and presents basic research findings for future channel strategy establishment and efficient channel operation.
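Correspondence analysis, as used above to build the positioning map, decomposes the standardized residuals of a contingency table; the total inertia equals the chi-square statistic divided by the sample size. The stdlib sketch below computes the first dimension by power iteration; in practice an SVD routine (for example numpy.linalg.svd) would normally be used. The channel types, selection factors, and counts are hypothetical.

```python
import math

def correspondence_analysis(table, iterations=200):
    """First dimension of a simple correspondence analysis of a contingency table.

    Returns (total inertia, inertia of axis 1, row principal coordinates on axis 1).
    Assumes every row and column has a non-zero total.
    """
    n = sum(sum(row) for row in table)
    P = [[x / n for x in row] for row in table]   # correspondence matrix
    r = [sum(row) for row in P]                    # row masses
    c = [sum(col) for col in zip(*P)]              # column masses
    # Standardized residuals S_ij = (p_ij - r_i c_j) / sqrt(r_i c_j).
    S = [[(P[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(len(c))]
         for i in range(len(r))]
    total_inertia = sum(s * s for row in S for s in row)  # equals chi-square / n
    # Power iteration on S^T S finds the leading right singular vector v of S.
    m = len(c)
    M = [[sum(S[i][a] * S[i][b] for i in range(len(r))) for b in range(m)] for a in range(m)]
    v = [(-1.0) ** j for j in range(m)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        w = [sum(M[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    axis1_inertia = sum(v[a] * M[a][b] * v[b] for a in range(m) for b in range(m))
    # Row principal coordinates on axis 1: f_i = (S v)_i / sqrt(r_i).
    rows = [sum(S[i][j] * v[j] for j in range(m)) / math.sqrt(r[i]) for i in range(len(r))]
    return total_inertia, axis1_inertia, rows

# Hypothetical counts: rows = channel types, columns = selection factors.
table = [[30, 10],   # branch
         [5, 25],    # machine
         [10, 20]]   # mobile
inertia, axis1, coords = correspondence_analysis(table)
print(inertia, axis1, coords)
```

Rows whose coordinates have opposite signs sit on opposite sides of the map's first axis. With only two columns there is a single non-trivial axis, so its inertia equals the total inertia.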

The Analysis of Young Children Science Educational Content Shown in the Child Picture Book (유아용 그림책에 나타난 유아과학교육 내용분석)

  • Yun, Eun-Gyung;Lee, Mi-Na
    • The Journal of the Korea Contents Association / v.15 no.8 / pp.588-597 / 2015
  • This study aimed to analyze how the science education content of the Nuri Curriculum for 5-year-olds is distributed in children's picture books, and to investigate categorical differences in this content between domestic and foreign picture books and among genres. The subjects were 219 picture books for children aged 4 to 7, listed in a publication of the Children's Book Study Group in 2012 and 2013. The research tool, used to analyze the young children's science education content of the picture books, was based on the nature study items of the Nuri Curriculum for 5-year-olds. The content analysis categories consisted of two upper categories and seven subcategories. The frequency and percentage of each item's category were calculated, and analysis agreement was checked. In conclusion, first, in the analysis of the upper categories of young children's science education content in the 219 picture books, the categories appeared in the following order of frequency: 'Curious to maintain and expand', 'Learn living things and the natural environment', 'To explore the investigation technique', 'To enjoy the investigation technique', 'Utilize simple tools and machines', 'To search objects and materials', and 'Learn natural phenomena'. Second, in the comparison between domestic and foreign picture books and among genres, "scientific inquiry" appeared more often than "fostering an attitude of exploration".
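The frequency and percentage tabulation described above can be sketched briefly. The agreement check is shown as simple percent agreement between two coders, which is an assumption about how the analysis-agreement step was carried out; the category codes are hypothetical.

```python
from collections import Counter

def frequency_table(codes):
    """Frequency and percentage of each category, most frequent first."""
    counts = Counter(codes)
    total = sum(counts.values())
    return [(category, n, 100.0 * n / total) for category, n in counts.most_common()]

def percent_agreement(coder_a, coder_b):
    """Share (%) of units that two coders assigned to the same category."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)

# Hypothetical codes assigned to picture-book passages.
codes = ["inquiry", "inquiry", "attitude", "inquiry", "attitude", "inquiry"]
for category, n, pct in frequency_table(codes):
    print(f"{category}: {n} ({pct:.1f}%)")
print(percent_agreement(["a", "b", "a", "a"], ["a", "b", "b", "a"]))
```

Comparisons such as domestic versus foreign books would repeat the tabulation within each group; percent agreement does not correct for chance, so a chance-corrected index could be used instead.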