• Title/Summary/Keyword: Categorical data analysis

Relationship between Business Type on Sales Orders and Major Factors in Domestic Ecommerce Markets

  • JEONG, Dong-Bin
    • The Journal of Economics, Marketing and Management / v.8 no.2 / pp.19-26 / 2020
  • Purpose: The goal of this study is to comprehensively grasp the current status of ecommerce and to provide basic data for information-related policies. We examine recent ecommerce utilization and purchasing business by main factors, and investigate the association between business type on sales orders (BTSO) and three variables: region, occupation and group type. Research design, data and methodology: The data were obtained from the Ministry of Science and Technology Information and Communication in 2017 and cover about 14,000 national business samples. Two statistical methods are used to analyze the association between BTSO and the three variables: the chi-square test and correspondence analysis. Results: The findings show that BTSO is pairwise associated with the three categorical variables, and that the association between the categories of each pair of variables can be visually examined on a two-dimensional plane. Conclusions: This study suggests that 'household & individual consumers' among BTSO are closely connected with 'Chungbuk' and 'Kyungnam' for region, 'others', 'finance & insurance' and 'association, repairing & other personal service' for occupation, and 'national & local government' for group type. Additionally, 'other companies' among BTSO are particularly related to 'Chunnam' for region, 'manufacturing industry' for occupation, and 'company corporations' for group type.
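
As a rough illustration of the chi-square test named in the abstract above, the sketch below computes a Pearson chi-square statistic for a contingency table of business type by region. The counts, the three unnamed regions, and the helper name `chi_square` are assumptions made for illustration only; they are not the survey data. The standardized residuals it prints are, up to scaling by the sample size, the quantities that correspondence analysis decomposes in order to place categories on a two-dimensional plane.

```python
from math import sqrt

# Hypothetical counts, NOT the survey data.
# Rows: two BTSO categories; columns: three illustrative regions.
observed = [
    [120, 90, 40],  # household & individual consumers
    [80, 110, 60],  # other companies
]


def chi_square(table):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, expected


stat, dof, expected = chi_square(observed)

# Standardized (Pearson) residuals show which cells drive the association.
residuals = [
    [(o - e) / sqrt(e) for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

print(f"chi-square = {stat:.2f}, df = {dof}")
# 5.991 is the 5% critical value of the chi-square distribution with 2 df.
print("reject independence at 5%:", stat > 5.991)
for row in residuals:
    print([round(x, 2) for x in row])
```

A large positive residual marks a category pair that occurs more often than independence would predict, which is the kind of pairing the Conclusions describe.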

The Meta-Analysis on Effects of Living Lab-Based Education (리빙랩 기반 교육 프로그램의 효과에 대한 메타분석)

  • So Hee Yoon
    • Journal of Practical Engineering Education / v.14 no.3 / pp.505-512 / 2022
  • The purpose of this study is to synthesize effects of the living lab-based education through meta-analysis. Seven primary studies reporting the effect of living lab-based education were carefully selected for data analysis. Research questions are as follows. First, what is the overall effect size of the living lab-based education? The overall effect size refers to the effect on the cognitive and affective domains. Second, what is the effect size of the living lab-based education according to categorical variables? Categorical variables are outcome characteristics, study characteristics, and design characteristics. Results are summarized as follows. First, the overall effect size of living lab-based education was 0.347. Second, the effect size according to the cognitive domain was 1.244 for information process, 0.593 for communication, 0.261 for problem solving, and 0.26 for creativity. Third, the effect size according to subject area was shown in the order of electrical and electronic engineering 1.146, technology and home economics 0.489, artificial intelligence 0.379, and practical arts 0.168. Fourth, the effect size according to school level was 1.058 for high school, 0.312 for middle school, and 0.217 for elementary school. Fifth, the effect size by grade level was 0.295 when two or more grades were integrated and 0.294 for a single grade.
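
The abstract above does not say which pooling model produced the overall effect size, so the following is only a sketch of one common approach: inverse-variance pooling, with a DerSimonian-Laird random-effects estimate alongside the fixed-effect one. The seven effect sizes and variances and the function name `pool_effects` are invented for illustration and are not the values from the primary studies.

```python
from math import sqrt


def pool_effects(effects, variances):
    """Inverse-variance pooling: fixed-effect and DerSimonian-Laird random-effects."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * d for wi, d in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    random_effect = sum(wi * d for wi, d in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed,
        "random": random_effect,
        "tau2": tau2,
        "se_random": sqrt(1.0 / sum(w_re)),
    }


# Seven hypothetical studies (values invented for illustration).
effects = [0.10, 0.25, 0.30, 0.35, 0.40, 0.55, 1.10]
variances = [0.02, 0.03, 0.05, 0.04, 0.03, 0.06, 0.10]
result = pool_effects(effects, variances)
print({name: round(value, 3) for name, value in result.items()})
```

Subgroup effect sizes such as those reported by school level or subject area could be obtained by calling the same function on each subset of studies.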

A Qualitative Research on Clothing Habit of Women in Multicultural Families (다문화가정 여성의 의생활착의습관에 관한 질적 연구)

  • Lee, Yun-Jung
    • The Korean Journal of Community Living Science / v.21 no.3 / pp.395-410 / 2010
  • This qualitative study of women in multicultural families analyzes their clothing weight, clothing habits, and clothing management and purchasing, as well as those of their children, in order to provide fundamental data for their adjustment to Korean clothing culture and for health management. Interviews were conducted with eleven married foreign women from countries with various climates, followed by categorical analysis and subject analysis. The subjects that emerged were 'heating/cooling system as to environmental temperature', 'scope of climate adaptation differences in the amount of clothing', 'sleepwear and bedding' and 'clothing purchasing behaviour'. The empirical survey showed that women who came from either colder or warmer regions had difficulty adjusting to the climate. Their clothing weight and clothing habits, which originated in their home countries, remained stable and were systematically passed on to their children. The women seemed less interested in sleepwear and bedding than in normal outerwear, but they tended to cover their babies' bellies even though they did not have sufficient nightwear for themselves. Shopping for and management of clothing were another area in which these women differed from Korean women. These results imply that further research is needed on multicultural families, in particular on their clothing behavior and on the changeability of that behavior through education or through evolution.

A Study on Comparison of Generalized Kappa Statistics in Agreement Analysis

  • Kim, Min-Seon;Song, Ki-Jun;Nam, Chung-Mo;Jung, In-Kyung
    • The Korean Journal of Applied Statistics / v.25 no.5 / pp.719-731 / 2012
  • Agreement analysis is conducted to assess reliability among rating results performed repeatedly on the same subjects by one or more raters. The kappa statistic is commonly used when rating scales are categorical. The simple and weighted kappa statistics are used to measure the degree of agreement between two raters, and the generalized kappa statistics to measure the degree of agreement among more than two raters. In this paper, we compare the performance of four different generalized kappa statistics proposed by Fleiss (1971), Conger (1980), Randolph (2005), and Gwet (2008a). We also examine how sensitive each of four generalized kappa statistics can be to the marginal probability distribution as to whether marginal balancedness and/or homogeneity hold or not. The performance of the four methods is compared in terms of the relative bias and coverage rate through simulation studies in various scenarios with different numbers of raters, subjects, and categories. A real data example is also presented to illustrate the four methods.
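
To make the comparison above more concrete, here is a minimal sketch of two of the four statistics: Fleiss' kappa, which takes chance agreement from the observed marginal proportions, and Randolph's free-marginal kappa, which fixes chance agreement at 1/k for k categories. The rating counts and function names are hypothetical, every subject is assumed to be rated by the same number of raters, and the Conger and Gwet variants are not shown.

```python
def agreement_terms(counts):
    """counts[i][j] = number of raters who assigned subject i to category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])  # assumed equal for every subject
    n_categories = len(counts[0])
    # Mean observed pairwise agreement across subjects.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Marginal proportion of all ratings falling in each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    return p_bar, p_j


def fleiss_kappa(counts):
    p_bar, p_j = agreement_terms(counts)
    p_e = sum(p * p for p in p_j)  # chance agreement from the marginals
    return (p_bar - p_e) / (1 - p_e)


def randolph_kappa(counts):
    p_bar, p_j = agreement_terms(counts)
    p_e = 1 / len(p_j)  # chance agreement assuming uniform categories
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical ratings: 5 subjects, 4 raters, 3 categories.
ratings = [
    [4, 0, 0],
    [3, 1, 0],
    [0, 4, 0],
    [1, 1, 2],
    [4, 0, 0],
]
print(f"Fleiss:   {fleiss_kappa(ratings):.3f}")
print(f"Randolph: {randolph_kappa(ratings):.3f}")
```

The two values coincide when the marginal distribution is balanced and diverge as it becomes skewed, which is the sensitivity to marginal probabilities that the paper examines.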

The research of Correspondence Analysis centered on the Failure Period to improve the reliability of Weapon Systems (무기체계의 신뢰성 향상을 위한 고장발생기간 중심의 대응분석 연구)

  • Song, Bong-Geun;Kim, Geun-Hyung;Kim, Young-Kuk;Park, Seung Hwan;Baek, Jun-Geol
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.10 / pp.289-299 / 2016
  • Weapon systems require reliability in the development phase for efficient combat readiness. Improved reliability in various manufacturing processes has been achieved using data analysis. However, data analysis in the development phase is difficult due to problems such as the lack of data, high cost, and the importance of security. Therefore, Post Logistics Support (PLS) data collected following integration are analyzed for long-term quality improvement of weapon systems. In this study, we propose a methodology for examining the correlation between the failure rate and PLS data as follows. First, key variables affecting reliability were identified and the correlation between these variables and the failure rate was examined. Second, correspondence analysis was conducted to determine the correlation between patterns of categorical data. Third, categories with higher contribution and quality of representation were extracted, and the variable most highly correlated with the failure period was found through visualization. Then, after selecting patterns with a shorter failure period, the cause of decreased reliability was confirmed through frequency analysis. This study will contribute to improving reliability when developing new weapon systems and will help to strengthen the combat readiness of the military.

Development and application of prediction model of hyperlipidemia using SVM and meta-learning algorithm (SVM과 meta-learning algorithm을 이용한 고지혈증 유병 예측모형 개발과 활용)

  • Lee, Seulki;Shin, Taeksoo
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.111-124 / 2018
  • This study aims to develop a classification model for predicting the occurrence of hyperlipidemia, one of the chronic diseases. Prior studies applying data mining techniques to disease prediction can be classified into model design studies for predicting cardiovascular disease and studies comparing disease prediction results. In the foreign literature, studies predicting cardiovascular disease were predominant among those using data mining techniques. Domestic studies were not much different from those of foreign countries, but they mainly focused on hypertension and diabetes. Since hyperlipidemia is a chronic disease of high importance alongside hypertension and diabetes, this study selected hyperlipidemia as the disease to be analyzed. We developed a model for predicting hyperlipidemia using SVM and meta-learning algorithms, which are already known to have excellent predictive power. To this end, we used the data set from the Korea Health Panel 2012. The Korea Health Panel produces basic data on the level of health expenditure, health level and health behavior, and has conducted an annual survey since 2008. In this study, 1,088 patients with hyperlipidemia were randomly selected from the hospitalized, outpatient, emergency, and chronic disease data of the Korea Health Panel in 2012, and 1,088 nonpatients were also randomly extracted, for a total of 2,176 people. Three methods were used to select input variables for predicting hyperlipidemia. First, a stepwise method was performed using logistic regression. Among the 17 variables, the categorical variables (except for length of smoking) were expressed as dummy variables, which are treated as separate variables relative to the reference group, and these variables were analyzed. Six variables (age, BMI, education level, marital status, smoking status, gender), excluding income level and smoking period, were selected based on a significance level of 0.1. Second, C4.5 was used as a decision tree algorithm. The significant input variables were age, smoking status, and education level. Finally, a genetic algorithm was used. For SVM, the genetic algorithm selected six input variables (age, marital status, education level, economic activity, smoking period, and physical activity status), and for the artificial neural network it selected three (age, marital status, and education level). Based on the selected variables, we compared SVM, meta-learning algorithms and other prediction models for hyperlipidemia patients, and compared classification performance using TP rate and precision. The main results of the analysis are as follows. First, the accuracy of the SVM was 88.4% and the accuracy of the artificial neural network was 86.7%. Second, the accuracy of classification models using the input variables selected through the stepwise method was slightly higher than that of classification models using all variables. Third, the precision of the artificial neural network was higher than that of SVM when only the three input variables selected by the decision tree were used. For classification models based on the input variables selected through the genetic algorithm, the classification accuracy of SVM was 88.5% and that of the artificial neural network was 87.9%. Finally, stacking, the meta-learning algorithm proposed in this study, performed best when it used the predicted outputs of SVM and MLP as input variables of an SVM meta-classifier. The purpose of this study was to predict hyperlipidemia, one of the representative chronic diseases. To do this, we used SVM and meta-learning algorithms, which are known to have high accuracy. As a result, the classification accuracy for hyperlipidemia with stacking as a meta-learner was higher than with other meta-learning algorithms. However, the predictive performance of the meta-learning algorithm proposed in this study is the same as that of the SVM with the best performance (88.6%) among the single models. The limitations of this study are as follows. First, various variable selection methods were tried, but most variables used in the study were categorical dummy variables. With a large number of categorical variables, models such as decision trees can be better suited than general models such as neural networks, so the results may differ if continuous variables are used. Despite these limitations, this study is significant in predicting hyperlipidemia with hybrid models such as meta-learning algorithms, which have not been studied previously. The improvement in model accuracy obtained by applying various variable selection techniques is also meaningful. In addition, we expect that the proposed model will be effective for the prevention and management of hyperlipidemia.
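
The abstract compares models by accuracy, TP rate and precision. As a small sketch of how these three measures relate to one another, the function below derives them from true and predicted labels. The labels are hypothetical, the name `classification_metrics` is an assumption, and nothing here reproduces the study's models or data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, TP rate (recall) and precision for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of actual patients that the model flags.
        "tp_rate": tp / (tp + fn) if tp + fn else 0.0,
        # Share of flagged cases that are actual patients.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical labels: 1 = hyperlipidemia patient, 0 = nonpatient.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because the study sample is balanced (1,088 patients and 1,088 nonpatients), accuracy is not distorted by class imbalance, but TP rate and precision still separate missed patients from false alarms.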

Determinants of Satisfaction, Revisit Intention, and Recommendation Intention Using Decision Tree Analysis - Foreign Tourists Visiting Korea during the COVID-19 Pandemic - (의사결정나무분석을 활용한 방문 만족도, 재방문 의사, 타인 권유 의사 결정요인 분석 - 코로나19 상황에서의 한국 방문 외래관광객을 대상으로 -)

  • Won-Sik Kim
    • The Journal of the Convergence on Culture Technology / v.9 no.3 / pp.129-136 / 2023
  • The study aims to examine the determinants that affect satisfaction, revisit intention, and recommendation intention with foreign tourists who visited Korea despite the threat of COVID-19. This study employs the survey data collected by the Korea Tourism Organization from 8,135 foreign tourists who visited Korea in 2020. As the survey data contains a mixture of continuous and categorical variables, decision tree analysis can ensure analytical validity for the research. According to the analytical results, the determinants affecting satisfaction are the purpose of the visit and acceptance of self-quarantine during their stay. The factors influencing revisit intention are the purpose of the visit, frequency of the visit, and acceptance of self-quarantine during their stay. The determinants affecting recommendation intention are the purpose of the visit, length of stay, and gender. Based on the results of this analysis, this study not only explains the relationship between these determinants and tourism satisfaction, revisit intention, and recommendation intention, but also suggests implications for revitalizing tourism activities.

Categorical data analysis of sensory evaluation data with Hanwoo bull beef (한우 수소 고기 관능평가 데이터에 대한 범주형 자료 분석)

  • Lee, Hye-Jung;Cho, Soo-Hyun;Kim, Jae-Hee
    • Journal of the Korean Data and Information Science Society / v.20 no.5 / pp.819-827 / 2009
  • This study was conducted to investigate the relationship between sociodemographic factors and Korean consumers' palatability evaluation grades using Hanwoo sensory evaluation data. A binary logistic regression model and a multinomial logistic regression model are fitted with the consumer's living location, age, gender, occupation, monthly income, and beef cut as independent variables and the palatability grade as the dependent variable. A stepwise variable selection procedure is used to find the final model, and odds ratios are calculated to find the associations between categories.
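
For a single binary predictor, the odds ratio from a logistic regression equals the cross-product ratio of the corresponding 2x2 table, so the idea can be sketched without fitting a model. The counts below are hypothetical (they are not the Hanwoo data), and the function name and the grouping into "high" and "low" palatability grades are assumptions for illustration.

```python
from math import exp, log, sqrt


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% interval for a 2x2 table.

    a, b = high / low grade counts in group 1
    c, d = high / low grade counts in group 2
    """
    ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(ratio) - z * se)
    upper = exp(log(ratio) + z * se)
    return ratio, lower, upper


# Hypothetical counts: group 1 rated 40 high / 60 low, group 2 rated 20 high / 80 low.
ratio, lower, upper = odds_ratio(40, 60, 20, 80)
print(f"OR = {ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 suggests an association between the two categories at roughly the 5% level.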

Application of GIS-based Probabilistic Empirical and Parametric Models for Landslide Susceptibility Analysis (산사태 취약성 분석을 위한 GIS 기반 확률론적 추정 모델과 모수적 모델의 적용)

  • Park, No-Wook;Chi, Kwang-Hoon;Chung, Chang-Jo F.;Kwon, Byung-Doo
    • Economic and Environmental Geology / v.38 no.1 / pp.45-55 / 2005
  • Traditional GIS-based probabilistic spatial data integration models for landslide susceptibility analysis have failed to provide the theoretical background and effective methods for integrating different types of spatial data, such as categorical and continuous data. This paper applies two spatial data integration models, non-parametric empirical estimation and parametric predictive discriminant analysis, that can directly use the original continuous data within a likelihood ratio framework. Similarity rates and a prediction rate curve are computed to compare the two models quantitatively. To illustrate the proposed models, two case studies from the Jangheung and Boeun areas were carried out and analyzed. In the Jangheung case study, the two models showed similar prediction capabilities. In the Boeun area, on the other hand, the parametric predictive discriminant analysis model showed better prediction capability than the non-parametric empirical estimation model. In conclusion, the proposed models could effectively integrate continuous data for landslide susceptibility analysis. More case studies should be carried out to support these results, since each model has a distinctive feature in continuous data representation.

Impact of Chemotherapy on Hypercalcemia in Breast and Lung Cancer Patients

  • Hassan, Bassam Abdul Rasool;Yusoff, Zuraidah Binti Mohd;Hassali, Mohamed Azmi;Othman, Saad Bin;Weiderpass, Elisabete
    • Asian Pacific Journal of Cancer Prevention / v.13 no.9 / pp.4373-4378 / 2012
  • Introduction: Hypercalcemia is mainly caused by bone resorption due to either secretion of cytokines, including parathyroid hormone-related protein (PTHrP), or bone metastases. However, hypercalcemia may occur in patients with or without bone metastases. The present study aimed to describe the effect of chemotherapy treatment, regimens and doses on calcium levels among breast and lung cancer patients with hypercalcemia. Methods: We carried out a review of medical records of breast and lung cancer patients hospitalized in the years 2003 and 2009 at Penang General Hospital, a public tertiary care center in Penang Island, north of Malaysia. Patients with hypercalcemia (defined as a calcium level above 10.5 mg/dl) at the time of cancer diagnosis or during cancer treatment had their medical history abstracted, including presence of metastasis, chemotherapy types and doses, calcium levels throughout cancer treatment, and other co-morbidity. The mean calcium levels at first hospitalization before chemotherapy were compared with calcium levels at the end of or at the latest chemotherapy treatment. Statistical analysis was conducted using the Chi-square test for categorical data, logistic regression for categorical variables, and the Spearman correlation test, linear regression and the paired sample t test for continuous data. Results: Of a total of 1,023 breast cancer and 814 lung cancer patients identified, 292 had hypercalcemia at first hospitalization or during cancer treatment (174 breast and 118 lung cancer patients). About a quarter of these patients had advanced stage cancers: 26.4% had mild hypercalcemia (10.5-11.9 mg/dl), 55.5% had moderate (12-12.9 mg/dl), and 18.2% severe hypercalcemia (13-13.9; 14-16 mg/dl). Chemotherapy lowered calcium levels significantly in both breast and lung cancer patients with hypercalcemia, in particular with 5-fluorouracil+epirubicin+cyclophosphamide (FEC) for breast cancer and gemcitabine+cisplatin for lung cancer. Conclusion: Chemotherapy decreases calcium levels in breast and lung cancer cases with hypercalcemia at cancer diagnosis, probably by reducing PTHrP levels.
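
The before-and-after comparison of calcium levels described in the Methods corresponds to a paired sample t test. The sketch below computes the test statistic for four hypothetical patients; the values (in mg/dl) and the function name `paired_t` are invented for illustration and do not come from the study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-sample t statistic and degrees of freedom for before/after values."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Hypothetical calcium levels in mg/dl for four patients.
before_chemo = [12.0, 11.5, 13.0, 12.5]
after_chemo = [10.0, 10.5, 11.0, 10.5]
t_stat, dof = paired_t(before_chemo, after_chemo)
print(f"t = {t_stat:.2f}, df = {dof}")
```

A positive statistic here means calcium was higher before chemotherapy than after; the statistic would then be compared against the t distribution with `dof` degrees of freedom.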