• Title/Summary/Keyword: Categorical data analysis


Influence of Global Competitive Capability on Global Performance of Distribution Industry in South Korea

  • KIM, Boine;KIM, Byoung-Goo
    • Journal of Distribution Science / v.19 no.12 / pp.83-89 / 2021
  • Purpose: This study empirically analyzes the influence of global competitive capability on the global performance of the distribution industry in South Korea. Based on the empirical results, it offers managerial implications for the distribution industry and contributes to management scholarship. Research design, data and methodology: This study focuses on the relationship between global competitive capability and global performance. Global competitive capability was measured with three concepts: human capability, network capability, and product/service capability. Global performance was measured with export performance. To analyze the relationships between the variables empirically, this study used 2,316 records from the GCL Test by KOTRA and Kdata. The analysis was carried out in SPSS 26 and comprised frequency, reliability, correlation, and stepwise regression analyses. Results: Among the control variables, business period and business field have a significant positive influence on export performance. Among the antecedents, human capability and network capability have a significant positive influence on export performance, whereas product/service capability was not significant. Because business field, a categorical variable, had a significant influence, this study additionally analyzed the relationships by business field group to confirm whether they differ across groups. Conclusions: Based on the results, this study offers implications for distribution industry management and contributes to the academic literature.

Analysis on the Correlation Between Occupation and Disease in Korea

  • KANG, Il-Won;KWON, Lee-Seung
    • The Journal of Industrial Distribution & Business / v.12 no.9 / pp.7-18 / 2021
  • Purpose: This study investigates whether the prevalence of hypertension differs by gender, by occupational group, and by occupational group within each gender. Research design, data, and methodology: Using the 7th National Health and Nutrition Examination Survey (2017), this study classified men and women aged 20 to 49 into office workers and non-office workers. A total of 2,691 people were surveyed, including 1,394 office workers and 1,297 non-office workers. Frequency analysis, the chi-square test, and the independent t-test were applied to analyze differences in the distribution of categorical variables for occupation and hypertension. Statistical significance was assessed at the 0.001 level. All statistical analyses were performed using the IBM SPSS 24.0 program. Results: The main risk factors for hypertension were gender, age, education, obesity, smoking, drinking, family history, and chronic diseases. The prevalence of hypertension differed between office workers and non-office workers. Conclusions: Men had a higher prevalence of hypertension than women, and non-office workers had a higher prevalence than office workers. Among women, non-office workers had a higher prevalence of hypertension than office (white-collar) workers.
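
    The chi-square test of independence named in this abstract can be sketched for a 2x2 table (occupation group by hypertension status). This is only an illustrative sketch: the study itself used SPSS, the cell counts below are hypothetical and are not the survey data, and the function name `chi_square_2x2` is an assumption introduced here. No continuity correction is applied.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail p-value equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# HYPOTHETICAL counts: rows = office / non-office, columns = hypertension yes / no.
counts = [[30, 70], [50, 50]]
stat, p_value = chi_square_2x2(counts)
print(f"chi-square = {stat:.4f}, p = {p_value:.4f}")
```

    With a significance level as strict as the 0.001 reported in the abstract, a table like this hypothetical one would not be declared significant, which shows how much the chosen threshold matters.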

LAD Estimators for Categorical Data Analysis (범주형 자료 분석을 위한 LAD 추정량)

  • 최현집
    • The Korean Journal of Applied Statistics / v.16 no.1 / pp.55-69 / 2003
  • In this article, we propose weighted LAD (least absolute deviations) estimators for multi-dimensional contingency tables and derive a method for computing them. To illustrate the robustness of the estimators, simulation results are presented for several models, including log-linear models and models for ordinal variables in multidimensional contingency tables. Examples are also presented.

A survey on the voice related needs of occupational voice users (직업적 음성사용자의 음성관련 요구 조사)

  • Lee, Eun-Jeong;Kim, Wha-Soo
    • Phonetics and Speech Sciences / v.7 no.2 / pp.39-45 / 2015
  • This research investigated the voice-related needs of occupational voice users. The data, collected from teachers (379), telemarketers (156), and therapists (50), were classified by content using Colaizzi's inductive categorical analysis. The voice-related needs fall into three broad categories: 1) how to use, 2) how to care, and 3) how to be healthy. The category 'how to use my voice' was further divided into six sub-categories: (1) efficiently, (2) as I desired, (3) without pain (discomfort), (4) expressively, (5) phonation (methods), and (6) clear articulation. The results showed that the needs of the three groups of occupational voice users reflect the environments in which they have to use their voices, as well as the voice characteristics that their specific listeners want.

Statistical Errors of Articles Published in the Journal of Oriental Rehabilitation Medicine(I) (한방재활의학과학회지의 통계적 오류에 관한 고찰(I))

  • Park, Tae-Yong;Heo, Tae-Young;Shin, Byung-Cheul
    • Journal of Korean Medicine Rehabilitation / v.20 no.4 / pp.105-130 / 2010
  • Objectives: The purpose of this study was to assess errors in the statistical methods used in the Journal of Oriental Rehabilitation Medicine (JORM) and to identify the types of errors in statistical analysis. Methods: We reviewed quantitative articles published in the JORM from January 2005 through October 2009. Articles that did not use statistical analysis, such as literature studies, case studies, and review articles, were excluded. A total of 296 articles were reviewed. We evaluated the adequacy and validity of the statistical techniques with a checklist established by modifying Lee's checklist, and three statistical evaluators assessed the articles together to minimize bias. Results: Of the 222 articles, 213 used inferential and descriptive statistics. Of those, 80% were found to contain statistical errors. On average, each article used 1.7 statistical methods. The most frequently employed statistics were, in order, Student's t-test, one-way ANOVA, Pearson correlation analysis, Mann-Whitney U test, paired t-test, and chi-square test. The statistics most frequently associated with errors followed a similar order. The most common statistical errors were as follows: 1. absence of a normality test, 2. confusion between paired and unpaired tests, 3. wrong choice of repeated measures analysis without consideration of time variables, 4. inflation of Type I error through inappropriate multiple testing, 5. inappropriate use of discrete or categorical data instead of continuous data in correlation analysis, 6. poor consideration of the basic assumptions of the chi-square test, 7. confusion between frequency comparison and average comparison, 8. mentioning a statistical technique without using it. Conclusions: We found various mistakes and misuses in the application of statistical methodologies in the articles published in the JORM. Careful consideration of statistical use and review by statistics specialists are warranted to improve the quality of the JORM.

Clustering Algorithm for Data Mining using Posterior Probability-based Information Entropy (데이터마이닝을 위한 사후확률 정보엔트로피 기반 군집화알고리즘)

  • Park, In-Kyoo
    • Journal of Digital Convergence / v.12 no.12 / pp.293-301 / 2014
  • In this paper, we propose a new measure based on the confidence of the Bayesian posterior probability so as to reduce unimportant information in the clustering process. Because clustering performance depends on selecting the degree of importance of attributes within the database, the concept of information entropy is added to the posterior probability to improve attribute discernibility. Hence, the same value of attributes in the confidence of the proposed measure is considerably smaller because of the natural logarithm. The posterior probability-based clustering algorithm therefore selects the minimum attribute reducts and improves the efficiency of clustering. A validation analysis comparing the proposed algorithm with others shows its discernibility as well as its ability to handle uncertainty when clustering ACME categorical data.
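
    The abstract does not give the proposed measure, so the following is only a sketch of the standard building blocks it mentions: Shannon entropy with the natural logarithm, and the conditional entropy of a class label given a categorical attribute. It is not the paper's measure. The function names and the toy data are assumptions introduced for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a list of categorical values."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(attribute, labels):
    """H(labels | attribute): entropy of the labels within each attribute value, weighted by frequency."""
    n = len(labels)
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

# HYPOTHETICAL toy data: two attributes and one class label.
color = ["red", "red", "blue", "blue"]
shape = ["s", "t", "s", "t"]
label = ["A", "A", "B", "B"]

# Information gain = H(label) - H(label | attribute); higher means more discerning.
gain_color = entropy(label) - conditional_entropy(color, label)
gain_shape = entropy(label) - conditional_entropy(shape, label)
print(gain_color, gain_shape)
```

    In this toy data `color` fully determines the label while `shape` carries no information about it, which is the kind of distinction an entropy-based attribute reduction step relies on.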

Extended Information Entropy via Correlation for Autonomous Attribute Reduction of BigData (빅 데이터의 자율 속성 감축을 위한 확장된 정보 엔트로피 기반 상관척도)

  • Park, In-Kyu
    • Journal of Korea Game Society / v.18 no.1 / pp.105-114 / 2018
  • Various data analysis methods used for customer type analysis are very important for game companies that want to understand the types and characteristics of their customers in order to plan customized content and provide more convenient services. In this paper, we propose a k-mode cluster analysis algorithm that uses information uncertainty by extending information entropy to reduce information loss. The similarity of attributes is therefore measured in two respects. One is the uncertainty of each attribute with respect to the center of each partition, and the other is the uncertainty about the probability distribution of the uncertainty of each attribute. In particular, the uncertainty in attributes is taken into account on both non-probabilistic and probabilistic scales because the entropy of the attribute is transformed into probabilistic information to measure the uncertainty. The accuracy of the algorithm is demonstrated by the results of cluster analysis based on the optimal initial value, through extensive performance analysis and various indexes.
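
    For orientation, the two basic operations of a plain k-modes style clustering step can be sketched as below: a simple matching dissimilarity between categorical records, and a cluster center formed from per-attribute modes. This is the textbook baseline, not the entropy-extended measure the paper proposes, and the function names and customer records are hypothetical.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(records):
    """Cluster center: the most frequent value of each attribute across the records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

# HYPOTHETICAL customer records: (gender, membership, platform).
cluster = [
    ("m", "gold", "pc"),
    ("m", "free", "mobile"),
    ("f", "gold", "pc"),
]
center = column_modes(cluster)
distance = matching_dissimilarity(("f", "free", "pc"), center)
print(center, distance)
```

    Simple matching treats every mismatch as equally important, which is the information loss that entropy-weighted variants try to reduce.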

Latent class analysis with multiple latent group variables

  • Lee, Jung Wun;Chung, Hwan
    • Communications for Statistical Applications and Methods / v.24 no.2 / pp.173-191 / 2017
  • This study develops a new type of latent class analysis (LCA) to explain the associations between one latent variable and several other categorical latent variables. Our model postulates that the prevalence of the latent variable of interest is affected by another latent variable composed of several other latent variables. For parameter estimation, we propose deterministic annealing EM (DAEM) to deal with the local maxima problem in the proposed model. We perform a simulation study to demonstrate how DAEM can find the set of parameter estimates at the global maximum of the likelihood over repeated samples. We apply the proposed LCA model to investigate the effect of drug-using behavior on violent behavior, and their joint patterns, among US high school male students using data from the Youth Risk Behavior Surveillance System 2015. Considering the age of male adolescents as a covariate influencing violent behavior, we identified three classes of violent behavior and three classes of drug-using behavior. We also discovered that the prevalence of violent behavior is affected by the type of drug used for drug-using behavior.

Mapping Categories of Heterogeneous Sources Using Text Analytics (텍스트 분석을 통한 이종 매체 카테고리 다중 매핑 방법론)

  • Kim, Dasom;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.193-215 / 2016
  • In recent years, the proliferation of diverse social networking services has led users to use many mediums simultaneously depending on their individual purpose and taste. Besides, while collecting information about particular themes, they usually employ various mediums such as social networking services, Internet news, and blogs. However, in terms of management, each document circulated through diverse mediums is placed in different categories on the basis of each source's policy and standards, hindering any attempt to conduct research on a specific category across different kinds of sources. For example, documents containing content on "Application for a foreign travel" can be classified into "Information Technology," "Travel," or "Life and Culture" according to the peculiar standard of each source. Likewise, with different viewpoints of definition and levels of specification for each source, similar categories can be named and structured differently in accordance with each source. To overcome these limitations, this study proposes a plan for conducting category mapping between different sources with various mediums while maintaining the existing category system of the medium as it is. Specifically, by re-classifying individual documents from the viewpoint of diverse sources and storing the result of such a classification as extra attributes, this study proposes a logical layer by which users can search for a specific document from multiple heterogeneous sources with different category names as if they belong to the same source. Besides, by collecting 6,000 articles of news from two Internet news portals, experiments were conducted to compare accuracy among sources, supervised learning and semi-supervised learning, and homogeneous and heterogeneous learning data. 
It is particularly interesting that in some categories, classifying accuracy of semi-supervised learning using heterogeneous learning data proved to be higher than that of supervised learning and semi-supervised learning, which used homogeneous learning data. This study has the following significances. First, it proposes a logical plan for establishing a system to integrate and manage all the heterogeneous mediums in different classifying systems while maintaining the existing physical classifying system as it is. This study's results particularly exhibit very different classifying accuracies in accordance with the heterogeneity of learning data; this is expected to spur further studies for enhancing the performance of the proposed methodology through the analysis of characteristics by category. In addition, with an increasing demand for search, collection, and analysis of documents from diverse mediums, the scope of the Internet search is not restricted to one medium. However, since each medium has a different categorical structure and name, it is actually very difficult to search for a specific category insofar as encompassing heterogeneous mediums. The proposed methodology is also significant for presenting a plan that enquires into all the documents regarding the standards of the relevant sites' categorical classification when the users select the desired site, while maintaining the existing site's characteristics and structure as it is. This study's proposed methodology needs to be further complemented in the following aspects. First, though only an indirect comparison and evaluation was made on the performance of this proposed methodology, future studies would need to conduct more direct tests on its accuracy. That is, after re-classifying documents of the object source on the basis of the categorical system of the existing source, the extent to which the classification was accurate needs to be verified through evaluation by actual users. 
In addition, classification accuracy needs to be increased by making the methodology more sophisticated. Furthermore, understanding the characteristics of the categories in which heterogeneous semi-supervised learning achieved higher classification accuracy than supervised learning may help in obtaining heterogeneous documents from diverse mediums and in devising ways to use them to improve document classification accuracy.

Relationship between Business Type on Sales Orders and Major Factors in Domestic Ecommerce Markets

  • JEONG, Dong-Bin
    • The Journal of Economics, Marketing and Management / v.8 no.2 / pp.19-26 / 2020
  • Purpose: The goal of this study is to comprehensively grasp the current status of ecommerce and to provide basic data for information-related policies. In this work, we examine recent ecommerce utilization and purchasing business by main factors, and look at the association between business type on sales orders (BTSO) and three variables: region, occupation, and group type. Research design, data and methodology: The data for this research were obtained from the Ministry of Science and Technology Information and Communication in 2017 and cover about 14,000 national business samples. Two statistical methods are used to analyze the association between BTSO and the three variables: the chi-square test and correspondence analysis. Results: The findings show that BTSO is pairwise associated with the three categorical variables, and the association between the categories of each pair of variables can be examined visually on a two-dimensional plane. Conclusions: This study suggests that 'household & individual consumers' among BTSO are closely connected with 'Chungbuk' and 'Kyungnam' for region, 'others', 'finance & insurance' and 'association, repairing & other personal service' for occupation, and 'national & local government' for group type. Additionally, 'other companies' among BTSO are particularly related to 'Chunnam' for region, 'manufacturing industry' for occupation, and 'company corporations' for group type.