• Title/Summary/Keyword: categorical variable

Robust Variable Selection in Classification Tree

  • Jang Jeong Yee;Jeong Kwang Mo
    • Proceedings of the Korean Statistical Society Conference / 2001.11a / pp.89-94 / 2001
  • This study focuses on variable selection when growing decision trees. We discuss several splitting rules and variable selection algorithms, and propose a competitive variable selection method based on the Kruskal-Wallis test, a nonparametric counterpart of the ANOVA F-test. A Monte Carlo study shows that CART is seriously biased toward selecting categorical variables with many values, and that QUEST, which relies on the F-test, has low power for selecting informative variables under heavy-tailed distributions.
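
The idea of ranking candidate split variables with the Kruskal-Wallis test can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the function names (`average_ranks`, `kruskal_wallis_h`, `select_split_variable`) and the toy data are hypothetical, the H statistic omits the tie correction, and the variable with the largest H is chosen, which matches the smallest p-value only when every candidate is compared across the same classes.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(x, labels):
    """Kruskal-Wallis H of predictor x across class labels (no tie correction)."""
    n = len(x)
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for r, g in zip(average_ranks(x), labels):
        rank_sums[g] += r
        counts[g] += 1
    spread = sum(rank_sums[g] ** 2 / counts[g] for g in counts)
    return 12.0 / (n * (n + 1)) * spread - 3 * (n + 1)


def select_split_variable(columns, labels):
    """Pick the predictor whose ranks differ most across the classes."""
    scores = {name: kruskal_wallis_h(col, labels) for name, col in columns.items()}
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): x1 separates the classes but has one extreme value,
# x2 is unrelated to the class label.
labels = ["a", "a", "a", "b", "b", "b"]
columns = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0],
    "x2": [1.0, 6.0, 2.0, 5.0, 3.0, 4.0],
}
best, scores = select_split_variable(columns, labels)
print(best, scores)
```

Because the statistic depends only on ranks, the extreme value in `x1` has no more influence than any other largest observation, which is the property that makes a rank-based test attractive under heavy-tailed distributions.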

A Comparative Analysis of Risk Assessment Models for Asbestos Demolition (석면 해체 작업의 위험성평가모델 비교 분석)

  • Kim, Dong-Gyu;Kim, Min-Seung;Lee, Su-Min;Kim, Yu-Jin;Han, Seung-Woo
    • Proceedings of the Korean Institute of Building Construction Conference / 2022.11a / pp.99-100 / 2022
  • As the dangers of asbestos exposure have become known, the removal of asbestos from existing buildings has grown in importance. An extensive body of research has evaluated the risk of asbestos demolition work, but the variables considered were limited because categorical information was not reflected and quantitative information was difficult to collect. This study therefore aims to derive a model that predicts workplace risk in asbestos demolition using both categorical and continuous variables. To this end, categorical and continuous variables were collected from asbestos demolition reports, and the risk assessment score was set as the dependent variable. The influence of each variable was identified using logistic regression, and risk prediction approaches were compared using decision tree regression and an artificial neural network. As a result, a conditional risk prediction model was derived to evaluate the risk of asbestos demolition; the model is expected to help ensure the safety of asbestos demolition workers.

Understanding of the Misuse Cases of Quantitative and Qualitative Regression Analysis (정량적, 정성적 회귀분석의 오적용과 이해)

  • Choe, Seong-Un
    • Proceedings of the Safety Management and Science Conference / 2011.11a / pp.213-217 / 2011
  • This study presents misuse cases of quantitative regression analysis in QC circle activities and Six Sigma projects, and offers guidelines on correct use for quality practitioners. In addition, qualitative regression analysis, in which the response variable y is a nonconforming ratio, is reviewed on the basis of misuse cases so that practitioners in the field can apply it properly. In many cases, frequent errors involve correlation analysis or ANOVA, whether or not quantitative regression analysis is used. Moreover, nonconforming-ratio data, whose dependent variable is discrete and categorical, are often analyzed with quantitative regression instead of qualitative regression, which results in ineffective quality improvement.

Monitoring social networks based on transformation into categorical data

  • Lee, Joo Weon;Lee, Jaeheon
    • Communications for Statistical Applications and Methods / v.29 no.4 / pp.487-498 / 2022
  • Social network analysis (SNA) techniques have recently been developed to monitor and detect abnormal behaviors in social networks. Control charts, a useful tool for process monitoring, are also useful for network monitoring. In this paper, the degree and closeness centrality measures, which take global and local perspectives, respectively, are applied to an exponentially weighted moving average (EWMA) chart and a multinomial cumulative sum (CUSUM) chart for monitoring undirected weighted networks. In general, an EWMA chart monitors only one variable, whereas a multinomial CUSUM chart can monitor, in a single chart, a categorical variable into which several variables have been transformed through classification rules. To monitor degree centrality and closeness centrality simultaneously, we categorize them based on the average of each measure and then apply the multinomial CUSUM chart. In this way, the global and local attributes of the network can be monitored simultaneously with a single chart. We also evaluate the performance of the proposed procedure through a simulation study.
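
The step of turning two centrality measures into one categorical variable and charting it can be sketched roughly as below. This is a minimal illustration under assumptions, not the authors' procedure: the helper names, the four-category coding, the in-control and out-of-control probabilities `p0` and `p1`, the control limit `h`, and the sample values are all hypothetical. The chart statistic is the log-likelihood-ratio form that is one common way to define a multinomial CUSUM.

```python
import math


def categorize(degree, closeness, degree_mean, closeness_mean):
    """Map a (degree, closeness) pair to one of four categories, 0..3.

    The result is 2 * [degree above its average] + [closeness above its average].
    """
    return 2 * int(degree > degree_mean) + int(closeness > closeness_mean)


def multinomial_cusum(categories, p0, p1, h):
    """Log-likelihood-ratio CUSUM over a sequence of category indices.

    p0 and p1 are the assumed in-control and out-of-control category
    probabilities. Returns the chart statistics and the index of the first
    observation whose statistic exceeds h (None if there is no signal).
    """
    stat, path, signal = 0.0, [], None
    for t, k in enumerate(categories):
        stat = max(0.0, stat + math.log(p1[k] / p0[k]))
        path.append(stat)
        if signal is None and stat > h:
            signal = t
    return path, signal


# Hypothetical centrality values observed at four time points.
observations = [(2.0, 0.4), (4.0, 0.6), (4.0, 0.7), (5.0, 0.8)]
categories = [categorize(d, c, degree_mean=3.0, closeness_mean=0.5)
              for d, c in observations]

p0 = [0.25, 0.25, 0.25, 0.25]  # assumed in-control probabilities
p1 = [0.10, 0.20, 0.20, 0.50]  # assumed shift toward "both above average"
path, signal = multinomial_cusum(categories, p0, p1, h=1.0)
print(categories, path, signal)
```

Coding both measures into a single category index is what allows one chart to react to joint changes in the two measures instead of requiring a separate chart for each.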

Imputation for Binary or Ordered Categorical Traits Based on the Bayesian Threshold Model (베이지안 분계점 모형에 의한 순서 범주형 변수의 대체)

  • Lee Seung-Chun
    • The Korean Journal of Applied Statistics / v.18 no.3 / pp.597-606 / 2005
  • Nonresponse in sample surveys causes problems when analyzing data sets in public-use files, where the user has only complete-data methods available and limited information about the reasons for nonresponse. Imputation has recently become a standard approach for handling nonresponse, and various imputation methods have been devised. However, most imputation methods are concerned with continuous traits, whereas many features of interest in sample surveys are measured on binary or ordered categorical scales. In this note, an imputation method for ignorable nonresponse in binary or ordered categorical traits is considered.

Bias Reduction in Split Variable Selection in C4.5

  • Shin, Sung-Chul;Jeong, Yeon-Joo;Song, Moon Sup
    • Communications for Statistical Applications and Methods / v.10 no.3 / pp.627-635 / 2003
  • In this short communication we discuss the bias problem of C4.5 in split variable selection and suggest a method to reduce the variable selection bias among categorical predictor variables. A penalty proportional to the number of categories is applied to the splitting criterion gain of C4.5. The results of empirical comparisons show that the proposed modification of C4.5 reduces the size of classification trees.
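
A penalty proportional to the number of categories can be illustrated with the sketch below. The abstract does not give the exact form of the penalty, so everything specific here is an assumption: plain information gain stands in for the C4.5 criterion, the penalty is taken as `penalty_rate` times the number of observed categories, and the function names, the value 0.15, and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Return (information gain, number of categories) for a categorical predictor."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder, len(groups)


def penalized_gain(values, labels, penalty_rate):
    """Gain minus an assumed penalty that grows linearly with the category count."""
    gain, n_categories = information_gain(values, labels)
    return gain - penalty_rate * n_categories


labels = ["y", "y", "y", "y", "n", "n", "n", "n"]
record_id = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"]  # 8 categories, no real signal
group = ["a", "a", "a", "b", "b", "b", "b", "a"]              # 2 categories, weak signal

raw = {name: information_gain(col, labels)[0]
       for name, col in {"record_id": record_id, "group": group}.items()}
penalized = {name: penalized_gain(col, labels, penalty_rate=0.15)
             for name, col in {"record_id": record_id, "group": group}.items()}
print(raw, penalized)
```

In this toy case the unpenalized gain prefers the many-valued `record_id`, while the penalized version prefers `group`; how large the penalty rate should be is a tuning choice that the sketch does not address.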

An Empirical Study on the Measurement of Clustering and Trend Analysis among the Asian Container Ports Using the Variable Group Benchmarking and Categorical Variable Models (가변 그룹 벤치마킹 모형과 범주형 변수모형을 이용한 아시아 컨테이너항만의 클러스터링측정 및 추세분석에 관한 실증적 연구)

  • Park, Rokyung
    • Journal of Korea Port Economic Association / v.29 no.1 / pp.143-175 / 2013
  • The purpose of this paper is to show the clustering trend among 38 Asian ports over 9 years (2001-2009) using the variable group benchmarking (VGB) and categorical variable (CV) models, with 4 inputs (berth length, depth, total area, and number of cranes) and 1 output (container TEU). The main empirical results are as follows. First, clustering results from VGB show that the Shanghai, Qingdao, and Ningbo ports played the core role in clustering. Second, CV analysis focusing on container throughput indicated that, excluding Chinese ports, the Singapore, Keelung, Dubai, and Kaohsiung ports emerged as the center ports of clustering. Third, the Aqaba, Dubai, Hong Kong, Shanghai, Guangzhou, and Ningbo ports are recommended as efficient ports to target for clustering. Fourth, when the ports are classified by regional location, the Dubai, Khor Fakkan, Shanghai, Hong Kong, Keelung, Ningbo, and Singapore ports are the core ports for clustering. On the whole, other ports located in Asia should be clustered to the Dubai, Khor Fakkan, Shanghai, Hong Kong, Ningbo, and Singapore ports. The policy implication of this paper is that Korean port policy planners should adopt the VGB and CV models for clustering among international ports in order to enhance the efficiency of inputs and outputs.

Individual differences in categorical perception: L1 English learners' L2 perception of Korean stops

  • Kong, Eun Jong
    • Phonetics and Speech Sciences / v.11 no.4 / pp.63-70 / 2019
  • This study investigated individual variability in L2 learners' categorical judgments of L2 stops by exploring English learners' perceptual processing of two acoustic cues (voice onset time [VOT] and f0) and working memory capacity as sources of variation. As prior research has reported that English speakers' greater use of the redundant cue f0 was responsible for gradient processing of native stops, we examined whether the same processing characteristics would be observed in L2 learners' perception of Korean stops (/t/-/tʰ/). Twenty-two L1 English learners of L2 Korean with a range of L2 proficiency participated in a visual analogue scaling task and varied in how they judged the L2 Korean stops: some were more gradient than others in performing the task. Correlation analysis revealed that L2 learners' categorical responses were modestly related to individuals' use of the primary cue for the stop contrast (VOT for L1 English stops and f0 for L2 Korean stops), and were also related to better working memory capacity. Together, the current experimental evidence demonstrates adult L2 learners' top-down processing of stop consonants, in which linguistic and cognitive resources are devoted to determining abstract phonemic identity.

Variable selection for latent class analysis using clustering efficiency (잠재변수 모형에서의 군집효율을 이용한 변수선택)

  • Kim, Seongkyung;Seo, Byungtae
    • The Korean Journal of Applied Statistics / v.31 no.6 / pp.721-732 / 2018
  • Latent class analysis (LCA) is an important tool for exploring unobserved latent groups in multivariate categorical data. In practice, it is important to select a suitable set of variables, because including too many variables complicates the model and reduces the accuracy of the parameter estimates. Dean and Raftery (Annals of the Institute of Statistical Mathematics, 62, 11-35, 2010) proposed a headlong search algorithm based on Bayesian information criterion values to choose meaningful variables for LCA. In this paper, we propose a new variable selection procedure for LCA that uses the posterior probabilities obtained from each fitted model. We propose a new statistic to measure the adequacy of LCA and develop a variable selection procedure based on it. The effectiveness of the proposed method is demonstrated through numerical studies.

Contour Method and Collapsibility Criteria for $2{\times}3{\times}K$ Contingency Tables

  • Hong, C.S.;Son, B.U.;Park, J.Y.
    • Journal of the Korean Data and Information Science Society / v.15 no.4 / pp.717-729 / 2004
  • The contour method, which was originally designed for $2{\times}2{\times}2$ contingency tables, is studied for $2{\times}2{\times}K$ and $2{\times}3{\times}K$ tables. Whereas a contour plot for a $2{\times}2{\times}K$ table is drawn on the unit square in a two-dimensional plane, a contour plot for a $2{\times}3{\times}K$ table can be drawn in a regular hexahedron in three-dimensional space. Based on contour plots for categorical data fitted to all possible three-dimensional log-linear models, one can identify whether a $2{\times}2{\times}K$ or $2{\times}3{\times}K$ table is collapsible over the third variable.
