• Title/Summary/Keyword: ordinal variable (순서형 변수)

Ordinal Variable Selection in Decision Trees (의사결정나무에서 순서형 분리변수 선택에 관한 연구)

  • Kim Hyun-Joong
    • The Korean Journal of Applied Statistics / v.19 no.1 / pp.149-161 / 2006
  • The most important component of a decision tree algorithm is the rule for split variable selection. Many earlier algorithms, such as CART and C4.5, use a greedy search for variable selection. Recently, many methods have been developed to address the weaknesses of greedy search. Most algorithms apply different selection criteria depending on whether a variable is continuous or nominal, whereas ordinal variables are usually treated as continuous. This treatment caused no trouble for methods based on greedy search. However, it may cause problems for the newer algorithms, because they rely on statistical methods that are valid only for continuous or nominal variables. In this paper, we propose an ordinal variable selection method that uses the Cramér-von Mises testing procedure. We compared CART, C4.5, QUEST, CRUISE, and the new method. The new method was shown to have good variable selection power for ordinal variables.
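
As a rough illustration of test-based split variable selection for ordinal predictors, the sketch below computes a two-sample Cramér-von Mises statistic for each candidate variable and picks the one with the smallest permutation p-value. It is not the procedure from the paper: the two-class setting, the permutation p-value, and the names `cvm_statistic`, `permutation_pvalue`, and `select_split_variable` are all assumptions made for this example.

```python
import random
from bisect import bisect_right


def cvm_statistic(x, y):
    """Two-sample Cramér-von Mises statistic; tied (ordinal) values are allowed."""
    n, m = len(x), len(y)
    xs, ys = sorted(x), sorted(y)
    total = 0.0
    # Integrate the squared ECDF difference against the pooled empirical distribution.
    for z in xs + ys:
        f = bisect_right(xs, z) / n  # ECDF of the first sample at z
        g = bisect_right(ys, z) / m  # ECDF of the second sample at z
        total += (f - g) ** 2
    return n * m / (n + m) ** 2 * total


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value for the statistic above (assumed choice, not the paper's)."""
    rng = random.Random(seed)
    observed = cvm_statistic(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = cvm_statistic(pooled[:len(x)], pooled[len(x):])
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(columns, labels):
    """columns: dict of name -> list of ordinal codes; labels: list of 0/1 classes."""
    pvalues = {}
    for name, values in columns.items():
        x = [v for v, c in zip(values, labels) if c == 0]
        y = [v for v, c in zip(values, labels) if c == 1]
        pvalues[name] = permutation_pvalue(x, y)
    # The variable whose class-conditional distributions differ most gets the smallest p-value.
    return min(pvalues, key=pvalues.get), pvalues
```

Selecting by p-value rather than by the best achievable split is what distinguishes this family of methods from greedy search, which tends to favor variables that offer more candidate split points.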

A Study on Ordinal Split Variable Selection in Decision Trees (의사결정나무에서 순서형 분리 변수 선택에 관한 연구)

  • 김현중;송주미
    • Proceedings of the Korean Statistical Society Conference / 2004.11a / pp.283-288 / 2004
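  • There have been many studies on split variable selection in decision trees, but most have been limited to continuous and nominal variables. This study focuses on ordinal variables and compares the variable selection power of existing algorithms such as CART, QUEST, and CRUISE with that of the nonparametric approaches proposed here, namely the K-S test and the Cramér-von Mises test. The proposed Cramér-von Mises test method performed better than the other algorithms in both variable selection power and stability.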

Bayesian ordinal probit semiparametric regression models: KNHANES 2016 data analysis of the relationship between smoking behavior and coffee intake (베이지안 순서형 프로빗 준모수 회귀 모형 : 국민건강영양조사 2016 자료를 통한 흡연양태와 커피섭취 간의 관계 분석)

  • Lee, Dasom;Lee, Eunji;Jo, Seogil;Choi, Taeryeon
    • The Korean Journal of Applied Statistics / v.33 no.1 / pp.25-46 / 2020
  • This paper presents ordinal probit semiparametric regression models using the Bayesian Spectral Analysis Regression (BSAR) method. Ordinal probit regression models ordinal responses, usually with more than two categories, by linking the probability of falling into each category to a combination of available covariates through a probit link (the inverse of the normal cumulative distribution function). The Bayesian probit model facilitates posterior sampling by introducing a normally distributed latent variable; the responses are then categorized by cut-off points according to the value of the latent variable. In this paper, we extend the latent variable approach to a semiparametric model for Bayesian ordinal probit regression with nonparametric functions, using the BSAR method based on a spectral representation of Gaussian processes. The latent variable is decomposed into a parametric component and a nonparametric component, with or without a shape constraint, so that ordinal responses can be modeled and outcomes predicted more flexibly. We illustrate the proposed methods with simulation studies that compare them with existing methods, and with a real data analysis of the Korean National Health and Nutrition Examination Survey (KNHANES) 2016 that investigates the nonparametric relationship between smoking behavior and coffee intake.
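
The latent-variable view of the ordinal probit link can be sketched as follows: each category probability is the normal probability mass between two adjacent cut-off points, shifted by the linear predictor. This is only the standard parametric link, not the BSAR model itself; the function names and the example cut-off values are assumptions for illustration.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def ordinal_probit_probs(eta, cutpoints):
    """Return P(y = k) for k = 0..K-1.

    eta: value of the linear (or nonparametric) predictor.
    cutpoints: increasing list of K-1 cut-off points on the latent scale.
    """
    # P(latent <= c) for each cut-off, where latent = eta + standard normal noise.
    cdf = [normal_cdf(c - eta) for c in cutpoints]
    bounds = [0.0] + cdf + [1.0]
    # Each category takes the probability mass between adjacent cut-offs.
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]


# Hypothetical three-category example with cut-offs at -1 and 1.
probs = ordinal_probit_probs(eta=0.0, cutpoints=[-1.0, 1.0])
```

Raising `eta` moves probability mass toward the higher categories, which is how a covariate effect shows up in this model.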

The Study on the Severity of Children Traffic Accident using Ordinal Logistic Regression Analysis (순서형 로지스틱 회귀분석을 이용한 어린이 사고심각도 분석 연구)

  • Yoon, Byoung-Jo;Ko, Eun-Hyeck;Yang, Sung-Ryong
    • Proceedings of the Korean Society of Disaster Information Conference / 2016.11a / pp.259-260 / 2016
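  • Children are less developed physically and mentally than other age groups and are therefore more likely to be involved in traffic accidents. In particular, while child traffic accidents nationwide are gradually decreasing, those in Incheon declined and have since begun to rise again. The purpose of this study is therefore to identify and present the main factors that affect the severity of child traffic accidents. Ordinal logistic regression was used to examine the odds associated with the explanatory variables for an ordinal-scale response variable; failure to drive safely, vehicle-to-pedestrian accidents (while crossing), and vehicle-to-vehicle accidents (right-angle side collisions) were found to be significant. The odds of a fatal accident relative to other accidents were 1.35 times higher for failure to drive safely and 1.76 times higher for right-angle side collisions, whereas for pedestrians who were crossing, the odds of death were instead lower, at 0.58 times.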
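
To make the reported odds ratios concrete, the sketch below converts a baseline probability into odds, applies an odds ratio, and converts back to a probability. The baseline probability of 0.10 is purely hypothetical (the abstract does not report one), and `apply_odds_ratio` is a name chosen for this example.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Scale the odds of an event by odds_ratio and return the new probability."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Odds ratios quoted in the abstract; the baseline probability is an assumption.
baseline = 0.10
for factor, ratio in [("failure to drive safely", 1.35),
                      ("right-angle side collision", 1.76),
                      ("pedestrian crossing", 0.58)]:
    print(f"{factor}: {apply_odds_ratio(baseline, ratio):.3f}")
```

An odds ratio multiplies odds, not probabilities, so the change in probability depends on the baseline.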

Imputation for Binary or Ordered Categorical Traits Based on the Bayesian Threshold Model (베이지안 분계점 모형에 의한 순서 범주형 변수의 대체)

  • Lee Seung-Chun
    • The Korean Journal of Applied Statistics / v.18 no.3 / pp.597-606 / 2005
  • Nonresponse in sample surveys causes a problem when analyzing datasets in public-use files, where the user has only complete-data methods available and has limited information about the reasons for nonresponse. Imputation has recently become a standard approach for handling nonresponse, and various imputation methods have been devised. However, most imputation methods are concerned with continuous traits, while many features of interest in sample surveys are measured on binary or ordered categorical scales. In this note, an imputation method for ignorable nonresponse in binary or ordered categorical traits is considered.

Ordering Variables and Categories on the Mosaic Plot (모자이크 플롯에서 변수와 범주의 순서화)

  • Lee, Moon-Joo;Huh, Myung-Hoe
    • The Korean Journal of Applied Statistics / v.21 no.5 / pp.875-888 / 2008
  • Mosaic plots, proposed by Hartigan and Kleiner (1981, 1984), are very useful for visualizing categorical data. In a mosaic plot, multi-way classified cell frequencies are represented by rectangles with proportional areas. The plot is easy to understand while preserving the information contained in the data. The plot's appearance, however, changes substantially depending on the order of the variables and the order of the categories within each variable put into the plot. In this study, we propose algorithms for ordering the variables and categories of categorical data to be explored via mosaic plots. We demonstrate our methods on three well-known datasets: Titanic, Housing, and PreSex.
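
The proportional-area idea behind a two-way mosaic plot can be sketched as below: the unit square is first divided into strips whose widths follow the marginal proportions of the first variable, and each strip is then divided by the conditional proportions of the second. The ordering algorithms proposed in the paper are not shown; the function name and the example table are assumptions.

```python
def mosaic_rectangles(table):
    """Return {(i, j): (width, height)} for a two-way table of counts.

    table[i][j] is the count for category i of the first variable
    and category j of the second variable.
    """
    total = sum(sum(row) for row in table)
    rects = {}
    for i, row in enumerate(table):
        row_total = sum(row)
        width = row_total / total  # marginal proportion of category i
        for j, count in enumerate(row):
            # Conditional proportion of category j within category i.
            height = count / row_total if row_total else 0.0
            rects[(i, j)] = (width, height)
    return rects


# Hypothetical 2x2 table of counts.
counts = [[10, 30], [20, 40]]
rects = mosaic_rectangles(counts)
```

Because the split is sequential, swapping the roles of the two variables, or reordering categories, yields rectangles with the same areas but a different layout, which is why ordering matters for readability.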

A Study on the Scoring Method of the Ordinal Variable

  • Chung, Sung-S.;Chun, Young-M.;Oh, Seon-J.
    • Journal of the Korean Data and Information Science Society / v.15 no.1 / pp.95-105 / 2004
  • The main characteristic of the ordinal scale is that its categories have a logically or continuously ordered relationship to each other. A continuous type permits measuring degrees of differences among categories. Also, the specific amount of differences is important. In this paper we consider the scoring method using a dummy variable based on distance among categories.

The Study on the Accident Injury Severity Using Ordered Probit Model (순서형 프로빗 모형을 이용한 사고심각도 분석)

  • Ha, Oh-Keun;Oh, Ju-Taek;Won, Jai-Mu;Sung, Nak-Moon
    • Journal of Korean Society of Transportation / v.23 no.4 s.82 / pp.47-55 / 2005
  • In recent years, the rapid growth in the number of vehicles has increased traffic crashes. Because crashes cause economic losses and endanger quality of life, considerable effort is needed to reduce them. To that end, this research seeks to improve the safety of intersections by analysing the causes of injury severity with an Ordered Probability Model. This research applied the Ordered Probit Model, which assumes that the random error ${\epsilon}_i$ is normally distributed, for model calibration, and used $\rho^2$ (likelihood ratio) and $\chi^2$ (chi-square) for model selection. The results show that minor road traffic, heavy vehicle rates, major and minor right-turn rates, presence of lighting, speed limits, and instructive lines for left-turn traffic are significant factors affecting crash severity at signalized intersections.

Control Charts for Ordinal Variables (순서형 변수를 위한 관리도)

  • Jang, Dae-Heung
    • Proceedings of the Korean Society for Quality Management Conference / 2006.04a / pp.330-333 / 2006
  • Many practical problems of quality control in service management are derived from the use of ordinal variables. Ordered linguistic variables differ from measurement variables. This paper presents a new control chart of a production process based on ordinal variables.

Factors Influencing Crash Severity by the Types of Bus Transportation Services Using Ordered Probit Models (순서형 프로빗 모형을 이용한 버스 운송사업 유형 별 사고심각도 영향요인 분석)

  • YOON, Sangwon;KHO, Seung-Young;KIM, Dong-Kyu
    • Journal of Korean Society of Transportation / v.36 no.1 / pp.13-22 / 2018
  • Buses, one of the representative public transportation modes, are divided into a variety of service types according to the purpose of operation, operating distance, and management agency. Although bus-involved crashes may cause a large amount of damage because of the higher number of passengers on board, prior research has paid little attention to crash severity by bus service type. This study aims to investigate factors influencing crash severity in bus-involved crashes and to present policy implications for reducing crash severity by bus service type. To do this, bus-involved crash data from the Traffic Accident Analysis System (TAAS) over a five-year period are used. Ordered probit models for three types of bus service, i.e., city buses, suburban and express buses, and charter buses, are estimated to analyze the factors of accident severity. The results show that the factors affecting crash severity differ significantly among the types of bus service, while speed and road surface influence all types of buses. For local buses, time of day, roadway alignment, and installation of a traffic signal are found to be statistically significant factors. Seat belt use and road class have significant effects on injury severity for intercity and express buses. For chartered buses, time of day, driving experience, seat belt use, traffic signal, and day of week are the significant factors. The results of this study are expected to contribute to reducing crash severity for each bus service type.