• Title/Summary/Keyword: 분류변수 (classification variable)


Comparison of Variable Importance Measures in Tree-based Classification (나무구조의 분류분석에서 변수 중요도에 대한 고찰)

  • Kim, Na-Young;Lee, Eun-Kyung
    • The Korean Journal of Applied Statistics / v.27 no.5 / pp.717-729 / 2014
  • The projection pursuit classification tree uses, at each node, the one-dimensional projection that best separates the classes. The projection coefficients contain information that distinguishes two groups of classes from each other and can be used to compute an importance measure for each variable. This paper reviews variable importance measures, which are attracting growing interest as data sets become larger. We compared the performance of the projection pursuit classification tree with that of the classification and regression tree (CART) and the random forest. The projection pursuit classification tree is found to perform better in most cases, particularly with highly correlated variables. Its importance measure performs slightly better than the importance measure of the random forest.

A comparative study of feature screening methods for ultrahigh dimensional multiclass classification (초고차원 다범주분류를 위한 변수선별 방법 비교 연구)

  • Lee, Kyungeun;Kim, Kyoung Hee;Shin, Seung Jun
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.793-808 / 2017
  • We compare various variable screening methods for multiclass classification problems with ultrahigh-dimensional data. Two approaches were considered: (1) pairwise extension of binary classification via one-versus-one or one-versus-rest comparisons and (2) direct classification of multiclass responses. We conducted extensive simulation studies under different conditions: heavy-tailed explanatory variables, correlated signal and noise variables, correlated joint distributions but uncorrelated marginals, and unbalanced response variables. We then analyzed real data to examine the performance of the methods. The results showed that model-free methods perform better for multiclass classification problems as well as binary ones.
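
The idea of marginal screening with a direct multiclass criterion can be sketched as follows. This is a minimal illustration, not one of the methods compared in the paper: it scores each variable by the ratio of between-class to within-class sums of squares and keeps the top-ranked ones. The function names `ss_ratio` and `screen`, and the toy data, are assumptions made for this example.

```python
from statistics import mean

def ss_ratio(values, labels):
    """Between-class / within-class sum of squares for one variable."""
    overall = mean(values)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups.values())
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return between / within if within > 0 else float("inf")

def screen(columns, labels, keep):
    """Rank variables by their marginal score and keep the best `keep`."""
    scores = {name: ss_ratio(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:keep]

# Hypothetical toy data: three classes, three candidate variables.
labels = [0, 0, 1, 1, 2, 2]
columns = {
    "signal": [0.1, 0.2, 1.0, 1.1, 2.0, 2.1],  # class means clearly differ
    "noise": [0.5, 1.5, 0.4, 1.6, 0.6, 1.4],   # same mean in every class
    "weak": [0.0, 1.0, 0.5, 1.5, 1.0, 2.0],    # small shift between classes
}
selected = screen(columns, labels, keep=2)
print(selected)
```

Each variable is scored on its own, so the cost grows linearly with the number of variables, which is what makes screening usable in the ultrahigh-dimensional setting.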

Analysis and Classification of Acoustic Emission Signals During Wood Drying Using the Principal Component Analysis (주성분 분석을 이용한 목재 건조 중 발생하는 음향방출 신호의 해석 및 분류)

  • Kang, Ho-Yang;Kim, Ki-Bok
    • Journal of the Korean Society for Nondestructive Testing / v.23 no.3 / pp.254-262 / 2003
  • In this study, acoustic emission (AE) signals due to surface cracking and moisture movement in flat-sawn boards of oak (Quercus variabilis) during drying under ambient conditions were analyzed and classified using principal component analysis. The AE signals corresponding to surface cracking showed higher peak amplitude and peak frequency, and shorter rise time, than those corresponding to moisture movement. To reduce the multicollinearity among AE features and to extract the significant AE parameters, correlation analysis was performed. Over 99% of the variance of the AE parameters could be accounted for by the first four principal components. The classification feasibility and success rate were investigated for two statistical classifiers, one using six independent variables (AE parameters) and one using six principal components. The classifier using AE parameters achieved a success rate of 70.0%. The classifier using principal components achieved a success rate of 87.5%, considerably higher than that of the classifier using AE parameters.

Variable Ordering Algorithms Using Problem Classifying (문제분류규칙을 이용한 변수 순서화 알고리즘)

  • Sohn, Surg-Won
    • Journal of the Korea Society of Computer and Information / v.16 no.4 / pp.127-135 / 2011
  • Efficient ordering of decision variables is one way to find solutions quickly in depth-first search with backtracking. Developing variable ordering algorithms that take into account the dynamic and static properties of a problem is therefore very important. However, it is difficult to exploit the optimal variable ordering algorithm appropriate to each problem. In this paper, we propose a problem classifying rule that assigns a problem type based on the properties of the variables, and we use this rule to predict the optimal type of variable ordering algorithm. We choose the frequency allocation problem as a DS-type problem, whose decision variables have both dynamic and static properties, and estimate the optimal variable ordering algorithm. We also show the usefulness of the problem classifying rule by applying it to the base station problem as a special case whose problem type is not generated from the presented rule.
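
To make the role of variable ordering concrete, here is a minimal sketch of depth-first backtracking on a toy frequency assignment problem, with the ordering rule passed in as a function. It is not the algorithm or the classifying rule from the paper. The two rules shown, a fixed order and a "fewest remaining values, then highest degree" order, are common textbook heuristics chosen as assumptions; the transmitter names and frequencies are hypothetical.

```python
def consistent(var, value, assignment, neighbors):
    """A value is legal if no already-assigned neighbor uses it."""
    return all(assignment.get(n) != value for n in neighbors[var])

def solve(domains, neighbors, choose):
    """Depth-first backtracking; `choose` picks the next variable to assign."""
    stats = {"nodes": 0}

    def backtrack(assignment):
        if len(assignment) == len(domains):
            return dict(assignment)
        var = choose(assignment, domains, neighbors)
        for value in domains[var]:
            stats["nodes"] += 1  # count every value tried
            if consistent(var, value, assignment, neighbors):
                assignment[var] = value
                result = backtrack(assignment)
                if result is not None:
                    return result
                del assignment[var]  # undo and try the next value
        return None

    solution = backtrack({})
    return solution, stats["nodes"]

def static_order(assignment, domains, neighbors):
    """Static rule: always follow the order in which variables were declared."""
    return next(v for v in domains if v not in assignment)

def dynamic_order(assignment, domains, neighbors):
    """Dynamic rule: fewest remaining legal values; ties go to higher degree."""
    def remaining(v):
        return sum(consistent(v, x, assignment, neighbors) for x in domains[v])
    unassigned = [v for v in domains if v not in assignment]
    return min(unassigned, key=lambda v: (remaining(v), -len(neighbors[v])))

# Hypothetical instance: interfering transmitters must get different frequencies.
neighbors = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "D"], "D": ["B", "C"]}
domains = {v: [1, 2, 3] for v in neighbors}

static_solution, static_nodes = solve(domains, neighbors, static_order)
dynamic_solution, dynamic_nodes = solve(domains, neighbors, dynamic_order)
print(static_solution, static_nodes)
print(dynamic_solution, dynamic_nodes)
```

The number of remaining legal values changes as the search proceeds (a dynamic property), while the degree of a variable in the constraint graph is fixed (a static property). Keeping the ordering rule as a separate function makes it easy to compare rules on the same instance by their node counts.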

Efficient variable selection method using conditional mutual information (조건부 상호정보를 이용한 분류분석에서의 변수선택)

  • Ahn, Chi Kyung;Kim, Donguk
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.1079-1094 / 2014
  • In this paper, we study efficient gene selection methods based on conditional mutual information. We propose gene selection methods that estimate conditional mutual information semiparametrically, using the multivariate normal distribution and the Edgeworth approximation. We compare the proposed methods with other methods such as the mutual information filter, SVM-RFE, and the gene selection method of Cai et al. (2009) (MIGS-original) in SVM classification. The experiments show that the proposed methods perform better than the mutual information filter. Furthermore, they take far less computing time than the method of Cai et al. (2009) while achieving similar performance.
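
The quantity behind this selection criterion is the conditional mutual information I(X; Y | Z): how much a candidate variable X says about the class Y beyond what an already selected variable Z says. The sketch below uses a simple plug-in estimate for discrete data, which is an assumption made for illustration. The paper itself estimates this quantity semiparametrically, and that estimator is not reproduced here.

```python
from collections import Counter
from math import log

def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in nats for discrete sequences."""
    n = len(x)
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    total = 0.0
    for (xi, yi, zi), c in c_xyz.items():
        # p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ], with counts cancelling n
        total += (c / n) * log(c * c_z[zi] / (c_xz[(xi, zi)] * c_yz[(yi, zi)]))
    return total

# Hypothetical toy data: class label, one selected gene, two candidate genes.
label = [0, 0, 1, 1, 0, 0, 1, 1]
selected = [0, 1, 0, 1, 0, 1, 0, 1]
redundant = list(selected)    # duplicates the selected gene
informative = list(label)     # tracks the class label exactly

cmi_redundant = conditional_mutual_information(redundant, label, selected)
cmi_informative = conditional_mutual_information(informative, label, selected)
print(cmi_redundant, cmi_informative)
```

A candidate that merely duplicates an already selected variable scores zero, while a candidate carrying new information about the class scores higher, which is why the conditional version penalizes redundancy where a plain mutual information filter does not.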

Intelligence Package Development for UT Signal Pattern Recognition and Application to Classification of Defects in Austenitic Stainless Steel Weld (UT 신호형상 인식을 위한 Intelligence Package 개발과 Austenitic Stainless Steel Welding부 결함 분류에 관한 적용 연구)

  • Lee, Kang-Yong;Kim, Joon-Seob
    • Journal of the Korean Society for Nondestructive Testing / v.15 no.4 / pp.531-539 / 1996
  • This research classifies artificial defects in welds using pattern recognition of ultrasonic signals. A signal pattern recognition package, including user-defined functions, was developed to perform digital signal processing, feature extraction, feature selection, and classifier selection. A neural network classifier and statistical classifiers such as the linear discriminant function classifier and the empirical Bayesian classifier are compared and discussed. The pattern recognition technique is applied to the classification of artificial defects such as notches and a hole. When appropriately trained, the neural network classifier is concluded to be better than the statistical classifiers at classifying the artificial defects.


Case Studies Regarding the Classification of Public Caves (공개동굴의 유형분류에 관한 사례연구)

  • Hong, Hyun-Chul
    • Journal of the Speleological Society of Korea / no.93 / pp.13-25 / 2009
  • This study presents case studies that provide information on cave tourism resources. It considers a range of selected variables describing the interior and exterior of caves, extending the factors used in the academic classification of caves. Cluster analysis, one of the multivariate analysis techniques, was applied and the results were reviewed. The results show that public caves can be classified by multiple criteria depending on the human environment of the surrounding area. The study also derives a classification of public caves by region.

Combining Feature Variables for Improving the Accuracy of Naïve Bayes Classifiers (나이브베이즈분류기의 정확도 향상을 위한 자질변수통합)

  • Heo Min-Oh;Kim Byoung-Hee;Hwang Kyu-Baek;Zhang Byoung-Tak
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.727-729 / 2005
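  • The naïve Bayes classifier is a highly efficient model in terms of training, application, and use of computational resources. Various experiments have also shown that its classification performance is not much worse than that of other methods. It is most effective when it can exactly represent the true probability distribution that generated the data. However, performance can degrade when the conditional independence present in the true distribution does not match the structure of the naïve Bayes classifier. More specifically, the degradation becomes more severe when probabilistic dependencies exist among the feature variables. This paper presents a technique for combining feature variables that efficiently addresses this weakness of the naïve Bayes classifier. Combining feature variables is a way of explicitly representing the relationships among variables, and we show experimentally that selecting the variables to combine on the basis of mutual information contributes greatly to improved performance.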
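
One way to picture feature combining is to measure the mutual information between every pair of feature variables and merge the most dependent pair into a single joint variable, so that a naïve Bayes classifier no longer treats the two as independent. The sketch below illustrates that reading only. The abstract does not specify the exact selection rule, so the pairwise criterion and the names `mutual_information` and `merge_most_dependent` are assumptions.

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(a, b):
    """Plug-in estimate of I(A; B) in nats for two discrete sequences."""
    n = len(a)
    c_ab, c_a, c_b = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum((c / n) * log(c * n / (c_a[x] * c_b[y]))
               for (x, y), c in c_ab.items())

def merge_most_dependent(features):
    """Replace the feature pair with the highest mutual information by one joint feature."""
    best = max(combinations(features, 2),
               key=lambda pair: mutual_information(features[pair[0]], features[pair[1]]))
    merged = {name: col for name, col in features.items() if name not in best}
    # Each value of the joint feature is the tuple of the two original values.
    merged["+".join(best)] = list(zip(features[best[0]], features[best[1]]))
    return merged

# Hypothetical toy data: f1 and f2 are perfectly dependent, f3 is nearly unrelated.
features = {
    "f1": [0, 0, 1, 1, 0, 1],
    "f2": [0, 0, 1, 1, 0, 1],
    "f3": [0, 1, 0, 1, 1, 0],
}
merged = merge_most_dependent(features)
print(sorted(merged))
```

The merged variable can then be fed to a naïve Bayes classifier like any other categorical feature, at the cost of a larger value set for that feature.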


Geoacoustic Modeling for Analysis of Attenuation Characteristics using Chirp Acoustic Profiling data (광역주파수 음향반사자료의 감쇠특성 분석을 위한 지질음향모델링 기법 연구)

  • Chang Jae-Kyeong;Yang Sung-Jin
    • Geophysics and Geophysical Exploration / v.2 no.4 / pp.202-208 / 1999
  • We introduce a new acoustic parameter for the classification of seafloor sediments from chirp sonar acoustic profiling data. The acoustic parameter is defined as a derivative of the unwrapped phase of the Fourier transform of the acoustic profiling data. Consequently, it represents the characteristics of attenuation by dissipative dispersion in sediments. We also estimated acoustic properties by geoacoustic modeling using chirp data obtained from different sedimentary facies. Our classification results, when compared with the results of analysis of sampled sediments, show that the acoustic parameter discriminates sedimentary facies and bottom hardness. The method in this paper is therefore expected to be an effective means of geoacoustic modeling of the seafloor.


Data Mining for Road Traffic Accident Type Classification (데이터 마이닝을 이용한 교통사고 심각도 분류분석)

  • 손소영;신형원
    • Journal of Korean Society of Transportation / v.16 no.4 / pp.187-194 / 1998
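  • This study identifies important variables related to traffic accident severity and, based on these variables, estimates classification models for predicting accident severity using a neural network, a decision tree, and logistic regression. To select the variables that influence accident severity from the explanatory variables in the traffic accident statistics records, most of which are categorical, we used the $\chi^2$ test of independence and a decision tree; the selected variables were then used as the basis for the neural network and logistic regression. The analysis showed no significant difference in classification accuracy among the three techniques. However, the decision tree appears well suited to analyzing accident severity, a categorical dependent variable, in terms of its ability to select explanatory variables, its analysis run time, and the ease of identifying the factors that determine accident severity. The study also reconfirmed that protective devices have the greatest influence on accident severity.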
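
The variable selection step described above rests on the chi-square test of independence between a categorical explanatory variable and the severity class. A minimal sketch of the test statistic follows. The contingency table is hypothetical and is not data from the study, and the function name `chi_square` is an assumption for this example.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows = protective device used / not used,
# columns = minor / severe accident.
table = [[30, 10],
         [20, 40]]
stat, df = chi_square(table)
print(stat, df)
```

With one degree of freedom, a statistic above the 5% critical value of about 3.84 leads to rejecting independence, so a variable like the one in this toy table would be kept as a candidate predictor.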
