• Title/Abstract/Keywords: Statistics Classification

Search results: 867 items (processing time: 0.021 seconds)

Logistic Regression Classification by Principal Component Selection

  • Kim, Kiho;Lee, Seokho
    • Communications for Statistical Applications and Methods / Vol. 21, No. 1 / pp.61-68 / 2014
  • We propose binary classification methods that modify logistic regression classification. Instead of using the original variables, we apply variable selection procedures to select principal components. We describe the resulting classifiers and discuss their properties. The performance of our proposals is illustrated numerically and compared with other existing classification methods using synthetic and real datasets.
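
A minimal sketch of the general idea follows, not the authors' procedure. It assumes NumPy is available, ranks principal components by their absolute correlation with the label (a stand-in selection rule chosen for illustration), and fits logistic regression by plain gradient descent. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def pc_scores(X):
    """Center X and return its principal component scores and loadings via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T, Vt

def select_components(Z, y, k):
    """Keep the k components most correlated with the label (an assumed rule)."""
    corr = [abs(np.corrcoef(Z[:, j], y)[0, 1]) for j in range(Z.shape[1])]
    return np.argsort(corr)[::-1][:k]

def fit_logistic(Z, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by gradient descent on the mean log-loss."""
    A = np.column_stack([np.ones(len(Z)), Z])  # prepend an intercept column
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ w))
        w -= lr * A.T @ (p - y) / len(y)
    return w

def predict(Z, w):
    """Return 0/1 predictions from fitted weights."""
    A = np.column_stack([np.ones(len(Z)), Z])
    return (1.0 / (1.0 + np.exp(-A @ w)) > 0.5).astype(int)

# Synthetic data: the highest-variance direction is pure noise,
# while a lower-variance direction carries the class signal.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
X = np.column_stack([
    rng.normal(0, 5, n),            # noisy, high variance
    rng.normal(0, 1, n) + 2.0 * y,  # informative, lower variance
    rng.normal(0, 0.5, n),          # noisy, low variance
])

Z, Vt = pc_scores(X)
keep = select_components(Z, y, k=1)
w = fit_logistic(Z[:, keep], y)
accuracy = (predict(Z[:, keep], w) == y).mean()
print("selected component index:", keep, "training accuracy:", accuracy)
```

The synthetic data are built so that the leading component by variance is not the most useful one for classification, which is the situation where selecting components, rather than simply taking the first few, can matter.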

Discriminant Analysis of Binary Data by Using the Maximum Entropy Distribution

  • Lee, Jung Jin;Hwang, Joon
    • Communications for Statistical Applications and Methods / Vol. 10, No. 3 / pp.909-917 / 2003
  • Although many classification models have been used to classify binary data, no single model dominates across all circumstances, which vary with the number of variables and the size of the data (Asparoukhov and Krzanowski (2001)). This paper proposes a classification model that uses information on the marginal distributions of sub-variables and the corresponding maximum entropy distribution. Classification experiments using simulation are discussed.
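
As a minimal sketch, assume that only the univariate marginals P(x_j = 1 | class) are used. In that special case the maximum entropy distribution matching the marginals is the product of independent Bernoulli terms, so the classifier reduces to the familiar independence (naive Bayes) rule. The paper's sub-variable marginals may be of higher order; the function names, the smoothing constant, and the toy data below are hypothetical.

```python
import math

def fit_marginals(rows, smoothing=1.0):
    """Estimate P(x_j = 1) for each binary variable, with additive smoothing."""
    n = len(rows)
    p = len(rows[0])
    return [(sum(r[j] for r in rows) + smoothing) / (n + 2 * smoothing)
            for j in range(p)]

def log_maxent_prob(x, marginals):
    """Log-probability of x under the product of Bernoulli marginals."""
    return sum(math.log(m if xj == 1 else 1.0 - m)
               for xj, m in zip(x, marginals))

def classify(x, marginals_by_class, priors):
    """Pick the class with the largest log prior plus log-likelihood."""
    scores = {c: math.log(priors[c]) + log_maxent_prob(x, marginals_by_class[c])
              for c in priors}
    return max(scores, key=scores.get)

# Toy binary training data for two classes (hypothetical).
class0 = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
class1 = [[0, 0, 1], [0, 1, 1], [0, 0, 1], [1, 0, 1]]

marginals_by_class = {0: fit_marginals(class0), 1: fit_marginals(class1)}
priors = {0: 0.5, 1: 0.5}

print(classify([1, 1, 0], marginals_by_class, priors))
print(classify([0, 0, 1], marginals_by_class, priors))
```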

A New Approach to Statistical Analysis of Electrical Fire and Classification of Electrical Fire Causes

  • Kim, Doo-Hyun;Lee, Jong-Ho;Kim, Sung-Chul
    • International Journal of Safety / Vol. 6, No. 2 / pp.17-21 / 2007
  • This paper presents a statistical analysis of electrical fires and a classification of electrical fire causes, with the aim of collecting electrical fire data efficiently. Electrical fire statistics are produced to monitor the number and characteristics of fires attended by fire fighters, including the causes and effects of fire, so that action can be taken to reduce the human and financial cost of fire. Electrical fires make up a major share of fires in Korea (nearly 30% of total fires according to recent figures). Incorrect and biased knowledge about electrical fires has changed the classification of certain types of fires from non-electrical to electrical. A standardized form is therefore needed so that, when assessing the cause of an electrical fire, fire fighters can either tick the appropriate box on the fire report form or assess a text description. It is highly recommended that an electrical fire cause classification and an electrical fire assessment be developed for the fire statistics in order to categorize and assess electrical fires exactly. This paper suggests a newly developed electrical fire cause classification structure, a well-defined hierarchy in which cause categories neither relate to nor overlap one another. Fire statistics systems of foreign countries are also introduced and compared.

Performance Comparison of Classification Methods with the Combinations of the Imputation and Gene Selection Methods

  • Kim, Dong-Uk;Nam, Jin-Hyun;Hong, Kyung-Ha
    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 24, No. 6 / pp.1103-1113 / 2011
  • Gene expression data are obtained through many experimental stages, and errors produced during the process may cause missing values. Because such data are of the so-called 'small n, large p' type, genes have to be selected before statistical analyses such as classification. For this reason, imputation and gene selection are important in microarray data analysis. In the literature, imputation, gene selection, and classification analysis have been studied separately, although in practice they form a sequential process. From this perspective, we compare the performance of classification methods after imputation and gene selection methods are applied to microarray data. Numerical simulations are carried out to evaluate the classification methods under various combinations of the imputation and gene selection methods.
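
The sequential process described above (impute, then select genes, then classify) can be sketched as follows. The abstract does not name the methods compared, so the three steps here are simple stand-ins chosen for illustration: per-gene mean imputation, a two-sample t-like score for gene selection, and a nearest-centroid classifier. All function names and the toy expression matrix are hypothetical.

```python
import math

def impute_gene_means(X):
    """Replace None entries with the mean of the observed values of that gene."""
    p = len(X[0])
    means = []
    for j in range(p):
        observed = [row[j] for row in X if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if row[j] is None else row[j] for j in range(p)]
            for row in X]

def t_like_score(col, y):
    """Absolute two-sample t-like statistic for one gene (labels 0 and 1)."""
    a = [v for v, label in zip(col, y) if label == 0]
    b = [v for v, label in zip(col, y) if label == 1]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    # Small constant guards against division by zero for constant genes.
    return abs(ma - mb) / math.sqrt(va / len(a) + vb / len(b) + 1e-12)

def select_genes(X, y, k):
    """Return the indices of the k genes with the largest scores."""
    p = len(X[0])
    scores = [t_like_score([row[j] for row in X], y) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]

def nearest_centroid(X, y, genes):
    """Build a predictor that assigns a sample to the closest class centroid."""
    centroids = {}
    for label in set(y):
        rows = [r for r, l in zip(X, y) if l == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in genes]
    def predict(x):
        return min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(genes, centroids[c])))
    return predict

# Toy expression matrix: 6 samples x 4 genes, with two missing values.
X_raw = [
    [1.0, 5.0, 0.2, 3.0],
    [1.2, None, 0.1, 3.1],
    [0.9, 5.2, 0.3, 2.9],
    [3.0, 5.1, 0.2, 3.0],
    [3.1, 4.9, None, 3.2],
    [2.9, 5.0, 0.1, 2.8],
]
y = [0, 0, 0, 1, 1, 1]

X_imputed = impute_gene_means(X_raw)          # step 1: imputation
selected = select_genes(X_imputed, y, k=1)    # step 2: gene selection
predict = nearest_centroid(X_imputed, y, selected)  # step 3: classification

print("selected genes:", selected)
print("prediction:", predict([2.7, 5.0, 0.2, 3.0]))
```

Swapping any one of the three functions for another method while keeping the others fixed gives the kind of combination the comparison is about.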

The use of support vector machines in semi-supervised classification

  • Bae, Hyunjoo;Kim, Hyungwoo;Shin, Seung Jun
    • Communications for Statistical Applications and Methods / Vol. 29, No. 2 / pp.193-202 / 2022
  • Semi-supervised learning has gained significant attention in recent applications. In this article, we provide a selective overview of popular semi-supervised methods and then propose a simple but effective algorithm for semi-supervised classification using support vector machines (SVM), one of the most popular binary classifiers in the machine learning community. The idea is as follows. First, we apply dimension reduction to the unlabeled observations and cluster them to assign labels in the reduced space. SVM is then applied to the combined set of labeled and unlabeled observations to construct a classification rule. The use of SVM enables us to extend the method to its nonlinear counterpart via the kernel trick. Our numerical experiments under various scenarios demonstrate that the proposed method is promising for semi-supervised classification.
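
The steps above can be sketched with scikit-learn, under several assumptions that the abstract does not settle: PCA is used for the dimension reduction, k-means for the clustering, each cluster is mapped to the class whose labeled centroid is nearest in the reduced space, and the SVM is trained on the original features. This is an illustrative sketch, not the authors' algorithm, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Synthetic two-class data; only 5 observations per class keep their labels.
X, y = make_blobs(n_samples=300, centers=2, n_features=10,
                  cluster_std=2.0, random_state=0)
labeled_idx = np.concatenate([np.where(y == 0)[0][:5],
                              np.where(y == 1)[0][:5]])
is_labeled = np.zeros(len(y), dtype=bool)
is_labeled[labeled_idx] = True

X_lab, y_lab = X[is_labeled], y[is_labeled]
X_unl = X[~is_labeled]
y_unl_true = y[~is_labeled]  # held back, used only to evaluate the sketch

# Step 1: dimension reduction fitted on the unlabeled observations.
pca = PCA(n_components=2).fit(X_unl)
Z_unl = pca.transform(X_unl)
Z_lab = pca.transform(X_lab)

# Step 2: cluster the unlabeled observations in the reduced space.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(Z_unl)

# Step 3 (assumed rule): map each cluster to the nearest labeled class centroid.
class_centroids = np.array([Z_lab[y_lab == c].mean(axis=0) for c in (0, 1)])
dist = np.linalg.norm(
    km.cluster_centers_[:, None, :] - class_centroids[None, :, :], axis=2)
cluster_to_class = dist.argmin(axis=1)
pseudo_labels = cluster_to_class[km.labels_]

# Step 4: train an SVM on labeled plus pseudo-labeled observations.
svm = SVC(kernel="rbf").fit(np.vstack([X_lab, X_unl]),
                            np.concatenate([y_lab, pseudo_labels]))
accuracy = svm.score(X_unl, y_unl_true)
print("accuracy on the unlabeled observations:", accuracy)
```

Choosing `kernel="rbf"` rather than `kernel="linear"` corresponds to the nonlinear extension via the kernel trick mentioned in the abstract.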

A Study of a Cultural Classification and a Culture Contents Industrial Classification

  • 안인자
    • Journal of the Korean Biblia Society for Library and Information Science (한국비블리아학회지) / Vol. 17, No. 2 / pp.5-22 / 2006
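  • Cultural classification and culture contents industry classification are essential basic tools for related policy, support, statistics, and evaluation, and this process can be seen to be cyclical. An analysis of how these classifications are used in laws, cultural indicators, statistics, evaluation items, and related research reports showed that they vary widely depending on short-term purposes. This paper proposes a classification scheme based on the colon classification method, using communication network, media, genre, and cultural domain as its classification elements.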

Alternative Optimal Threshold Criteria: MFR

  • 홍종선;김효민;김동규
    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 27, No. 5 / pp.773-786 / 2014
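  • This study proposes the multiplication of false rates (MFR), a classification accuracy criterion expressed as an area formed on the ROC curve. The classification performance of the optimal threshold obtained from the MFR criterion is compared with that of thresholds obtained from other criteria. Optimal thresholds are found for various distribution functions, and the corresponding FNR and FPR are compared to derive the characteristics and advantages of MFR. Based on a general cost function, cost ratios for the threshold are obtained using various classification criteria. The relationship between the cost ratios and the cost curve is summarized to explore the advantages of the MFR criterion. The definition of the MFR criterion is extended to multi-dimensional ROC analysis, and its relationship with other multi-dimensional classification criteria is explained and discussed.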
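
A small sketch of the quantities involved follows. It assumes, from the name alone, that the MFR at a threshold is the product FNR × FPR; the abstract does not state how the optimal threshold is chosen from this product, so the sketch only tabulates the values. For comparison it also prints the Youden index, J = 1 - FNR - FPR, a widely used criterion that is maximized over thresholds. The scores, labels, and function name are hypothetical.

```python
def false_rates(scores, labels, threshold):
    """Return (FNR, FPR) when scores >= threshold are predicted positive."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    fnr = sum(s < threshold for s in pos) / len(pos)   # positives missed
    fpr = sum(s >= threshold for s in neg) / len(neg)  # negatives flagged
    return fnr, fpr

# Hypothetical classifier scores and true labels.
scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

for t in [0.3, 0.5, 0.7]:
    fnr, fpr = false_rates(scores, labels, t)
    product = fnr * fpr        # assumed reading of "multiplication of false rates"
    youden = 1.0 - fnr - fpr   # Youden index, shown for comparison
    print(f"threshold={t}: FNR={fnr}, FPR={fpr}, FNR*FPR={product}, J={youden}")
```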

Functional Data Classification of Variable Stars

  • Park, Minjeong;Kim, Donghoh;Cho, Sinsup;Oh, Hee-Seok
    • Communications for Statistical Applications and Methods / Vol. 20, No. 4 / pp.271-281 / 2013
  • This paper considers the problem of classifying variable stars based on functional data analysis. For a better understanding of galaxy structure and stellar evolution, various approaches to the classification of variable stars have been studied. Several features that explain the characteristics of variable stars (such as color index, amplitude, period, and Fourier coefficients) have usually been used to classify them. Excluding other factors and focusing only on the curve shapes of variable stars, Deb and Singh (2009) proposed a classification procedure using multivariate principal component analysis. However, this approach is limited in its ability to accommodate features of light curve data that are unequally spaced in the phase domain and have functional properties. In this paper, we propose a light curve estimation method that is suitable for functional data analysis, and provide a classification procedure for variable stars that combines the features of a light curve with existing functional data analysis methods. To evaluate its practical applicability, we apply the proposed classification procedure to data sets of variable stars from the project STellar Astrophysics and Research on Exoplanets (STARE).
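
To illustrate why light curve data end up unequally spaced in the phase domain, the sketch below folds irregular observation times by a period, which is the standard way of turning times into phases in [0, 1). It assumes the period is already known and is not the estimation method proposed in the paper; the function name and the observations are hypothetical.

```python
def fold_light_curve(times, magnitudes, period, epoch=0.0):
    """Map observation times to phases in [0, 1) and sort by phase."""
    phased = [(((t - epoch) / period) % 1.0, m)
              for t, m in zip(times, magnitudes)]
    phased.sort(key=lambda pm: pm[0])
    return phased

# Hypothetical observations taken at irregular times (same unit as the period).
times = [0.0, 7.75, 2.5, 5.25]
magnitudes = [10.0, 10.4, 10.2, 10.9]

folded = fold_light_curve(times, magnitudes, period=2.0)
for phase, mag in folded:
    print(phase, mag)
```

The folded phases are not on a regular grid, so a curve has to be estimated from them before methods that expect a smooth function on a common domain can be applied.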

A Classification Method Using Data Reduction

  • Uhm, Daiho;Jun, Sung-Hae;Lee, Seung-Joo
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 12, No. 1 / pp.1-5 / 2012
  • Data reduction has been used widely in data mining to make analysis more convenient. Principal component analysis (PCA) and factor analysis (FA) are popular techniques. PCA and FA reduce the number of variables to avoid the curse of dimensionality, in which computing time grows exponentially with the number of variables. Many methods have therefore been published for dimension reduction. Data augmentation is another approach to analyzing data efficiently. The support vector machine (SVM) algorithm is a representative technique for dimension augmentation: it maps the original data to a high-dimensional feature space to obtain the optimal decision plane. Both data reduction and augmentation have been used to solve diverse problems in data analysis. In this paper, we compare the strengths and weaknesses of dimension reduction and augmentation for classification and propose a classification method using data reduction. We carry out comparative experiments to verify the performance of the proposed method.
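
The dimension augmentation idea can be illustrated with a tiny example. The sketch uses an explicit feature map that adds the product x1*x2 as a third coordinate, which makes an XOR-like pattern separable by a plane; an SVM would typically obtain such a mapping implicitly through a kernel rather than explicitly. The feature map, the weights, and the data are hypothetical and hand-picked, not fitted.

```python
def augment(x):
    """Explicit feature map (x1, x2) -> (x1, x2, x1 * x2)."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

# XOR-like pattern: no straight line in the plane separates the two classes.
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [1, -1, -1, 1]

# In the augmented space the plane z3 = 0 separates them (hand-picked weights).
w, b = (0.0, 0.0, 1.0), 0.0

def predict(x):
    """Classify by the sign of the linear function in the augmented space."""
    z = augment(x)
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else -1

predictions = [predict(p) for p in points]
print(predictions)
```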

A Comparison Study of Classification Algorithms in Data Mining

  • Lee, Seung-Joo;Jun, Sung-Rae
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 8, No. 1 / pp.1-5 / 2008
  • The analytical tools of data mining generally fall into two learning types: supervised and unsupervised learning algorithms. Classification and prediction are the main analysis tools for supervised learning. In this paper, we perform a comparison study of popular classification algorithms in data mining: LDA, QDA, the kernel method, K-nearest neighbor, naive Bayes, SVM, and CART. We use almost all of the classification data sets in the UCI Machine Learning Repository for our experiments. Based on our results, proper algorithms can be selected for given classification data sets.
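
A comparison of this kind can be sketched with scikit-learn as below. The sketch is an assumption-laden illustration, not the authors' experimental setup: it uses a single data set (Iris, which originates from the UCI repository and ships with scikit-learn), 5-fold cross-validated accuracy, default hyperparameters, and a decision tree as a CART-style learner. The kernel method from the abstract is omitted because the abstract does not say which one was used.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "K-nearest neighbor": KNeighborsClassifier(n_neighbors=5),
    "naive Bayes": GaussianNB(),
    "SVM": SVC(kernel="rbf"),
    "CART-style tree": DecisionTreeClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each classifier.
results = {name: cross_val_score(clf, X, y, cv=5).mean()
           for name, clf in classifiers.items()}

for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Repeating the loop over several data sets, and recording which classifier ranks best on each, gives the per-data-set guidance the abstract describes.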