• Title/Summary/Keyword: Accuracy Statistics


Optimal Thresholds from Non-Normal Mixture (비정규 혼합분포에서의 최적분류점)

  • Hong, Chong-Sun; Joo, Jae-Seon
    • The Korean Journal of Applied Statistics / v.23 no.5 / pp.943-953 / 2010
  • There are many methods of estimating optimal thresholds from a mixture distribution of the score random variable used for credit evaluation. Most of this research is based on the assumption of normal distributions. In this paper, we extend the analysis to non-normal distributions such as the Weibull, logistic and gamma distributions, and estimate an optimal threshold by using a hypothesis testing method and other methods that maximize the total accuracy and the true rate. The type I and type II errors are obtained and compared with their sums. Finally, we discuss their efficiency and derive conclusions for non-normal distributions.
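
The total-accuracy criterion in this abstract can be illustrated with a short grid search. This is a sketch under stated assumptions, not the authors' procedure: it assumes two Weibull score distributions with made-up parameters (a hypothetical "bad" group with scale 3 and "good" group with scale 6, both with shape 2), classifies scores at or below the threshold c as "bad", and labels the two misclassification rates as type I and type II errors, a convention that may differ from the paper's.

```python
import math


def weibull_cdf(x, shape, scale):
    """CDF of a Weibull(shape, scale) distribution; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def error_rates(c, cdf_bad, cdf_good):
    """Misclassification rates when scores <= c are classified as 'bad'."""
    type1 = 1.0 - cdf_bad(c)  # a 'bad' case scored above c (called 'good')
    type2 = cdf_good(c)       # a 'good' case scored at or below c (called 'bad')
    return type1, type2


def optimal_threshold(cdf_bad, cdf_good, prior_bad, grid):
    """Grid point maximizing total accuracy
    TA(c) = prior_bad * F_bad(c) + (1 - prior_bad) * (1 - F_good(c))."""
    def total_accuracy(c):
        return prior_bad * cdf_bad(c) + (1.0 - prior_bad) * (1.0 - cdf_good(c))
    return max(grid, key=total_accuracy)


# Hypothetical score distributions for the two groups (assumed, not from the paper).
bad = lambda x: weibull_cdf(x, shape=2.0, scale=3.0)
good = lambda x: weibull_cdf(x, shape=2.0, scale=6.0)
grid = [i / 100 for i in range(1, 1501)]

c_star = optimal_threshold(bad, good, prior_bad=0.5, grid=grid)
type1, type2 = error_rates(c_star, bad, good)
print(f"threshold={c_star:.2f}, type I={type1:.3f}, type II={type2:.3f}")
```

With equal priors, TA(c) = 1 - (type I + type II)/2, so maximizing total accuracy is the same as minimizing the sum of the two errors. For these two Weibull densities the optimum is where the densities cross, c* = sqrt(12 ln 4), about 4.08.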

Comparison of Feature Selection Methods in Support Vector Machines (지지벡터기계의 변수 선택방법 비교)

  • Kim, Kwangsu; Park, Changyi
    • The Korean Journal of Applied Statistics / v.26 no.1 / pp.131-139 / 2013
  • Support vector machines (SVM) may perform poorly in the presence of noise variables; in addition, it is difficult to identify the importance of each variable in the resulting classifier. Feature selection can improve both the interpretability and the accuracy of SVM. Most existing studies concern feature selection in the linear SVM through penalty functions yielding sparse solutions. In practice, however, nonlinear kernels are usually adopted for classification accuracy, so feature selection is still desirable for nonlinear SVMs. In this paper, we compare the performance of nonlinear feature selection methods such as the component selection and smoothing operator (COSSO) and kernel iterative feature extraction (KNIFE) on simulated and real data sets.

Implementation of Mahalanobis-Taguchi System for the Election of Major League Baseball Hitters to the Hall of Fame (메이저리그 타자들의 명예의 전당 입성과 탈락에 대한 Mahalanobis-Taguchi System의 적용과 비교)

  • Kim, Su Whan; Park, Changsoon
    • The Korean Journal of Applied Statistics / v.26 no.2 / pp.223-236 / 2013
  • Various statistical classification methods for predicting election to the Major League Baseball Hall of Fame are implemented and their accuracies are compared. Seventeen independent variables are selected from the data on candidates eligible for the Hall of Fame, and well-known classification methods such as discriminant analysis and logistic regression are applied along with the recently proposed Mahalanobis-Taguchi system (MTS). The MTS showed better classification accuracy than the other methods because it is especially efficient when the multivariate data do not form directionally distinct geometric groups according to attributes.
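
The core of the MTS is a scaled Mahalanobis distance (MD) measured from a reference ("normal") group. A minimal NumPy sketch, assuming purely illustrative simulated data and a hypothetical candidate point (which group of candidates serves as the reference in the paper is not stated here), and omitting the orthogonal-array and signal-to-noise steps that MTS uses to screen variables:

```python
import numpy as np


def fit_mahalanobis_space(reference):
    """Means, population standard deviations and inverse correlation
    matrix of the reference ('normal') group."""
    mean = reference.mean(axis=0)
    std = reference.std(axis=0)  # ddof=0, so reference MDs average exactly 1
    corr_inv = np.linalg.inv(np.corrcoef(reference, rowvar=False))
    return mean, std, corr_inv


def scaled_md(x, mean, std, corr_inv):
    """Scaled Mahalanobis distance MD = z R^{-1} z^T / k for each row of x."""
    z = (np.atleast_2d(x) - mean) / std
    k = z.shape[1]
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / k


rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.0]])
reference = rng.multivariate_normal(np.zeros(3), cov, size=200)  # simulated

mean, std, corr_inv = fit_mahalanobis_space(reference)
md_reference = scaled_md(reference, mean, std, corr_inv)
md_candidate = scaled_md([6.0, -6.0, 6.0], mean, std, corr_inv)  # hypothetical
print(md_reference.mean(), md_candidate[0])
```

Observations from the reference group have MD near 1 on average, while an observation with a large MD lies outside the Mahalanobis space; a threshold on MD then acts as the classifier.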

Bias corrected non-response estimation using nonparametric function estimation of super population model (선형 응답률 모형에서 초모집단 모형의 비모수적 함수 추정을 이용한 무응답 편향 보정 추정)

  • Sim, Joo-Yong; Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.34 no.6 / pp.923-936 / 2021
  • Non-response occurs frequently in sample surveys, and various methods have been developed to deal with it appropriately. In particular, the bias caused by non-ignorable non-response greatly reduces the accuracy of estimation and makes non-response handling difficult. Recently, Chung and Shin (2017, 2020) proposed an estimator that improves the accuracy of estimation using a parametric super-population model and a response rate model. In this study, we suggest a bias-corrected non-response mean estimator using a nonparametric function that generalizes the form of the parametric super-population model. We confirm the superiority of the proposed estimator through simulation studies.

Correlated variable importance for random forests (랜덤포레스트를 위한 상관예측변수 중요도)

  • Shin, Seung Beom; Cho, Hyung Jun
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.177-190 / 2021
  • Random forests is a popular method that reduces the instability and improves the accuracy of decision trees through ensembling. The gain in accuracy comes at the cost of ease of interpretation; to compensate for this, variable importance is provided. Variable importance indicates which variables play a more important role in constructing the random forest. However, when a predictor is correlated with other predictors, the importance computed by the existing algorithm may be distorted: the downward bias of correlated predictors may reduce the importance of truly important predictors. We propose a new algorithm that remedies this downward bias. The performance of the proposed algorithm is demonstrated on simulated data and illustrated with real data.

Performance comparison for automatic forecasting functions in R (R에서 자동화 예측 함수에 대한 성능 비교)

  • Oh, Jiu; Seong, Byeongchan
    • The Korean Journal of Applied Statistics / v.35 no.5 / pp.645-655 / 2022
  • In this paper, we investigate automatic functions for time series forecasting in R and compare their performance. For exponential smoothing and ARIMA (autoregressive integrated moving average) models, we focus on the representative time series forecasting functions in R: forecast::ets(), forecast::auto.arima(), smooth::es() and smooth::auto.ssarima(). To compare their forecast performance, we use the M3-Competition data, consisting of 3,003 time series, and adopt three accuracy measures. We confirm that each of the four automatic forecasting functions has strengths and weaknesses in terms of flexibility and convenience for time series modeling, forecasting accuracy, and execution time.

Area-wise relational knowledge distillation

  • Sungchul Cho; Sangje Park; Changwon Lim
    • Communications for Statistical Applications and Methods / v.30 no.5 / pp.501-516 / 2023
  • Knowledge distillation (KD) refers to extracting knowledge from a large and complex model (teacher) and transferring it to a relatively small model (student). This can be done by training the teacher model, obtaining the activation values of its hidden or output layers, and then training the student model on the same training data using those values. Recently, relational KD (RKD) has been proposed to extract knowledge about relative differences in the training data; this method improved the performance of the student model compared with conventional KD. In this paper, we propose a new RKD method by introducing a new loss function. The proposed loss function is defined using the area difference between the teacher model and the student model in a specific hidden layer, and we show that with it the model can be successfully compressed and its generalization performance improved. In a model compression study on audio data, the model trained with the proposed method achieves up to 1.8% higher accuracy than with the existing method. In a model generalization study on image data, introducing the proposed RKD method into self-KD improves accuracy by up to 0.5%.
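
For context, the conventional distance-wise RKD loss of Park et al. (2019), which the proposed area-wise loss replaces, compares normalized pairwise distances between embeddings of the same batch under the teacher and the student. The NumPy sketch below is that baseline only, not the area-based loss described in this abstract; the batch size, embedding widths, random embeddings and Huber threshold are illustrative assumptions.

```python
import numpy as np


def normalized_pairwise_distances(emb):
    """Pairwise Euclidean distances divided by their mean over distinct pairs."""
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    off_diag = ~np.eye(emb.shape[0], dtype=bool)
    return dist / dist[off_diag].mean()


def huber(x, delta=1.0):
    """Elementwise Huber (smooth L1) loss."""
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * x ** 2, delta * (a - 0.5 * delta))


def rkd_distance_loss(teacher_emb, student_emb):
    """Distance-wise relational KD loss; embedding widths may differ."""
    psi_t = normalized_pairwise_distances(teacher_emb)
    psi_s = normalized_pairwise_distances(student_emb)
    off_diag = ~np.eye(teacher_emb.shape[0], dtype=bool)
    return huber(psi_t - psi_s)[off_diag].mean()


rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))   # hypothetical teacher embeddings
student = rng.normal(size=(8, 4))    # hypothetical, narrower student embeddings
print(rkd_distance_loss(teacher, student))
print(rkd_distance_loss(teacher, 3.0 * teacher))  # same relations, loss ~ 0
```

Because distances are normalized by their mean, the loss ignores the overall scale of each embedding space and penalizes only differences in the relational structure, which is what lets a narrower student match a wider teacher.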

An improved fuzzy c-means method based on multivariate skew-normal distribution for brain MR image segmentation

  • Guiyuan Zhu; Shengyang Liao; Tianming Zhan; Yunjie Chen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.8 / pp.2082-2102 / 2024
  • Accurate segmentation of magnetic resonance (MR) images is crucial for providing doctors with effective quantitative information for diagnosis. However, weak boundaries, intensity inhomogeneity, and noise in the images make it difficult for segmentation models to achieve optimal results. While deep learning models can offer relatively accurate results, the scarcity of labeled medical imaging data increases the risk of overfitting. To tackle this issue, this paper proposes a novel fuzzy c-means (FCM) model that integrates a deep learning approach. To address the limited accuracy of traditional FCM models, which use the Euclidean distance as the distance measure, we introduce a measurement function based on the skew-normal distribution, which captures more precise information about the distribution of the image. Additionally, we construct a regularization term based on the Kullback-Leibler (KL) divergence of high-confidence deep learning results, which helps enhance the final segmentation accuracy of the model. Moreover, we incorporate orthogonal basis functions to estimate the bias field and integrate it into the improved FCM method, allowing our method to segment the image and estimate the bias field simultaneously. Experimental results on both simulated and real brain MR images demonstrate the robustness of our method and its superiority over other advanced segmentation algorithms.
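
As a point of reference for the improvements listed above, the standard FCM iteration alternates between updating cluster centers and fuzzy memberships using the Euclidean distance. The sketch below is that classical baseline on 1-D intensities; it does not include the skew-normal measure, the KL regularization or the bias-field estimation proposed in the paper, and the fuzzifier m = 2, the iteration count and the toy intensities are assumptions.

```python
import numpy as np


def fcm(x, n_clusters, m=2.0, n_iter=100, eps=1e-10, seed=0):
    """Standard fuzzy c-means on 1-D intensities with Euclidean distance."""
    rng = np.random.default_rng(seed)
    u = rng.random((n_clusters, x.size))
    u /= u.sum(axis=0)                        # memberships sum to 1 per pixel
    for _ in range(n_iter):
        um = u ** m
        centers = um @ x / um.sum(axis=1)     # membership-weighted means
        d = np.abs(x[None, :] - centers[:, None]) + eps  # avoid division by 0
        # u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        u = 1.0 / ratio.sum(axis=1)
    return centers, u


intensities = np.concatenate([np.linspace(0.15, 0.25, 50),
                              np.linspace(0.75, 0.85, 50)])  # toy "pixels"
centers, u = fcm(intensities, n_clusters=2)
labels = u.argmax(axis=0)  # hard segmentation from fuzzy memberships
print(np.sort(centers))
```

The paper's contribution replaces the distance `d` in this loop with a skew-normal based measure and adds further terms, so this baseline is only the starting point it improves on.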

Discriminant analysis using empirical distribution function

  • Kim, Jae Young; Hong, Chong Sun
    • Journal of the Korean Data and Information Science Society / v.28 no.5 / pp.1179-1189 / 2017
  • In this study, we propose an alternative method for discriminant analysis that uses a multivariate empirical distribution function to express multivariate data as a simple one-dimensional statistic. The method turns out to be a process of estimating the optimal threshold based on classification accuracy measures and the empirical distribution function of data composed of several classes. It can also be represented visually on a two-dimensional plane and discussed with measures from ROC curves, surfaces, and manifolds. To explore the usefulness of this method for discriminant analysis, we compare the proposed method with existing methods through simulations and illustrative examples. The proposed method is found to perform better in some cases.
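
The one-dimensional statistic mentioned above is the multivariate empirical distribution function F_n(x) = (1/n) #{i : X_i <= x componentwise}. A minimal NumPy sketch, assuming (for illustration only) that F_n is built from the pooled training sample and that the threshold is chosen to maximize plain accuracy; the paper's choice of reference sample and accuracy measure may differ, and the simulated two-class data are hypothetical.

```python
import numpy as np


def multivariate_ecdf(sample, points):
    """F_n(x) = (1/n) * #{i : X_i <= x in every coordinate}, for each point x."""
    sample = np.asarray(sample, dtype=float)
    points = np.atleast_2d(np.asarray(points, dtype=float))
    below = (sample[None, :, :] <= points[:, None, :]).all(axis=2)
    return below.mean(axis=1)


def best_threshold(scores, labels):
    """Threshold c maximizing accuracy of the rule 'predict 1 if score > c'."""
    candidates = np.concatenate([[-np.inf], np.unique(scores)])
    accuracies = [np.mean((scores > c) == labels) for c in candidates]
    best = int(np.argmax(accuracies))
    return candidates[best], accuracies[best]


rng = np.random.default_rng(1)
x0 = rng.normal(0.0, 1.0, size=(200, 2))   # hypothetical class 0
x1 = rng.normal(2.0, 1.0, size=(200, 2))   # hypothetical class 1
x = np.vstack([x0, x1])
y = np.repeat([0, 1], 200)

scores = multivariate_ecdf(x, x)           # one statistic per observation
threshold, accuracy = best_threshold(scores, y)
print(threshold, accuracy)
```

Once every observation is reduced to its F_n value, the discriminant problem becomes a one-dimensional threshold search, which is also what makes ROC-style summaries straightforward to draw.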

Optimal k-Nearest Neighborhood Classifier Using Genetic Algorithm (유전알고리즘을 이용한 최적 k-최근접이웃 분류기)

  • Park, Chong-Sun; Huh, Kyun
    • Communications for Statistical Applications and Methods / v.17 no.1 / pp.17-27 / 2010
  • Feature selection and feature weighting are useful techniques for improving the classification accuracy of the k-nearest neighbor (k-NN) classifier. The main purpose of feature selection and feature weighting is to reduce the number of features by eliminating irrelevant and redundant features while maintaining or enhancing classification accuracy. In this paper, a novel hybrid approach based on a genetic algorithm is proposed for simultaneous feature selection, feature weighting and choice of k in the k-NN classifier. The results indicate that the proposed algorithm is comparable or superior to existing classifiers with or without feature selection and feature weighting capability.
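
A genetic algorithm for this problem needs a fitness function that scores a candidate pair (feature weights, k). The NumPy sketch below shows one plausible choice, leave-one-out accuracy of a k-NN classifier under a feature-weighted Euclidean distance, where a weight of 0 removes a feature. The GA operators themselves (selection, crossover, mutation) are omitted, and this fitness definition and the simulated data are assumptions rather than the paper's exact setup.

```python
from collections import Counter

import numpy as np


def weighted_knn_predict(train_x, train_y, query, weights, k):
    """Majority vote among the k nearest neighbours under a feature-weighted
    Euclidean distance; a weight of 0 drops that feature entirely."""
    d = np.sqrt((((train_x - query) ** 2) * weights).sum(axis=1))
    nearest = np.argsort(d, kind="stable")[:k]
    return Counter(train_y[nearest].tolist()).most_common(1)[0][0]


def loo_accuracy(x, y, weights, k):
    """Leave-one-out accuracy: one possible GA fitness for (weights, k)."""
    hits = 0
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        hits += weighted_knn_predict(x[mask], y[mask], x[i], weights, k) == y[i]
    return hits / len(x)


rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
informative = y + 0.2 * rng.normal(size=100)   # separates the classes
noise = 50.0 * rng.normal(size=100)            # irrelevant, large-scale feature
x = np.column_stack([informative, noise])

acc_all = loo_accuracy(x, y, weights=np.array([1.0, 1.0]), k=5)  # noise dominates
acc_sel = loo_accuracy(x, y, weights=np.array([1.0, 0.0]), k=5)  # noise dropped
print(acc_all, acc_sel)
```

In a GA, each chromosome would encode the weight vector and k, and this function would rank the population; setting a weight to exactly 0 is how feature selection and feature weighting can be searched jointly.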