• Title/Summary/Keyword: Accuracy Statistics

Tree size determination for classification ensemble

  • Choi, Sung Hoon;Kim, Hyunjoong
    • Journal of the Korean Data and Information Science Society / v.27 no.1 / pp.255-264 / 2016
  • Classification is predictive modeling for a categorical target variable. Classification ensemble methods, which improve predictive accuracy by combining multiple classifiers, have become a powerful machine learning and data mining paradigm. Well-known classification ensemble methods are boosting, bagging, and random forest. In this article, we assume that decision trees are used as the classifiers in the ensemble, and we hypothesize that tree size affects classification accuracy. To study how tree size influences accuracy, we performed experiments using twenty-eight data sets and compared the performance of four ensemble algorithms (bagging, double-bagging, boosting, and random forest) with different tree sizes.

TPR-TNR plot for confusion matrix

  • Hong, Chong Sun;Oh, Tae Gyu
    • Communications for Statistical Applications and Methods / v.28 no.2 / pp.161-169 / 2021
  • The two-dimensional confusion matrix used in credit assessment, biostatistics, and many other fields consists of true positives, true negatives, false positives, and false negatives. The corresponding rates, namely the true positive rate (TPR), true negative rate (TNR), false positive rate, and false negative rate, can be used to measure classification accuracy. In this study, we propose the TPR-TNR plot, a graphical method that geometrically describes and explains these rates based on the confusion matrix. The proposed TPR-TNR plot consists of two right-angled triangles. We show that the TPR and TNR describe the acute angles of the right-angled triangles in the plot. These acute angles can be used to determine the optimal thresholds corresponding to many accuracy measures.
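
The four rates named in the abstract follow directly from the cell counts of a 2x2 confusion matrix. The snippet below is a minimal sketch of that arithmetic only; it does not reproduce the TPR-TNR plot itself, and the function name `confusion_rates` and the example counts are illustrative assumptions, not taken from the paper.

```python
def confusion_rates(tp, fn, fp, tn):
    """Return TPR, TNR, FPR and FNR for a 2x2 confusion matrix.

    tp, fn: actual positives classified as positive / negative
    fp, tn: actual negatives classified as positive / negative
    """
    positives = tp + fn  # all actual positives
    negatives = fp + tn  # all actual negatives
    if positives == 0 or negatives == 0:
        raise ValueError("each actual class needs at least one observation")
    return {
        "TPR": tp / positives,  # sensitivity
        "TNR": tn / negatives,  # specificity
        "FPR": fp / negatives,  # equals 1 - TNR
        "FNR": fn / positives,  # equals 1 - TPR
    }


# Hypothetical counts, chosen only for illustration.
rates = confusion_rates(tp=40, fn=10, fp=5, tn=45)
print(rates)
```

Because FPR = 1 - TNR and FNR = 1 - TPR, the pair (TPR, TNR) already carries all the information in the four rates.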

On the Bias of Bootstrap Model Selection Criteria

  • Kee-Won Lee;Songyong Sim
    • Journal of the Korean Statistical Society / v.25 no.2 / pp.195-203 / 1996
  • A bootstrap method is used to correct the apparent downward bias of a naive plug-in bootstrap model selection criterion, which is shown to enjoy a high degree of accuracy. The bootstrap method is compared with the asymptotic method through an illustrative example.

Testing Outliers in Nonlinear Regression

  • Kahng, Myung-Wook
    • Journal of the Korean Statistical Society / v.24 no.2 / pp.419-437 / 1995
  • Given the specific mean shift outlier model, several standard approaches to obtaining test statistics for outliers are discussed. Each of these is developed in detail for the nonlinear regression model, and each leads to an equivalent distribution. The geometric interpretations of the statistics and the accuracy of the linear approximation are also presented.

AROC Curve and Optimal Threshold

  • Hong, Chong-Sun;Lee, Hee-Jung
    • The Korean Journal of Applied Statistics / v.24 no.1 / pp.185-191 / 2011
  • In credit evaluation studies that assume mixture distributions, the ROC curve is a useful tool for exploring the discriminatory power between default and non-default borrowers. The AROC curve is an adjusted ROC curve that can be identified with the corresponding score and is mathematically analyzed in this work. We obtain patterns of this curve by applying normal distributions. Moreover, the relationship between the AROC curve and many classification accuracy statistics is explored to find the optimal threshold. When the two distributions have equal variances, we find that the local minimum of the AROC curve is attained at the optimal threshold that maximizes certain classification accuracies.
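
To make the equal-variance case concrete, the sketch below finds the threshold that maximizes one common accuracy statistic, Youden's J = TPR + TNR - 1, for two normal score distributions. It is an illustration under stated assumptions and does not reproduce the AROC curve: the choice of Youden's J, the parameter values, the convention that defaulters have lower scores, and the names `youden_j` and `best_threshold` are all assumptions made here, not details from the paper.

```python
from statistics import NormalDist


def youden_j(threshold, default_dist, nondefault_dist):
    """Youden's J when scores at or below the threshold are flagged as default."""
    tpr = default_dist.cdf(threshold)           # defaulters correctly flagged
    tnr = 1.0 - nondefault_dist.cdf(threshold)  # non-defaulters correctly passed
    return tpr + tnr - 1.0


def best_threshold(default_dist, nondefault_dist, lo, hi, steps=2000):
    """Grid search for the threshold in [lo, hi] that maximizes Youden's J."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda c: youden_j(c, default_dist, nondefault_dist))


# Assumed parameters: equal variances, defaulters centered lower.
default_scores = NormalDist(mu=0.0, sigma=1.0)
nondefault_scores = NormalDist(mu=2.0, sigma=1.0)

c_star = best_threshold(default_scores, nondefault_scores, lo=-4.0, hi=6.0)
midpoint = (default_scores.mean + nondefault_scores.mean) / 2
print(c_star, midpoint)
```

With equal variances, J is maximized where the two densities cross, which is the midpoint of the two means, so the grid search should land at or very near that midpoint.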

DR-LSTM: Dimension reduction based deep learning approach to predict stock price

  • Ah-ram Lee;Jae Youn Ahn;Ji Eun Choi;Kyongwon Kim
    • Communications for Statistical Applications and Methods / v.31 no.2 / pp.213-234 / 2024
  • In recent decades, increasing research attention has been directed toward predicting stock prices in financial markets using deep learning methods. For instance, recurrent neural networks (RNNs) are known to be competitive on time-series data. Long short-term memory (LSTM) further improves on the RNN by providing an alternative approach to the gradient loss problem. LSTM has its own advantage in predictive accuracy because it retains memory for a longer time. In this paper, we combine both supervised and unsupervised dimension reduction methods with LSTM to enhance forecasting performance and refer to this as a dimension reduction based LSTM (DR-LSTM) approach. As supervised dimension reduction methods, we use sliced inverse regression (SIR), sparse SIR, and kernel SIR. As unsupervised dimension reduction methods, we use principal component analysis (PCA), sparse PCA, and kernel PCA. Using datasets of real stock market indices (S&P 500, STOXX Europe 600, and KOSPI), we present a comparative study of predictive accuracy between the six DR-LSTM methods and time series modeling.

Fixed Accuracy Confidence Set for the Autocorrelations of Linear Processes

  • Lee, Sang-Yeol
    • Communications for Statistical Applications and Methods / v.4 no.2 / pp.345-351 / 1997
  • This paper considers a sequential fixed accuracy confidence set procedure for the autocorrelations of stationary linear processes. The proposed procedure for a fixed-width confidence set is shown to be both asymptotically consistent and asymptotically efficient as the width approaches zero.

Model-Free Interval Prediction in a Class of Time Series with Varying Coefficients

  • Park, Sang-Woo;Cho, Sin-Sup;Lee, Sang-Yeol;Hwang, Sun-Y.
    • Journal of the Korean Data and Information Science Society / v.11 no.2 / pp.173-179 / 2000
  • Interval prediction based on the empirical distribution function for a class of time series with time-varying coefficients is discussed. To this end, the strong mixing property of the model is shown and results due to Fotopoulos et al. (1994) are employed. A simulation study is presented to assess the accuracy of the proposed interval predictor.

Detection of superior genotype of fatty acid synthase in Korean native cattle by an environment-adjusted statistical model

  • Lee, Jea-Young;Oh, Dong-Yep;Kim, Hyun-Ji;Jang, Gab-Sue;Lee, Seung-Uk
    • Asian-Australasian Journal of Animal Sciences / v.30 no.6 / pp.765-772 / 2017
  • Objective: This study examines the genetic factors influencing the phenotypes (four economic traits: oleic acid [C18:1], monounsaturated fatty acids, carcass weight, and marbling score) of Hanwoo. Methods: To enhance the accuracy of the genetic analysis, the study proposes a new statistical model that excludes environmental factors. A statistically adjusted analysis of covariance model of environmental and genetic factors was developed, and estimated environmental effects (covariate effects of age and effects of calving farms) were excluded from the model. Results: The accuracy was compared before and after adjustment. The accuracy of the best single nucleotide polymorphism (SNP) in C18:1 increased from 60.16% to 74.26%, and that of the two-factor interaction increased from 58.69% to 87.19%. Also, superior SNPs and SNP interactions were identified using the multifactor dimensionality reduction method (Tables 1 to 4). Finally, high- and low-risk genotypes were compared based on their mean scores for each trait. Conclusion: The proposed method significantly improved the analysis accuracy and identified superior gene-gene interactions and genotypes for each of the four economic traits of Hanwoo.