Discriminant analysis using empirical distribution function

  • Kim, Jae Young (Research Institute of Applied Statistics, Sungkyunkwan University) ;
  • Hong, Chong Sun (Department of Statistics, Sungkyunkwan University)
  • Received : 2017.07.18
  • Accepted : 2017.09.07
  • Published : 2017.09.30

Abstract

In this study, we propose an alternative method for discriminant analysis that uses a multivariate empirical distribution function to express multivariate data as a simple one-dimensional statistic. The method turns out to be a process of estimating the optimal threshold, based on classification accuracy measures, for the empirical distribution function values of data composed of several classes. The procedure can also be represented visually on a two-dimensional plane and discussed with measures derived from ROC curves, surfaces, and manifolds. To explore the usefulness of this method for discriminant analysis, we compare the proposed method with existing methods through simulations and illustrative examples. The results indicate that the proposed method may perform better in some cases.
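To make the two-class idea concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that the multivariate empirical distribution function is evaluated against the pooled training sample using the componentwise ordering, F_n(x) = (1/n) #{i : X_i ≤ x in every coordinate}, and that the classification accuracy measure is Youden's index J = sensitivity + specificity - 1. It also assumes that class 1 tends to take larger coordinate values, so larger F_n values point toward class 1. The function names `multivariate_edf`, `youden_threshold`, and `classify` are hypothetical. Another accuracy measure, such as balanced accuracy, could replace J in `youden_threshold`.

```python
import numpy as np


def multivariate_edf(sample, points):
    """Empirical joint CDF of `sample` evaluated at each row of `points`.

    Assumption: F_n(x) = (1/n) * #{i : sample[i] <= x in every coordinate}.
    """
    sample = np.asarray(sample, dtype=float)
    points = np.asarray(points, dtype=float)
    # dominated[j, i] is True when sample row i is <= point j in all coordinates
    dominated = np.all(sample[None, :, :] <= points[:, None, :], axis=2)
    return dominated.mean(axis=1)


def youden_threshold(scores, labels):
    """Return (t, J) maximizing J(t) = TPR(t) + TNR(t) - 1.

    Assumption: an observation is assigned to class 1 when score >= t.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):  # candidate thresholds are the observed scores
        pred = scores >= t
        tpr = np.mean(pred[labels == 1])   # sensitivity
        tnr = np.mean(~pred[labels == 0])  # specificity
        j = tpr + tnr - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def classify(train, threshold, new_points):
    """Hypothetical helper: assign class 1 when the new point's EDF value >= threshold."""
    return (multivariate_edf(train, new_points) >= threshold).astype(int)


# Toy two-class data: class 0 near the origin, class 1 further out.
train = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 2], [2, 4]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])

scores = multivariate_edf(train, train)  # one-dimensional statistic per observation
t, j = youden_threshold(scores, labels)
print(scores, t, j)
print(classify(train, t, [[3.5, 3.5], [0.5, 0.5]]))
```

In this toy example the EDF values are 1/6, 2/6, 2/6 for class 0 and 4/6 for every class 1 point. The selected threshold is therefore 4/6, and it separates the two classes perfectly (J = 1). The sketch uses a brute-force search over the observed scores because the empirical distribution function takes only finitely many values, so any optimal threshold can be matched by one of them.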
