http://dx.doi.org/10.5351/KJAS.2020.33.1.061

On principal component analysis for interval-valued data  

Choi, Soojin (Department of Statistics, Hankuk University of Foreign Studies)
Kang, Kee-Hoon (Department of Statistics, Hankuk University of Foreign Studies)
Publication Information
The Korean Journal of Applied Statistics / v.33, no.1, 2020, pp. 61-74
Abstract
Interval-valued data, one type of symbolic data, are observed in the form of intervals rather than single values. Each interval-valued observation has an internal variation. Principal component analysis reduces the dimension of data by maximizing the variance of data. Therefore, the principal component analysis of interval-valued data should account for the variance between observations as well as the variation within the observed intervals. In this paper, three principal component analysis methods for interval-valued data are summarized. In addition, we propose a new method that replaces the uniform distribution used in the conventional quantile method with a truncated normal distribution, because we believe there is more information near the center point of the interval. The methods are compared using simulations and a relevant data set from the OECD. For the quantile method, we draw a scatter plot of the principal components and then identify the position and distribution of the quantiles using the arrow line representation method.
Keywords
center method; quantile method; symbolic data; truncated normal distribution; vertices method
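
To make the quantile idea in the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It represents each interval by m+1 quantiles (uniform, or truncated normal centered at the midpoint), stacks them as sub-objects, and runs an ordinary covariance-based PCA. The function names, the toy data, and the choice of standard deviation (sigma_frac = 1/6 of the interval width) are assumptions made for illustration; the abstract does not give these details, and the published quantile method may use a different correlation measure.

```python
import numpy as np
from statistics import NormalDist


def pca(X, k=2):
    """Plain covariance-based PCA: top-k eigenvalues, loadings, and scores."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(vals)[::-1][:k]  # eigh returns eigenvalues in ascending order
    return vals[order], vecs[:, order], Xc @ vecs[:, order]


def interval_quantiles(lower, upper, m=4, dist="uniform", sigma_frac=1 / 6):
    """Quantiles at probabilities 0, 1/m, ..., 1 of each interval; shape (n, m+1, p)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    probs = np.linspace(0.0, 1.0, m + 1)
    if dist == "uniform":
        t = probs  # relative position inside the interval
    elif dist == "truncnorm":
        # ASSUMPTION: normal centered at the midpoint with sd = sigma_frac * width,
        # truncated to the interval, i.e. a standard normal truncated to [-h, h].
        std = NormalDist()
        h = 0.5 / sigma_frac
        lo, hi = std.cdf(-h), std.cdf(h)
        z = np.array([std.inv_cdf(lo + p * (hi - lo)) for p in probs])
        t = 0.5 + z * sigma_frac
        t[0], t[-1] = 0.0, 1.0  # pin the end points exactly to the interval bounds
    else:
        raise ValueError("dist must be 'uniform' or 'truncnorm'")
    return lower[:, None, :] + t[None, :, None] * (upper - lower)[:, None, :]


def quantile_pca(lower, upper, m=4, dist="uniform", k=2):
    """PCA on the stacked quantile sub-objects; scores come back as (n, m+1, k)."""
    q = interval_quantiles(lower, upper, m, dist)
    n, m1, p = q.shape
    vals, vecs, scores = pca(q.reshape(n * m1, p), k)
    return vals, vecs, scores.reshape(n, m1, -1)


# Hypothetical toy data: 4 objects, 3 interval-valued variables.
lower = np.array([[1.0, 10.0, 0.2], [2.0, 12.0, 0.1], [4.0, 15.0, 0.5], [3.0, 11.0, 0.3]])
upper = lower + np.array([[1.0, 4.0, 0.2], [2.0, 2.0, 0.4], [1.0, 6.0, 0.1], [3.0, 3.0, 0.3]])

center_vals, center_vecs, center_scores = pca((lower + upper) / 2)  # center method
vals, vecs, scores = quantile_pca(lower, upper, m=4, dist="truncnorm")
```

For object i, scores[i] holds its m+1 quantile points in principal component space, ordered from the interval minimum to the maximum. Connecting consecutive points gives one way to draw the arrow line representation mentioned in the abstract. With the truncated normal option the inner quantiles sit closer to the midpoint than under the uniform option, which reflects the assumption that more information lies near the center of the interval.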
Citations & Related Records
  • Reference
1 Billard, L. (2008). Sample covariance functions for complex quantitative data. In Mizuta M. and Nakano J. (Eds), Proceedings of the International Association of Statistical Computing, 157-163, Yokohama.
2 Billard, L. and Diday, E. (2006). Symbolic Data Analysis: Conceptual Statistics and Data Mining, Wiley, Chichester.
3 Cazes, P., Chouakria, A., Diday, E., and Schektman, Y. (1997). Extension de l'analyse en composantes principales a des donnees de type intervalle, Revue de statistique appliquee, 45, 5-24.
4 Chouakria, A. (1998). Extension des methodes d'analyse factorielles a des donnees de type intervalle, Ph.D. Dissertation, Universite Paris-Dauphine.
5 Chouakria, A., Billard, L., and Diday, E. (2011). Principal component analysis for interval-valued observations, Statistical Analysis and Data Mining, 4, 229-246.
6 Ichino, M. (2011). The quantile method for symbolic principal component analysis, Statistical Analysis and Data Mining, 4, 184-198.
7 Lauro, N. C., Verde, R., and Irpino, A. (2008). Principal component analysis of symbolic data described by intervals. In Diday, E. and Noirhomme-Fraiture, M. (Eds), Symbolic Data Analysis and the SODAS Software, Wiley, Chichester, 279-311.
8 Le-Rademacher, J. and Billard, L. (2012). Symbolic Covariance Principal Component Analysis and Visualization for Interval-Valued Data, Journal of Computational and Graphical Statistics, 21, 413-432.
9 Palumbo, F. and Lauro, N. C. (2003). A PCA for interval-valued data based on midpoints and radii. In Yanai, H., Okada, A., Shigemasu, K., Kano, Y. and Meulman, J. (Eds), New Developments in Psychometrics, 641-648.
10 Wang, H., Chen, M., Shi, X., and Li, N. (2016). Principal component analysis for normal-distribution-valued symbolic data, IEEE Transactions on Cybernetics, 46, 356-365.