http://dx.doi.org/10.7465/jkdi.2017.28.2.395

Clustering and classification to characterize daily electricity demand  

Park, Dain (Department of Statistics, Daegu University)
Yoon, Sanghoo (Department of Statistics and Computer Science, Daegu University & Institute of Basic Science, Daegu University)
Publication Information
Journal of the Korean Data and Information Science Society / v.28, no.2, 2017, pp. 395-406
Abstract
The purpose of this study is to identify patterns of daily electricity demand through clustering and classification. Hourly data were collected by KPX (Korea Power Exchange) between 2008 and 2012. Because electricity demand data are time series data, the time trend was removed before extracting the daily demand patterns. We considered k-means clustering, Gaussian mixture model clustering, and functional clustering in order to find the optimal clustering method. Classification analysis was then conducted to understand the relationship between the clusters and external factors: day of the week, holidays, and weather. The data were divided into training data and test data. The training data consisted of the external factors and cluster labels for 2008 to 2011; the test data consisted of the daily external factors for 2012. Decision tree, random forest, support vector machine, and naive Bayes classifiers were used. As a result, Gaussian mixture model-based clustering combined with random forest showed the best prediction performance when the number of clusters was 8.
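The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the trend is removed by dividing each day by its own mean (the abstract does not say how the trend was eliminated), and only plain k-means is shown, not the Gaussian mixture or functional clustering variants. The names `remove_level`, `kmeans`, and `synthetic_day` are hypothetical helpers introduced for illustration.

```python
import math
import random


def remove_level(day):
    """Divide a 24-hour profile by its daily mean so only the shape remains.

    Assumption: this stands in for the trend removal mentioned in the abstract.
    """
    mean = sum(day) / len(day)
    return [v / mean for v in day]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(curves, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [curves[0]]
    while len(centers) < k:
        # Next center: the curve farthest from every center chosen so far.
        centers.append(max(curves, key=lambda c: min(sq_dist(c, m) for m in centers)))
    labels = None
    for _ in range(n_iter):
        # Assign each curve to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(c, centers[j])) for c in curves]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each center as the hour-by-hour mean of its members.
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def synthetic_day(rng, level, weekday):
    """Hypothetical hourly demand: a strong daytime bump on weekdays, flatter otherwise."""
    bump = 0.4 if weekday else 0.1
    return [level * (1.0 + bump * math.sin(math.pi * h / 23) + rng.gauss(0, 0.01))
            for h in range(24)]


rng = random.Random(0)
is_weekday = [d % 7 < 5 for d in range(140)]
# The level grows over time to mimic a long-run upward trend in demand.
days = [synthetic_day(rng, 50.0 + 0.1 * d, wd) for d, wd in enumerate(is_weekday)]
shapes = [remove_level(day) for day in days]
labels, centers = kmeans(shapes, k=2)

# Cross-tabulate cluster label against the weekday flag.
for j in range(2):
    n_wd = sum(1 for lab, wd in zip(labels, is_weekday) if lab == j and wd)
    n_we = sum(1 for lab, wd in zip(labels, is_weekday) if lab == j and not wd)
    print(f"cluster {j}: weekdays={n_wd}, weekend days={n_we}")
```

In the study's setting, the resulting cluster labels would then serve as the response variable for a classifier trained on day of the week, holiday, and weather features.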
Keywords
Classification analysis; Cluster analysis; Electricity demand; Machine learning