http://dx.doi.org/10.5351/KJAS.2019.32.1.111

A study on the number of passengers using the subway stations in Seoul  

Cho, Soojin (Department of Statistics, Ewha Womans University)
Kim, Bogyeong (Department of Statistics, Ewha Womans University)
Kim, Nahyun (Department of Statistics, Ewha Womans University)
Song, Jongwoo (Department of Statistics, Ewha Womans University)
Publication Information
The Korean Journal of Applied Statistics / v.32, no.1, 2019, pp. 111-128
Abstract
Subways are an eco-friendly form of public transportation that can carry large numbers of passengers safely and quickly. Accurate prediction of passenger numbers is needed to increase public interest in the subway. This study groups the stations on Lines 1 to 9 of the Seoul Metropolitan Subway using cluster analysis. We propose one final prediction model for all stations and an optimal prediction model for each of the three clusters. Clustering the 294 subway stations yielded three groups: Group 1 stations lie in industrial and commercial areas, Group 2 stations in residential and commercial areas, and Group 3 stations in residential districts. We applied various data mining techniques to each group and derived several factors that influence demand. We use our models to predict the number of passengers at the 8 new stations built under the third extension plan of Seoul Metro Line 9, which opened in October 2018. The estimated average number of passengers per hour ranges from 241 to 452, and the estimated maximum number of passengers per hour ranges from 969 to 1,515. We believe our analysis can help improve the efficiency of public transportation policy.
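The abstract describes a two-stage idea: cluster the stations, fit a separate demand model within each cluster, and route a new station to the model of the cluster it falls into. The sketch below illustrates that idea in Python with NumPy under loudly stated assumptions, and it is not the authors' implementation. Because the keywords list a Gaussian mixture model (GMM), the clustering step is a hand-written GMM with diagonal covariances fitted by EM; ordinary least squares stands in for the per-cluster models (the keywords also list random forest and extreme gradient boosting, which could take its place). The function names `fit_diag_gmm`, `responsibilities`, `fit_cluster_models` and `predict`, the two station features, and all data are hypothetical and purely synthetic.

```python
import numpy as np


def responsibilities(X, weights, means, variances):
    """Posterior probability that each row of X (one station) belongs to each component."""
    diff = X[:, None, :] - means[None, :, :]  # shape (n, k, d)
    # log N(x | mu_j, diag(var_j)) for every station/component pair, shape (n, k)
    log_pdf = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :]).sum(axis=2)
    log_post = np.log(weights)[None, :] + log_pdf
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability before exp
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def fit_diag_gmm(X, k, n_iter=100, seed=0):
    """Fit a k-component Gaussian mixture with diagonal covariances by EM.

    Returns (weights, means, variances) with shapes (k,), (k, d), (k, d).
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Farthest-point initialisation: start from a random station, then repeatedly
    # add the station farthest from all means chosen so far.
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    means = X[idx].astype(float)
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp = responsibilities(X, weights, means, variances)  # E-step
        nk = resp.sum(axis=0) + 1e-12                          # M-step below
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + 1e-6
    return weights, means, variances


def fit_cluster_models(X, y, labels, k):
    """Ordinary least squares y ~ 1 + X, fitted separately within each cluster."""
    coefs = []
    for j in range(k):
        mask = labels == j
        A = np.column_stack([np.ones(mask.sum()), X[mask]])
        beta, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        coefs.append(beta)
    return coefs


def predict(X_new, gmm, coefs):
    """Send each station to its most probable cluster and apply that cluster's model."""
    labels = responsibilities(X_new, *gmm).argmax(axis=1)
    A = np.column_stack([np.ones(len(X_new)), X_new])
    B = np.array([coefs[lab] for lab in labels])
    return (A * B).sum(axis=1), labels


if __name__ == "__main__":
    # Purely synthetic toy data, NOT the paper's: 3 groups of 50 "stations",
    # 2 made-up features each, with a different linear demand rule per group.
    rng = np.random.default_rng(1)
    centres = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
    X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centres])
    groups = np.repeat(np.arange(3), 50)
    slopes = np.array([[30.0, 10.0], [5.0, 40.0], [20.0, 20.0]])
    y = 200 + (X * slopes[groups]).sum(axis=1) + rng.normal(scale=5.0, size=150)

    gmm = fit_diag_gmm(X, k=3)
    labels = responsibilities(X, *gmm).argmax(axis=1)
    coefs = fit_cluster_models(X, y, labels, k=3)
    preds, _ = predict(X, gmm, coefs)
    print("in-sample RMSE:", np.sqrt(np.mean((preds - y) ** 2)))
```

A single model for all stations is the special case in which every cluster shares one set of coefficients, so the per-cluster fit can only lower in-sample error; whether it also helps for unseen stations, such as the new Line 9 stations, has to be checked on held-out data.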
Keywords
subway; demand prediction; GMM; extreme gradient boosting; random forest; linear model
Times Cited by KSCI: 1