http://dx.doi.org/10.7319/kogsis.2014.22.4.175

Selection of Optimal Variables for Clustering of Seoul using Genetic Algorithm  

Kim, Hyung Jin (Department of Civil and Environmental Engineering, Yonsei University)
Jung, Jae Hoon (Department of Civil and Environmental Engineering, Yonsei University)
Lee, Jung Bin (Department of Civil and Environmental Engineering, Yonsei University)
Kim, Sang Min (Department of Civil and Environmental Engineering, Yonsei University)
Heo, Joon (Department of Civil and Environmental Engineering, Yonsei University)
Publication Information
Journal of Korean Society for Geospatial Information Science / v.22, no.4, 2014, pp. 175-181
Abstract
The Korean government has proposed a new initiative, 'Government 3.0', under which the administration will open its datasets to the public before they are requested. The City of Seoul is the front runner in disclosing government data. If the attributes that govern a given segmentation are known, the results can be applied to real-world problems in marketing and business strategy and to administrative decision making. For the City of Seoul, however, selecting optimal variables from an open dataset of up to several thousand attributes would require an enormous amount of computation time, because it may call for combinatorial optimization that maximizes a dissimilarity measure between clusters. In this study, we acquired a dataset of 718 attributes from Statistics Korea and used a genetic algorithm with Dunn's index to select the variables that best differentiate Gangnam from the other districts. We also used the Microsoft Azure cloud computing system to shorten the processing time. As a result, 28 optimal variables were selected, and validation with Ward's minimum variance method and the K-means algorithm showed that these 28 variables effectively separate Gangnam from the other districts.
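The general idea of pairing a genetic algorithm with Dunn's index can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the fitness definition, the genetic operators, or the parameter values. The sketch assumes a fixed two-group labelling (for example, Gangnam versus the other districts), a boolean mask over attributes as the chromosome, and tournament selection, uniform crossover, bit-flip mutation, and elitism as operators. All function names and default parameters are hypothetical.

```python
import numpy as np

def dunn_index(X, labels):
    """Dunn's index: smallest between-cluster distance divided by the
    largest within-cluster diameter. Requires at least two clusters."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    max_diameter = dist[same].max()
    min_separation = dist[~same].min()
    return min_separation / max_diameter if max_diameter > 0 else 0.0

def fitness(mask, X, labels):
    """Score a candidate attribute subset; an empty subset scores zero."""
    if not mask.any():
        return 0.0
    return dunn_index(X[:, mask], labels)

def select_variables(X, labels, pop_size=30, generations=40,
                     mutation_rate=0.05, seed=0):
    """Assumed GA loop: searches boolean attribute masks that maximize
    Dunn's index for the given labelling."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    population = rng.random((pop_size, n_features)) < 0.5
    best_mask, best_score = None, -1.0
    for _ in range(generations):
        scores = np.array([fitness(m, X, labels) for m in population])
        top = scores.argmax()
        if scores[top] > best_score:
            best_mask, best_score = population[top].copy(), scores[top]
        # Tournament selection: the fitter of two random individuals wins.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = population[np.where(scores[a] >= scores[b], a, b)]
        # Uniform crossover with a randomly shuffled partner.
        partners = parents[rng.permutation(pop_size)]
        take_first = rng.random((pop_size, n_features)) < 0.5
        children = np.where(take_first, parents, partners)
        # Bit-flip mutation.
        flips = rng.random((pop_size, n_features)) < mutation_rate
        population = children ^ flips
        # Elitism: carry the best mask found so far into the next generation.
        population[0] = best_mask
    return best_mask, best_score
```

Here `X` would be a districts-by-attributes array (ideally standardized, since Dunn's index depends on scale) and `labels` an integer array marking the target group. Each fitness evaluation is independent of the others, so a search of this kind parallelizes naturally across machines.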
Keywords
Clustering; Dunn's Index; Ward's Minimum Variance; K-means Algorithm; Genetic Algorithm;