Spatial Clustering Analysis based on Text Mining of Location-Based Social Media Data

  • Park, Woo Jin (Center of Environmental Remediation and Risk Assessment, Seoul National University)
  • Yu, Ki Yun (Department of Civil & Environmental Engineering, Seoul National University)
  • Received : 2015.05.21
  • Accepted : 2015.06.11
  • Published : 2015.06.30

Abstract

Location-based social media data have high potential for use in areas such as big data analytics and location-based services. In this study, we applied a series of analysis methods to determine how important keywords in location-based social media are spatially distributed, based on the text of the posts. For this purpose, we collected geotagged tweets from the Gangnam district of Seoul and its environs for one month, August 2013, and extracted the principal keywords from these tweets through text mining. From these, we selected the keywords belonging to three categories (food, entertainment, and work and study) and classified them by category. Spatial clustering was then performed on the tweets containing keywords of each category. The resulting clusters for each category were compared with the buildings and benchmark POIs at the same locations. The food clusters were highly consistent with large-scale commercial areas. The entertainment clusters corresponded to performance halls, theaters, and the Jamsil sports complex. The work and study clusters were highly consistent with areas where private institutes and office buildings are concentrated.
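
The workflow described in the abstract (assign geotagged tweets to categories by keyword, then cluster each category spatially) can be illustrated with a minimal Python sketch. The abstract does not name the clustering algorithm or list the keywords, so everything specific here is an assumption: the keyword lists in `CATEGORY_KEYWORDS` are hypothetical, a small DBSCAN-style density clustering stands in for whatever method the authors used, the `eps_m` and `min_pts` values are arbitrary, and the sample tweets are made up for illustration.

```python
from math import radians, sin, cos, asin, sqrt

# Hypothetical keyword lists; the paper's actual keywords are not given in the abstract.
CATEGORY_KEYWORDS = {
    "food": {"lunch", "dinner", "coffee", "restaurant"},
    "entertainment": {"movie", "concert", "baseball"},
    "work_study": {"office", "meeting", "exam", "study"},
}


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))


def categorize(tweets):
    """Group tweet coordinates by category using keyword matches in the text."""
    by_cat = {cat: [] for cat in CATEGORY_KEYWORDS}
    for text, lat, lon in tweets:
        words = set(text.lower().split())
        for cat, keywords in CATEGORY_KEYWORDS.items():
            if words & keywords:
                by_cat[cat].append((lat, lon))
    return by_cat


def dbscan(points, eps_m=200.0, min_pts=3):
    """Minimal DBSCAN (assumed stand-in method): one label per point, -1 is noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if haversine_m(points[i], q) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_pts:
                queue.extend(nearby)  # j is a core point, so keep expanding
    return labels


# Made-up sample tweets (text, lat, lon) near Gangnam; not data from the study.
sample = [
    ("great lunch today", 37.4979, 127.0276),
    ("coffee with friends", 37.4981, 127.0279),
    ("dinner near the station", 37.4977, 127.0274),
    ("late coffee", 37.5300, 127.0700),
    ("movie night", 37.5126, 127.0590),
]
food_points = categorize(sample)["food"]
labels = dbscan(food_points, eps_m=200.0, min_pts=3)
```

In this sketch the three food tweets that lie close together form one cluster and the distant one is labelled noise. Comparing cluster locations with buildings or benchmark POIs, as the study does, would require a separate POI dataset and is not shown.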
