Location Inference of Twitter Users using Timeline Data

A Method for Inferring the Residential Area of Twitter Users Using Timeline Data

  • Kang, Ae Tti (Department of Social Studies, Ewha Womans University)
  • Kang, Young Ok (Department of Social Studies, Ewha Womans University)
  • Received : 2015.03.13
  • Accepted : 2015.04.30
  • Published : 2015.04.30

Abstract

Inferring the residential areas of SNS users from SNS big data offers an alternative for spatial big data research, whose results have been considered unreliable because of location sparsity and the ecological fallacy. In this study, we developed a method that infers the residential areas of Twitter users from the daily life activity patterns found in their timeline data. We identified these patterns from two sources: users' movement patterns and the regional cognition words that users write in their tweets. We named the model based on movement the daily movement pattern model and the model based on text the daily activity field model, and then selected the variables to be used in each model. The dependent variable was coded as 0 if the area from which a user mainly tweets is the user's home location (HL) and as 1 otherwise. Discriminant analysis gave hit ratios of 67.5% and 57.5% for the two models, respectively. We tested both models on the timeline data of users who posted stress-related tweets. As a result, we inferred the residential areas of 5,301 of 48,235 users and obtained 9,606 stress-related tweets with a residential area attached. This is about 44 times the number of geo-tagged tweets. We believe the methodology can be used not only to secure more location data in studies of SNS big data, but also to link SNS big data with regional statistics in order to analyze regional phenomena.

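Inferring the residential areas of SNS users and assigning those locations to the data they generate can be an alternative for spatial big data research, whose results have been judged unreliable because of location sparsity and the ecological fallacy. In this study, we devised a method that infers the residential areas of Twitter users from the daily life activity patterns found in their timeline data. These patterns could be identified from users' movement trajectories and from their language (text); we named the model using the former the daily movement pattern model and the model using the latter the daily activity field model, and selected the input variables for each. The models were built by discriminant analysis, with the dependent variables being whether a user's tweets occur most frequently in the user's own residential area and whether the user most frequently uses words denoting the administrative district of residence; the explanatory power was 67.5% for the daily movement pattern model and 57.5% for the daily activity field model. When the models were applied to test data consisting of the timeline data of users who had written stress-related tweets, the residential areas of 5,301 of the 48,235 users were inferred, yielding 9,606 location-assigned stress-related tweets. The inference technique secured 44 times more location-assigned tweets than the data collection methods used in existing SNS data analysis studies. We expect the methodology to be useful for securing location-assigned data in research using SNS data, and to increase the possibility of using SNS data to analyze regional phenomena by identifying correlations with various regional statistics.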
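To make the reported measures concrete, the following is a minimal sketch of a two-class linear discriminant and its hit ratio (the share of users classified correctly). It is not the authors' model: the two features (`night_share`, `n_districts`), the eight sample users, the equal-prior midpoint threshold, and all function names are assumptions made for illustration. Only the 0/1 coding of the dependent variable and the user counts 5,301 and 48,235 come from the abstract.

```python
# Hypothetical training rows: (night_share, n_districts) per user.
# night_share: share of tweets posted at night from the most-tweeted district.
# n_districts: number of distinct districts the user tweeted from.
# Label 0: the most-tweeted district is the home location (HL); 1: otherwise.
X = [(0.8, 2.0), (0.7, 3.0), (0.9, 1.0), (0.6, 2.0),
     (0.3, 5.0), (0.4, 4.0), (0.2, 6.0), (0.5, 3.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_lda(X, y):
    """Fit a two-class Fisher linear discriminant on two-feature rows."""
    groups = {k: [x for x, label in zip(X, y) if label == k] for k in (0, 1)}
    means = {k: (sum(r[0] for r in rows) / len(rows),
                 sum(r[1] for r in rows) / len(rows))
             for k, rows in groups.items()}
    # Pooled within-class scatter matrix (2x2, symmetric).
    s00 = s01 = s11 = 0.0
    for k, rows in groups.items():
        m0, m1 = means[k]
        for a, b in rows:
            s00 += (a - m0) ** 2
            s01 += (a - m0) * (b - m1)
            s11 += (b - m1) ** 2
    det = s00 * s11 - s01 * s01
    if det == 0:
        raise ValueError("singular scatter matrix")
    # Discriminant direction: w = S^-1 (mean_1 - mean_0).
    d0 = means[1][0] - means[0][0]
    d1 = means[1][1] - means[0][1]
    w = ((s11 * d0 - s01 * d1) / det, (s00 * d1 - s01 * d0) / det)
    # Assumed equal priors: cut at the projection of the midpoint of the means.
    mid = ((means[0][0] + means[1][0]) / 2, (means[0][1] + means[1][1]) / 2)
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold


def predict(model, x):
    """Return 1 if the discriminant score exceeds the threshold, else 0."""
    w, threshold = model
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0


def hit_ratio(model, X, y):
    """Share of rows whose predicted label matches the observed label."""
    hits = sum(predict(model, x) == label for x, label in zip(X, y))
    return hits / len(y)


model = fit_lda(X, y)
ratio = hit_ratio(model, X, y)

# Coverage implied by the abstract: users with an inferred residential area.
coverage = 5301 / 48235

print(f"hit ratio on the toy data: {ratio:.3f}")
print(f"share of users located: {coverage:.1%}")
```

The hit ratio here is computed on the same toy rows used for fitting, so it says nothing about the 67.5% and 57.5% reported in the abstract; a held-out sample would be needed for a fair estimate.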
