Dynamic Seed Selection for Twitter Data Collection

  • Received : 2014.02.05
  • Accepted : 2014.06.10
  • Published : 2014.08.15


Analyzing social media such as Twitter can offer useful perspectives on understanding human behavior, detecting hot issues, identifying influential people, and discovering groups and communities. However, gathering data relevant to a specific topic is difficult because social media data is large, noisy, and dynamic. This paper proposes a new algorithm that dynamically selects seed nodes in order to collect topic-relevant tweets efficiently. The algorithm uses user attributes to evaluate each user's influence and selects seed nodes dynamically during the collection process. We evaluate the proposed algorithm on real tweet data and obtain satisfactory performance results.
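
As an illustration only, the idea of dynamic seed selection can be sketched in Python as follows. This is not the algorithm proposed in the paper, whose details the abstract does not give. The attributes used for influence (`followers`, `retweets`, `mentions`), the weights, the log scaling, the keyword-match relevance test, and the in-memory `users` dictionary standing in for the Twitter API are all assumptions made for this sketch.

```python
import heapq
import math
from dataclasses import dataclass, field


@dataclass
class User:
    # Hypothetical user record; the attribute set is an assumption.
    name: str
    followers: int = 0
    retweets: int = 0
    mentions: int = 0
    tweets: list = field(default_factory=list)
    neighbors: list = field(default_factory=list)


def influence(user, weights=(0.5, 0.3, 0.2)):
    """Assumed influence score: weighted sum of log-scaled attributes."""
    attrs = (user.followers, user.retweets, user.mentions)
    return sum(w * math.log1p(a) for w, a in zip(weights, attrs))


def collect(users, initial_seeds, keyword, budget):
    """Collect keyword-relevant tweets, re-selecting the seed at every step.

    `budget` caps the number of users visited, standing in for an API
    rate limit. Returns (collected tweets, visited user names).
    """
    # Max-heap on influence, implemented by negating the score.
    frontier = [(-influence(users[n]), n) for n in initial_seeds]
    heapq.heapify(frontier)
    seen = set(initial_seeds)
    collected, visited = [], []
    while frontier and budget > 0:
        _, name = heapq.heappop(frontier)  # most influential candidate
        budget -= 1
        visited.append(name)
        user = users[name]
        relevant = [t for t in user.tweets if keyword.lower() in t.lower()]
        collected.extend(relevant)
        if not relevant:
            continue  # do not expand through users with no on-topic tweets
        for n in user.neighbors:
            if n not in seen and n in users:
                seen.add(n)
                heapq.heappush(frontier, (-influence(users[n]), n))
    return collected, visited


# Toy data, invented for this example.
users = {
    "a": User("a", 100, 10, 5, ["flu season is here"], ["b", "c"]),
    "b": User("b", 10, 0, 0, ["lunch time"], ["d"]),
    "c": User("c", 5000, 200, 50, ["got my flu shot", "hello"], ["d"]),
    "d": User("d", 50, 1, 1, ["flu again"], []),
}
collected, visited = collect(users, ["a"], "flu", budget=3)
print(visited)
print(collected)
```

In this toy run the crawler starts from user `a`, then prefers `c` and `d` over the low-influence, off-topic user `b`, so the limited budget is spent on users more likely to yield relevant tweets. A fixed seed set would instead be chosen once before collection begins.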


