http://dx.doi.org/10.1633/JISTaP.2021.9.1.3

Topic Modeling and Sentiment Analysis of Twitter Discussions on COVID-19 from Spatial and Temporal Perspectives  

AlAgha, Iyad (Faculty of Information Technology, The Islamic University of Gaza)
Publication Information
Journal of Information Science Theory and Practice / v.9, no.1, 2021, pp. 35-53
Abstract
This study evaluated the topics and opinions expressed in COVID-19 discussions on Twitter. It performed topic modeling and sentiment analysis of tweets posted during the COVID-19 outbreak and compared the results over space and time. By covering a longer and more recent period of the pandemic timeline, it revealed several patterns not previously reported in the literature. Author-pooled Latent Dirichlet Allocation (LDA) was used to generate twenty topics covering different aspects of the pandemic. Time-series analysis of the distribution of tweets over topics was performed to explore how the discussion of each topic changed over time and the potential reasons behind those changes. Spatial analysis of topics was performed by comparing the percentage of tweets in each topic among the top tweeting countries. Sentiment analysis of tweets was then performed at both the temporal and spatial levels to examine how sentiment differs between countries and in response to certain events. The performance of the topic model was assessed by comparing it with alternative topic modeling techniques: topic coherence was measured for each technique while varying the number of topics. Results showed that pooling tweets by author before performing LDA significantly improved the resulting topic models.
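As a rough illustration of two steps named in the abstract, the sketch below pools tweets by author into pseudo-documents (the input an LDA model would then be trained on) and computes the percentage of tweets per topic within each country. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `pool_by_author` and `topic_share_by_country`, the `(author, text)` and `(country, topic_id)` tuple layouts, and the sample data are all hypothetical, and the LDA training step itself is omitted.

```python
from collections import defaultdict


def pool_by_author(tweets):
    """Join all tweets by the same author into one pseudo-document.

    tweets: iterable of (author, text) pairs (assumed layout).
    Returns a dict mapping author -> concatenated text.
    """
    pooled = defaultdict(list)
    for author, text in tweets:
        pooled[author].append(text)
    return {author: " ".join(texts) for author, texts in pooled.items()}


def topic_share_by_country(assignments):
    """Percentage of tweets assigned to each topic, per country.

    assignments: iterable of (country, topic_id) pairs (assumed layout),
    one pair per tweet after a topic has been assigned to it.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for country, topic in assignments:
        counts[country][topic] += 1
    shares = {}
    for country, topics in counts.items():
        total = sum(topics.values())
        shares[country] = {t: 100.0 * n / total for t, n in topics.items()}
    return shares


# Hypothetical sample data for illustration only.
tweets = [
    ("user_a", "lockdown extended again"),
    ("user_b", "vaccine trial results look promising"),
    ("user_a", "working from home during lockdown"),
]
docs = pool_by_author(tweets)

shares = topic_share_by_country([("US", 0), ("US", 1), ("US", 0), ("UK", 1)])
```

Pooling gives the topic model longer documents than individual tweets, which is the motivation the abstract gives for pooling by author before LDA.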
Keywords
COVID-19; Twitter; topic modeling; sentiment analysis; Latent Dirichlet Allocation; social media