http://dx.doi.org/10.1633/JISTaP.2021.9.1.3

Topic Modeling and Sentiment Analysis of Twitter Discussions on COVID-19 from Spatial and Temporal Perspectives

AlAgha, Iyad (Faculty of Information Technology, The Islamic University of Gaza)

Publication Information

Journal of Information Science Theory and Practice / v.9, no.1, 2021, pp. 35-53

Abstract

This study evaluated the topics and opinions expressed in COVID-19 discussions on Twitter. It performed topic modeling and sentiment analysis of tweets posted during the COVID-19 outbreak and compared the results over space and time. By covering a more recent and longer period of the pandemic timeline, it revealed several patterns not previously reported in the literature. Author-pooled Latent Dirichlet Allocation (LDA) was used to generate twenty topics covering different aspects of the pandemic. Time-series analysis of the distribution of tweets over topics was performed to explore how the discussion of each topic changed over time and the potential reasons behind the changes. Spatial analysis of topics was performed by comparing the percentage of tweets in each topic among the top tweeting countries. Sentiment analysis of tweets was then performed at both the temporal and spatial levels to analyze how sentiment differs between countries and in response to certain events. The topic model was assessed by comparing it with alternative topic modeling techniques, measuring topic coherence for each technique while varying the number of topics. Results showed that pooling tweets by author before performing LDA significantly improved the produced topic models.
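As a rough illustration of the author-pooling step named in the abstract, the sketch below merges each author's tweets into a single longer pseudo-document before any topic model is trained. The abstract does not say how the pooling was implemented, so the function name `pool_by_author`, the `(author_id, text)` input layout, and the toy tweets are all assumptions made for this example.

```python
from collections import defaultdict


def pool_by_author(tweets):
    """Merge each author's tweets into one pseudo-document.

    `tweets` is an iterable of (author_id, text) pairs. This layout is an
    assumption for illustration, not the paper's actual data format.
    """
    pooled = defaultdict(list)
    for author_id, text in tweets:
        pooled[author_id].append(text)
    # Join texts in arrival order so each author yields one longer document.
    return {author: " ".join(texts) for author, texts in pooled.items()}


# Toy data, invented for illustration only.
sample_tweets = [
    ("user_a", "lockdown extended again"),
    ("user_b", "vaccine trial results look promising"),
    ("user_a", "schools closed during lockdown"),
]

pooled_docs = pool_by_author(sample_tweets)
for author, doc in pooled_docs.items():
    print(author, "->", doc)
```

The pooled documents, rather than the individual tweets, would then be tokenized and passed to an LDA implementation. The motivation usually given for pooling is that single tweets are too short to provide reliable word co-occurrence statistics.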

Keywords

COVID-19; Twitter; topic modeling; sentiment analysis; Latent Dirichlet Allocation; social media
