http://dx.doi.org/10.22640/lxsiri.2017.47.1.53

Classification of Public Perceptions toward Smog Risks on Twitter Using Topic Modeling  

Kim, Yun-Ki (Department of Land Management, Cheongju University)
Publication Information
Journal of Cadastre & Land InformatiX / v.47, no.1, 2017, pp. 53-79
Abstract
The main purpose of this study was to detect and classify public perceptions of smog disasters on Twitter using topic modeling. To support this purpose and to identify gaps in the literature, the study reviewed prior work on public opinion toward smog disasters and on topic modeling. The review indicated substantial gaps in the related literature, and the author formulated five research questions to address them. To answer these questions, the study carried out the following steps: data extraction, word cloud analysis of the cleaned data, construction of a term network, correlation analysis, hierarchical cluster analysis, topic modeling with latent Dirichlet allocation (LDA), and stream graphs. The results revealed large differences between New York and London in the most frequent terms, the shapes of the term networks, the types of correlation, and the patterns of change in smog-related topics. The author therefore found positive answers to four of the five research questions and a partially positive answer to Research Question 4. Finally, on the basis of these results, the author suggested policy implications and recommendations for future study.
Keywords
Public Perceptions; Smog Risks; Topic Modeling; LDA; Stream Graphs;
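
Two of the steps named in the abstract, counting the most frequent terms and correlating terms with one another, can be illustrated with a minimal sketch. The tweets, the stopword list, and the function names below are all invented for illustration; this record does not describe the study's tooling or preprocessing, so the code should be read as one plausible way to do these steps, not as the author's implementation.

```python
from collections import Counter
from math import sqrt

# Toy, made-up tweets used only to exercise the code; not data from the study.
TWEETS = [
    "smog alert in the city today",
    "air quality is bad smog everywhere",
    "traffic and smog make the air bad",
    "sunny day in the city",
]

# Assumed minimal stopword list (hypothetical, for illustration only).
STOPWORDS = {"in", "the", "is", "and", "a", "of", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def term_frequencies(docs):
    """Count how often each term occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def term_correlation(docs, term_a, term_b):
    """Pearson correlation between the presence/absence vectors of two terms."""
    xs = [1.0 if term_a in tokenize(d) else 0.0 for d in docs]
    ys = [1.0 if term_b in tokenize(d) else 0.0 for d in docs]
    n = len(docs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector; report 0
    return cov / sqrt(var_x * var_y)


freqs = term_frequencies(TWEETS)
print(freqs.most_common(3))
print(round(term_correlation(TWEETS, "air", "bad"), 3))
```

Running the same two functions separately on tweets from each city would give the kind of side-by-side comparison of frequent terms and term correlations that the abstract reports for New York and London.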