http://dx.doi.org/10.3745/JIPS.02.0079

A Survey on Automatic Twitter Event Summarization

Rudrapal, Dwijen (Dept. of Computer Science and Engineering, National Institute of Technology)
Das, Amitava (Dept. of Computer Science and Engineering, Indian Institute of Information Technology)
Bhattacharya, Baby (Dept. of Mathematics, National Institute of Technology)

Publication Information

Journal of Information Processing Systems / v.14, no.1, 2018, pp. 79-100

Abstract

Twitter is one of the most popular social platforms for sharing trending information and views on any event. Twitter reports an event faster than any other medium and carries an enormous volume of information and opinion about it. Consequently, Twitter topic summarization is one of the most convenient ways to get the gist of an event quickly. However, the information shared on Twitter is often full of nonstandard abbreviations, acronyms, out-of-vocabulary (OOV) words, and grammatical mistakes, which makes it challenging to find reliable and useful information about an event. Twitter event summarization is therefore a challenging task on which traditional text summarization methods do not work well. Over the last decade, various research works have introduced different approaches to automatic Twitter topic summarization. The main aim of this survey is to give a broad overview of promising approaches to summarizing a Twitter topic. We also cover the automatic evaluation of summarization techniques by surveying recent evaluation methodologies. At the end of the survey, we highlight both current and future research challenges in this domain through an in-depth analysis of the most recent summarization approaches.

Keywords

ROUGE; Social Media Text; Tweet Stream; Tweet Summarization
