http://dx.doi.org/10.7472/jksii.2014.15.5.63

HBase-based Automatic Summary System using Twitter Trending Topics  

Lee, Sanghoon (Department of Computer Science, Georgia State University)
Moon, Seung-Jin (Department of Computer Science, University of Suwon)
Publication Information
Journal of Internet Computing and Services, v.15, no.5, 2014, pp. 63-72
Abstract
Twitter is a popular social media platform where people post short messages of 140 characters or fewer via the web. A hashtag is a word or acronym created by Twitter users to open a discussion about topics and issues that are trending strongly. Because hashtag posts are sorted by time rather than relevance, first-time Twitter users have difficulty understanding their context. In this paper, we propose an HBase-based automatic summary system to reduce this difficulty. The proposed system stores the streaming data provided by the Twitter API in HBase and then combines an automatic summary method with a fuzzy system. Through this procedure, we eliminate duplicate content in the hashtag posts and compute scores between posts so that users can access trending topics by relevance.
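The two steps named in the abstract, removing duplicate posts and scoring posts against each other, can be illustrated with a minimal Python sketch. The abstract does not specify the scoring method, so this sketch assumes cosine similarity over term counts as a stand-in, and it omits both the HBase storage layer and the fuzzy system. The function names `tokenize`, `remove_duplicates`, `cosine`, and `rank_posts` are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter


def tokenize(post):
    """Lowercase a post and split it into word tokens, keeping hashtags."""
    return re.findall(r"#?\w+", post.lower())


def remove_duplicates(posts):
    """Keep only the first post for each distinct token sequence."""
    seen = set()
    unique = []
    for post in posts:
        key = tuple(tokenize(post))
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_posts(posts):
    """Score each unique post by its summed similarity to all other posts.

    Assumption: summed cosine similarity stands in for the paper's
    scoring, which the abstract does not describe.
    """
    unique = remove_duplicates(posts)
    vectors = [Counter(tokenize(p)) for p in unique]
    scored = []
    for i, post in enumerate(unique):
        score = sum(
            cosine(vectors[i], vectors[j])
            for j in range(len(unique))
            if j != i
        )
        scored.append((score, post))
    # Highest-scoring (most central) posts first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

Calling `rank_posts` on a list of hashtag posts returns `(score, post)` pairs, with posts that share the most vocabulary with the rest of the set listed first. A post that shares no terms with any other post receives a score of 0.0.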
Keywords
Twitter trending topics; Automatic summary system; Fuzzy theory; HBase; NoSQL; Twitter API