[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7472/jksii.2014.15.5.63

HBase-based Automatic Summary System using Twitter Trending Topics

Lee, Sanghoon (Department of Computer Science, Georgia State University)
Moon, Seung-Jin (Department of Computer Science, University of Suwon)

Publication Information

Journal of Internet Computing and Services, v.15, no.5, 2014, pp. 63-72

Abstract

Twitter is a popular social media platform on which people post short messages of 140 characters or fewer via the web. A hashtag is a word or acronym created by Twitter users to open a discussion about topics and issues that are strongly trending. Because hashtag posts are sorted by time rather than by relevance, first-time Twitter users have difficulty understanding their context. In this paper, we propose an HBase-based automatic summary system to reduce this difficulty. The proposed system stores the streaming data provided by the Twitter API in HBase and then combines an automatic summary method with a fuzzy system. Through this procedure, we eliminate duplicate content in the hashtag posts and compute scores between posts so that users can access trending topics ranked by relevance.
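The abstract names two operations on the stored hashtag posts: eliminating duplicate content and computing scores between posts. The sketch below shows one common way to do both; it is an illustrative assumption, not the authors' method. Posts are normalized (lowercased, with URLs, @mentions, and a leading "RT" removed), duplicates are dropped by comparing normalized token sequences, and each remaining post is scored by its average cosine similarity to the other posts, so posts close to the overall content of the hashtag rank higher. The helper names (`normalize`, `deduplicate`, `score_posts`, `summarize`) and the normalization rules are hypothetical, and neither the HBase storage layer nor the fuzzy system is modeled.

```python
import math
import re
from collections import Counter

# Hypothetical normalization rules (assumed for illustration, not taken from the paper).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9#']+")


def normalize(post: str) -> list:
    """Lowercase a post, strip URLs and @mentions, tokenize, and drop a leading 'RT'."""
    text = URL_RE.sub(" ", post.lower())
    text = MENTION_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text)
    if tokens and tokens[0] == "rt":
        tokens = tokens[1:]
    return tokens


def deduplicate(posts: list) -> list:
    """Keep the first post for each distinct normalized token sequence.

    Posts that are empty after normalization (e.g. a bare URL) are dropped.
    """
    seen = set()
    unique = []
    for post in posts:
        key = tuple(normalize(post))
        if key and key not in seen:
            seen.add(key)
            unique.append(post)
    return unique


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def score_posts(posts: list) -> list:
    """Score each post by its mean cosine similarity to every other post."""
    vectors = [Counter(normalize(p)) for p in posts]
    n = len(vectors)
    scores = []
    for i in range(n):
        total = sum(cosine(vectors[i], vectors[j]) for j in range(n) if j != i)
        scores.append(total / (n - 1) if n > 1 else 0.0)
    return scores


def summarize(posts: list, k: int) -> list:
    """Return up to k distinct posts, highest score first (ties keep input order)."""
    unique = deduplicate(posts)
    scores = score_posts(unique)
    order = sorted(range(len(unique)), key=lambda i: scores[i], reverse=True)
    return [unique[i] for i in order[:k]]
```

For example, `summarize(posts, k=5)` would return the five highest-scoring distinct posts collected for one hashtag. Averaging pairwise similarity is a simple centrality heuristic chosen for brevity; the proposed system instead combines its automatic summary method with a fuzzy system, which this sketch does not reproduce.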

Keywords

Twitter trending topics; Automatic summary system; Fuzzy theory; HBase; NoSQL; Twitter API
