http://dx.doi.org/10.7236/JIIBC.2014.14.5.189

Topical Clustering Techniques of Twitter Documents Using Korean Wikipedia  

Chang, Jae-Young (Dept. of Computer Engineering, Hansung University)
Publication Information
The Journal of the Institute of Internet, Broadcasting and Communication / v.14, no.5, 2014, pp. 189-196
Abstract
Recently, the need to retrieve documents in SNS environments such as Twitter has been growing. To support Twitter search, a clustering technique is required that groups the large number of retrieved documents by topic. However, due to the nature of Twitter, previous simple techniques are of limited use for clustering Twitter documents. To overcome this problem, we propose a new clustering technique suited to the Twitter environment. In the proposed method, we augment the feature vectors representing Twitter documents with new terms and recalculate the feature weights using Korean Wikipedia. We also performed experiments with Korean Twitter documents and demonstrated the usability of the proposed method through a performance comparison with previous techniques.
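The abstract does not say how terms are added or how weights are recalculated, so the following is only a minimal sketch of the general idea of augmenting a sparse feature vector with related terms before comparing documents. The `related_terms` mapping is a hypothetical stand-in for information derived from Korean Wikipedia, and the `boost` factor and the use of cosine similarity are assumptions made for illustration, not the paper's method.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Raw term-frequency vector for one tokenized document."""
    return Counter(tokens)


def augment(vector, related_terms, boost=0.5):
    """Return a copy of `vector` with related terms added.

    related_terms: hypothetical dict mapping a term to associated terms
        (a stand-in for whatever is derived from Korean Wikipedia).
    boost: hypothetical fraction of the source term's weight that each
        added term receives.
    """
    out = dict(vector)
    for term, weight in vector.items():
        for extra in related_terms.get(term, []):
            out[extra] = out.get(extra, 0.0) + boost * weight
    return out


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two short documents that share no surface terms.
related_terms = {"kimchi": ["food"], "bibimbap": ["food"]}  # illustrative only
doc_a = tf_vector(["kimchi", "recipe"])
doc_b = tf_vector(["bibimbap", "lunch"])

plain = cosine(doc_a, doc_b)
augmented = cosine(augment(doc_a, related_terms), augment(doc_b, related_terms))
```

In this toy case the two documents have no terms in common, so their plain similarity is zero, while the augmented vectors overlap on the added term. This illustrates why term augmentation can help with very short texts.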
Keywords
SNS; Twitter; Clustering; Wikipedia; Feature Vector;
Citations & Related Records
Times Cited by KSCI: 3