
Link-Based Clustering in Blogosphere  

Song, Suk-Soon (Department of Electronics and Computer Engineering, Hanyang University)
Yoon, Seok-Ho (Department of Electronics and Computer Engineering, Hanyang University)
Kim, Sang-Wook (Department of Electronics and Computer Engineering, Hanyang University)
Abstract
This paper addresses the clustering of blogs and posts in the blogosphere. First, we model the blogosphere as a social network in which blogs and posts correspond to nodes and the interactions of blogs on posts correspond to links. Next, to cluster the blogosphere, we employ LinkClus, a link-based algorithm that finds clusters of nodes in a network effectively and efficiently. For more accurate clustering, we propose two refinements: (1) changing the granularity from blogs to folders, and (2) removing blogs and posts that are highly likely to introduce noise. Finally, we verify the effectiveness of the proposed approach by showing how similar the posts and blogs in the same cluster are to one another in terms of their content.
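The network model and the noise-removal refinement can be illustrated with a minimal sketch. It is not LinkClus and not the authors' procedure: a Jaccard overlap with single-link grouping stands in for the link-based similarity, and the interaction records, the `max_degree` cutoff, the `threshold` value, and all function names are hypothetical. The abstract does not say how noisy blogs are identified, so the degree cutoff is only an assumption.

```python
from collections import defaultdict

# Hypothetical interaction records: (blog, post) means the blog acted on the post.
interactions = [
    ("blogA", "post1"), ("blogA", "post2"),
    ("blogB", "post1"), ("blogB", "post2"),
    ("blogC", "post3"), ("blogC", "post4"),
    ("blogD", "post3"), ("blogD", "post4"),
    ("spam", "post1"), ("spam", "post2"), ("spam", "post3"), ("spam", "post4"),
]

def build_network(pairs):
    """Return the two adjacency maps of the blog-post network."""
    blog_to_posts = defaultdict(set)
    post_to_blogs = defaultdict(set)
    for blog, post in pairs:
        blog_to_posts[blog].add(post)
        post_to_blogs[post].add(blog)
    return blog_to_posts, post_to_blogs

def drop_noisy_blogs(pairs, max_degree):
    """Assumed noise rule: discard blogs linked to more than max_degree posts."""
    blog_to_posts, _ = build_network(pairs)
    return [(b, p) for b, p in pairs if len(blog_to_posts[b]) <= max_degree]

def jaccard(a, b):
    """Overlap of two neighbor sets; a stand-in for a link-based similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_posts(post_to_blogs, threshold):
    """Single-link grouping: posts with similarity >= threshold share a cluster."""
    posts = sorted(post_to_blogs)
    parent = {p: p for p in posts}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for i, p in enumerate(posts):
        for q in posts[i + 1:]:
            if jaccard(post_to_blogs[p], post_to_blogs[q]) >= threshold:
                parent[find(p)] = find(q)

    groups = defaultdict(list)
    for p in posts:
        groups[find(p)].append(p)
    return sorted(groups.values())

# Clustering with the noisy blog left in, then with it removed.
_, noisy_links = build_network(interactions)
noisy_clusters = cluster_posts(noisy_links, threshold=0.1)

clean_pairs = drop_noisy_blogs(interactions, max_degree=3)
_, clean_links = build_network(clean_pairs)
clean_clusters = cluster_posts(clean_links, threshold=0.1)

print(noisy_clusters)
print(clean_clusters)
```

In this toy data the blog that interacts with every post links otherwise unrelated posts, so all four posts fall into one group until that blog is removed. The change of granularity from blogs to folders would amount to replacing the blog identifier in each record with a folder identifier before building the network.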
Keywords
Data mining; Link-based clustering; Blogosphere
References
1 J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2006
2 G. Jeh and J. Widom, 'SimRank: A Measure of Structural-Context Similarity,' In Proc. Int'l. Conf. on Special Interest Group on Knowledge Discovery and Data, pp. 538-543, 2002
3 S. Herring et al., 'Conversations in the Blogosphere: An Analysis "From the Bottom Up",' In Proc. of the 38th Annual Hawaii Int'l. Conf. on System Sciences, p. 107b, 2005
4 Y. Lin, 'Blog Community Discovery and Evolution based on Mutual Awareness Expansion,' In Proc. Int'l. Conf. on Web Intelligence, pp. 48-56, 2007
5 K. Fujimura, T. Inoue, and M. Sugisaki, 'The Eigenrumor Algorithm for Ranking Blogs,' In Proc. Int'l. Conf. on World Wide Web, 2005
6 J. Wang and J. Han, 'CLOSET+: Searching for the Best Strategies for Mining Frequent Closed Itemsets,' In Proc. Int'l. Conf. on Special Interest Group on Knowledge Discovery and Data, pp. 236-245, 2003
7 A. Chin and M. Chignell, 'A Social Hypertext Model for Finding Community in Blogs,' In Proc. Int'l. Conf. on Hypertext and Hypermedia, pp. 11-22, 2006
8 R. Kumar et al., 'Trawling the Web for Emerging Cyber-Communities,' In Proc. Int'l. Conf. on World Wide Web, pp. 1481-1493, 1999
9 S. Gardner, Buzz Marketing With Blogs for Dummies, John Wiley & Sons Inc, 2005
10 N. Pasquier et al., 'Discovering Frequent Closed Itemsets for Association Rules,' In Proc. Int'l. Conf. on Database Theory, pp. 398-416, 1999
11 H. Small, 'Co-citation in the Scientific Literature: A New Measure of the Relationship between Two Documents,' Journal of the American Society for Information Science, Vol. 24, No. 4, pp. 265-269, 1973
12 D. Gruhl et al., 'Information Diffusion Through Blogspace,' In Proc. Int'l. Conf. on World Wide Web, pp. 491-501, 2004
13 M. Kessler, 'Bibliographic Coupling Between Scientific Papers,' Journal of the American Documentation, Vol. 14, No. 1, pp. 10-25, 1963
14 X. Yin, J. Han, and P. Yu, 'LinkClus: Efficient Clustering via Heterogeneous Semantic Links,' In Proc. Int'l. Conf. on Very Large Data Bases, pp. 427-438, 2006
15 NHN Corp., http://blog.naver.com, 2009
16 J. Wang et al., 'ReCoM: Reinforcement Clustering of Multi-type Interrelated Data Objects,' In Proc. Int'l. Conf. on Special Interest Group on Information Retrieval, pp. 274-281, 2003
17 Wikipedia, blog, http://en.wikipedia.org/wiki/Blog, 2009