http://dx.doi.org/10.6109/jkiice.2011.15.1.163

A Splog Detection System Using Support Vector Systems  

Lee, Song-Wook (Chungju National University)
Abstract
Blogs are an easy way to publish information, engage in discussions, and form communities on the Internet. Recently, several varieties of spam blogs have appeared whose purpose is to host ads or to raise the PageRank of target sites. Our purpose is to develop a system that automatically detects these spam blogs (splogs) among blogs on the Web. After the HTML markup is removed from each blog, the text is tagged by a part-of-speech (POS) tagger. Words and their POS tags are used as features. From these features, we select the useful ones with χ² (chi-square) statistics and train a support vector machine (SVM) on the selected features. Our system achieved an F1 measure of 90.5% on the SPLOG data set.
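The steps in the abstract (HTML removal, POS tagging, word/POS features, χ² feature selection, SVM training, F1 evaluation) can be illustrated with a minimal plain-Python sketch. It rests on several assumptions not stated in the abstract: the POS tagger itself is not implemented (tokens arrive already tagged), features are binary `word/TAG` indicators, χ² uses the standard 2×2 contingency-table form, and the SVM is a linear SVM trained by dual coordinate descent, a swapped-in solver, since the abstract does not give the kernel, toolkit, or parameters the author used. All function names and the toy data are hypothetical.

```python
import re
from collections import Counter


def strip_html(html):
    """Drop script/style blocks and all tags, then collapse whitespace (a crude stand-in for HTML removal)."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"<[^>]+>", " ", html)
    return " ".join(text.split())


def featurize(tagged_tokens):
    """Turn (word, POS) pairs from some POS tagger into a set of binary 'word/TAG' features."""
    return {f"{word.lower()}/{tag}" for word, tag in tagged_tokens}


def chi_square(a, b, c, d):
    """2x2 chi-square: a = splogs with t, b = normal blogs with t, c = splogs without t, d = normal without t."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Score every feature by chi-square against splog (+1) / normal (-1) labels and keep the top k."""
    n_pos = sum(1 for y in labels if y > 0)
    n_neg = len(labels) - n_pos
    df_pos, df_neg = Counter(), Counter()  # per-class document frequency
    for feats, y in zip(docs, labels):
        (df_pos if y > 0 else df_neg).update(feats)
    vocab = set(df_pos) | set(df_neg)
    score = {t: chi_square(df_pos[t], df_neg[t], n_pos - df_pos[t], n_neg - df_neg[t]) for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]  # ties broken alphabetically


def vectorize(feats, selected):
    return [1.0 if t in feats else 0.0 for t in selected]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(xs, ys, c=10.0, epochs=200):
    """Hinge-loss linear SVM via dual coordinate descent; the bias is learned through a constant feature."""
    xs = [x + [1.0] for x in xs]
    w = [0.0] * len(xs[0])
    alpha = [0.0] * len(xs)
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(xs, ys)):
            grad = y * dot(w, x) - 1.0  # gradient of the dual objective w.r.t. alpha[i]
            new = min(max(alpha[i] - grad / dot(x, x), 0.0), c)  # Newton step clipped to [0, C]
            delta = new - alpha[i]
            if delta:
                alpha[i] = new
                w = [wj + delta * y * xj for wj, xj in zip(w, x)]  # keep w = sum(alpha_i * y_i * x_i)
    return w


def predict(w, x):
    return 1 if dot(w, x + [1.0]) >= 0 else -1


def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R), written in the equivalent count form 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical, hand-tagged toy posts (+1 = splog, -1 = normal blog).
TOY_TAGGED = [
    ([("Buy", "VB"), ("cheap", "JJ"), ("casino", "NN")], 1),
    ([("buy", "VB"), ("cheap", "JJ"), ("pills", "NNS")], 1),
    ([("casino", "NN"), ("cheap", "JJ"), ("bonus", "NN")], 1),
    ([("I", "PRP"), ("went", "VBD"), ("park", "NN")], -1),
    ([("I", "PRP"), ("my", "PRP$"), ("friend", "NN")], -1),
    ([("my", "PRP$"), ("park", "NN"), ("today", "NN")], -1),
]

docs = [featurize(tokens) for tokens, _ in TOY_TAGGED]
labels = [y for _, y in TOY_TAGGED]
selected = select_features(docs, labels, k=5)
xs = [vectorize(d, selected) for d in docs]
w = train_linear_svm(xs, labels)
print(selected)
print([predict(w, x) for x in xs])
```

A feature that occurs in every splog and in no normal blog gets the maximum score χ² = N (here 6.0 for `cheap/JJ`), which is why it ranks first. Working on binary presence features keeps the χ² counts as simple document frequencies; a real system would also tune k and the SVM's C on held-out data rather than fixing them as done here.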
Keywords
spam blog detection; splog; support vector machines; chi-square statistics