A Splog Detection System Using Support Vector Machines and $\chi^2$ Statistics

Lee, Song-Wook

Proceedings of the Korean Institute of Information and Communication Sciences Conference (한국정보통신학회:학술대회논문집)

Issue 2010.05a, pp. 905-908, 2010

The Korea Institute of Information and Communication Engineering (한국정보통신학회)


Lee, Song-Wook (Chungju National University)


Published: 2010.05.27


Abstract

Our purpose is to develop a system that automatically detects splogs (spam blogs) among blogs on the Web. After the HTML markup is removed from each blog, the remaining text is tagged with a part-of-speech (POS) tagger. Word/POS-tag pairs are used as features. From these features we select the most useful ones with $\chi^2$ statistics, represent the weights of the selected features as vectors, and train a support vector machine (SVM) on them. On the SPLOG data set, our system achieved an F1 measure of 90.5%.
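The abstract names the pipeline stages without giving their details, so the following is only a minimal sketch under stated assumptions, not the author's implementation. It assumes documents are already HTML-stripped and POS-tagged into hypothetical "word/TAG" strings (the English-style tags below are illustrative, not the tagset used in the paper), and that the $\chi^2$ contingency table counts documents by presence or absence of a feature. Each feature $t$ is scored against the splog class $c$ with the standard 2×2 form $\chi^2(t,c) = \frac{N(AD - CB)^2}{(A+C)(B+D)(A+B)(C+D)}$, where $A$ and $B$ count splogs and normal blogs containing $t$, $C$ and $D$ count those without it, and $N = A+B+C+D$. The helper names `chi_square`, `select_features`, `to_vector`, and `f1` are hypothetical, and the binary vector weighting is an assumption because the abstract does not state the weighting scheme. Training the SVM itself would be handed to an SVM library and is not shown.

```python
from collections import Counter


# Hypothetical helper: standard 2x2 chi-square statistic for feature t vs. the splog class.
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """a: splogs containing t, b: normal blogs containing t,
    c: splogs without t, d: normal blogs without t."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. feature present in every document): no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


# Hypothetical helper: rank word/POS features by chi-square and keep the top k.
def select_features(docs: list[set[str]], labels: list[bool], k: int):
    n_spam = sum(labels)
    n_ham = len(labels) - n_spam
    df_spam = Counter()
    df_ham = Counter()
    for feats, is_spam in zip(docs, labels):
        # Document frequency: each feature is counted at most once per document.
        (df_spam if is_spam else df_ham).update(set(feats))
    vocab = set(df_spam) | set(df_ham)
    scores = {}
    for t in vocab:
        a, b = df_spam[t], df_ham[t]
        scores[t] = chi_square(a, b, n_spam - a, n_ham - b)
    # Descending score; ties broken alphabetically so the result is deterministic.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Hypothetical helper: binary presence vector over the selected features
# (an assumption; the abstract does not say which weighting was used).
def to_vector(feats: set[str], selected: list[str]) -> list[float]:
    return [1.0 if t in feats else 0.0 for t in selected]


# F1 = 2PR / (P + R), written equivalently as 2TP / (2TP + FP + FN).
def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


docs = [
    {"loan/NN", "click/VB"},
    {"loan/NN", "cheap/JJ"},
    {"trip/NN", "photo/NN"},
    {"trip/NN", "click/VB"},
]
labels = [True, True, False, False]  # True = splog (toy data, not from the paper)
selected, scores = select_features(docs, labels, k=2)
print(selected)  # ['loan/NN', 'trip/NN']
print(to_vector(docs[0], selected))  # [1.0, 0.0]
```

Because this $\chi^2$ score is two-sided, a feature strongly tied to normal blogs (here `trip/NN`) ranks as high as a splog indicator (`loan/NN`), while `click/VB`, which appears equally in both classes, scores 0. Both kinds of high-scoring features carry information the SVM can use.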

