http://dx.doi.org/10.6109/jkiice.2011.15.7.1531

A Swearword Filter System for Online Game Chatting  

Lee, Song-Wook (Chungju National University)
Abstract
We propose an automatic swearword filter system for online game chat based on Support Vector Machines (SVM). We collected chat sentences from online games and labeled each one as either a normal sentence or a sentence containing a swearword. As features, we use syllable n-grams and the lexical/part-of-speech (POS) tags of each word, and we select the useful features with the chi-square statistic. Each selected feature is given a binary weight and used to train the SVM, which then classifies each chat sentence as containing a swearword or not. In our experiments, the system achieved an overall F1 score of 90.4%.
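The feature pipeline described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes that a syllable corresponds to one character (which holds for precomposed Hangul text), it uses the standard 2x2 chi-square statistic for text categorization, and it omits the lexical/POS-tag features because they require a morphological analyzer. The function names, the `n_max` and `k` parameters, and the English toy sentences are all hypothetical.

```python
from collections import Counter


def syllable_ngrams(sentence, n_max=2):
    """Return the set of character n-grams (n = 1..n_max) found inside each word.

    Assumption: one character stands for one syllable, as in precomposed Hangul.
    """
    feats = set()
    for word in sentence.split():
        for n in range(1, n_max + 1):
            for i in range(len(word) - n + 1):
                feats.add(word[i:i + n])
    return feats


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 feature/class contingency table.

    a: swearword sentences containing the feature
    b: normal sentences containing the feature
    c: swearword sentences without the feature
    d: normal sentences without the feature
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(sentences, labels, k):
    """Rank all n-gram features by chi-square and keep the top k."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_counts, neg_counts = Counter(), Counter()
    for sentence, label in zip(sentences, labels):
        target = pos_counts if label else neg_counts
        target.update(syllable_ngrams(sentence))
    scores = {}
    for feat in set(pos_counts) | set(neg_counts):
        a, b = pos_counts[feat], neg_counts[feat]
        scores[feat] = chi_square(a, b, n_pos - a, n_neg - b)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


def binary_vector(sentence, vocabulary):
    """Binary weighting: 1 if the selected feature occurs in the sentence, else 0."""
    feats = syllable_ngrams(sentence)
    return [1 if f in feats else 0 for f in vocabulary]


# Hypothetical toy data: label 1 = contains a swearword, 0 = normal.
sentences = ["gg well played", "you damn noob", "nice game all", "damn this lag"]
labels = [0, 1, 0, 1]

vocabulary = select_features(sentences, labels, k=4)
vectors = [binary_vector(s, vocabulary) for s in sentences]
```

The resulting binary vectors, paired with their labels, are the kind of input an SVM trainer would consume. The SVM step is left out here so that the sketch depends only on the standard library.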
Keywords
online game; swearword filter; support vector machine; chi-square statistics