
Comments Classification System using Topic Signature  

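Bae, Min-Young (Department of Computer Engineering, Changwon National University)
Cha, Jeong-Won (Department of Computer Engineering, Changwon National University)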
Abstract
In this work, we describe a comment classification system based on topic signatures. Topic signatures are widely used for feature selection in document classification and summarization. Comments are short and contain many word-spacing errors and special characters. We therefore first convert each comment into 7-grams and treat each 7-gram as a sentence. We then convert each 7-gram into 3-grams and treat each 3-gram as a word. We select key features using topic signatures and classify new inputs with the Naive Bayes method. Experimental results show that the proposed method outperforms previous methods.
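The pipeline summarized in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the n-grams are assumed to be character-level over the raw comment text, the topic-signature score is assumed to be the log-likelihood ratio used in Lin and Hovy's topic signature work, and the classifier is assumed to be a multinomial Naive Bayes with add-one smoothing. All function names and the `top_k` parameter are hypothetical.

```python
import math
from collections import Counter

def ngrams(seq, n):
    """Contiguous n-grams of a string (the whole string if it is shorter than n)."""
    if len(seq) <= n:
        return [seq] if seq else []
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def comment_features(comment, sent_n=7, word_n=3):
    """Assumption: 7-grams play the role of sentences, their 3-grams the role of words."""
    feats = []
    for sent in ngrams(comment, sent_n):
        feats.extend(ngrams(sent, word_n))
    return feats

def _log_l(p, k, n):
    """Binomial log-likelihood, treating 0 * log(0) as 0."""
    out = 0.0
    if k > 0:
        out += k * math.log(p)
    if n - k > 0:
        out += (n - k) * math.log(1 - p)
    return out

def llr(k1, n1, k2, n2):
    """-2 log(lambda) for a term seen k1 times in n1 tokens vs. k2 times in n2 tokens."""
    p1, p2, p = k1 / n1, k2 / n2, (k1 + k2) / (n1 + n2)
    return 2 * (_log_l(p1, k1, n1) + _log_l(p2, k2, n2)
                - _log_l(p, k1, n1) - _log_l(p, k2, n2))

def train(labeled_comments, top_k=50):
    """Count features per class and keep the top_k highest-scoring terms per class.

    Assumes at least two classes, each with at least one feature.
    """
    counts, docs = {}, Counter()
    for text, label in labeled_comments:
        counts.setdefault(label, Counter()).update(comment_features(text))
        docs[label] += 1
    total = sum(counts.values(), Counter())
    n_all = sum(total.values())
    vocab = set()
    for label, c in counts.items():
        n1 = sum(c.values())
        n2 = n_all - n1
        ranked = sorted(total, key=lambda t: llr(c[t], n1, total[t] - c[t], n2),
                        reverse=True)
        vocab.update(ranked[:top_k])
    return counts, docs, vocab

def classify(text, model):
    """Multinomial Naive Bayes over the selected features, with add-one smoothing."""
    counts, docs, vocab = model
    feats = [f for f in comment_features(text) if f in vocab]
    n_docs = sum(docs.values())
    best, best_score = None, -math.inf
    for label, c in counts.items():
        denom = sum(c[t] for t in vocab) + len(vocab)
        score = math.log(docs[label] / n_docs)
        for f in feats:
            score += math.log((c[f] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best
```

A model would be built with `train(pairs)`, where `pairs` is a list of `(comment, label)` tuples, and applied with `classify(new_comment, model)`. Working on overlapping character n-grams avoids any dependence on word boundaries, which is one plausible reading of why the abstract stresses spacing errors and special characters.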
Keywords
comment classification; machine learning; topic signature; n-gram