Analyzing the correlation of Spam Recall and Thesaurus

  • Kang, Sin-Jae (School of Computer and Information Technology, Daegu University)
  • Kim, Jong-Wan (School of Computer and Information Technology, Daegu University)
  • Published: 2005.11.25

Abstract

In this paper, we constructed a two-phase spam-mail filtering system based on lexical and conceptual information. Two kinds of information can distinguish spam from legitimate mail. The definite information consists of the mail sender's information, URLs, and a spam list; the less definite information consists of the word list and concept codes extracted from the mail body. We first classified spam using the definite information, and then applied the less definite information. In the second phase, the lexical information and concept codes contained in the mail body were used as features for SVM learning. According to our results, spam precision increased as more lexical information was used as features, and spam recall increased when concept codes were also included as features.
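The two-phase flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the blocked sender, the blocked URL host, the thesaurus entries, and the concept-code names (`C_PRICE`, `C_DRUG`) are all hypothetical, and the second-phase SVM is represented only by a pluggable callable rather than a trained model. The sketch shows how definite information is checked first and how concept codes from a thesaurus might be added to the lexical features.

```python
import re

# Hypothetical phase-1 resources; the paper does not list its actual entries.
BLOCKED_SENDERS = {"promo@spam.example"}
BLOCKED_URL_HOSTS = {"cheap-pills.example"}

# Hypothetical thesaurus mapping words to concept codes.
THESAURUS = {
    "cheap": "C_PRICE",
    "discount": "C_PRICE",
    "bargain": "C_PRICE",
    "pills": "C_DRUG",
    "medication": "C_DRUG",
}

URL_HOST = re.compile(r"https?://([^/\s]+)")
WORD = re.compile(r"[a-z]+")


def phase1_is_spam(sender, body):
    """Phase 1: decide from definite information (sender, URLs) only."""
    if sender.lower() in BLOCKED_SENDERS:
        return True
    hosts = {h.lower() for h in URL_HOST.findall(body)}
    return bool(hosts & BLOCKED_URL_HOSTS)


def phase2_features(body, use_concepts=True):
    """Phase 2: build lexical features, optionally adding concept codes."""
    words = WORD.findall(body.lower())
    features = {"w:" + w for w in words}
    if use_concepts:
        features |= {"c:" + THESAURUS[w] for w in words if w in THESAURUS}
    return features


def classify(sender, body, phase2_classifier, use_concepts=True):
    """Run phase 1; fall through to the phase-2 classifier if it does not fire.

    phase2_classifier is any callable that takes a feature set and returns
    True for spam. In the paper this role is played by a trained SVM.
    """
    if phase1_is_spam(sender, body):
        return "spam"
    features = phase2_features(body, use_concepts)
    return "spam" if phase2_classifier(features) else "legitimate"
```

In this sketch, a message containing "bargain medication" shares no word features with one containing "cheap pills", but both produce the features `c:C_PRICE` and `c:C_DRUG`. This is one plausible reading of why adding concept codes could raise spam recall: a classifier that has learned a concept code can match spam that uses unseen synonyms.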

Keywords