Personalized Anti-spam Filter Considering Users' Different Preferences

  • Kim, Jong-Wan (Division of Computer and Information Technology, Daegu University)
  • Received : 2010.01.15
  • Accepted : 2010.03.16
  • Published : 2010.06.30

Abstract

Conventional filters use email header and body information to judge, identically for every recipient, whether an incoming email is spam. However, this is unrealistic in everyday life because each person has different criteria for what counts as spam. To resolve this problem, we consider user preference information as well as email category information derived from the email content. In this paper, we have developed a personalized anti-spam system using ontologies constructed from rules derived in a data mining process. We explain why traditional content-based filters are not applicable to the proposed experimental setting. In addition, we perform several experiments that construct classifiers to decide the email category and that compare classification rule learners. In particular, the ID3 decision tree algorithm improved overall accuracy by about 17% over a conventional SVM text miner in deciding the email category. We also discuss the axioms generated from the experimental dataset.
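The two-stage idea in the abstract can be illustrated with a small sketch. First, a content classifier assigns each email a category. Then each user's own preferences decide whether that category counts as spam for that user. This is a minimal Python sketch under stated assumptions and is not the author's system. It uses a textbook ID3 learner (information-gain splits on categorical attributes) for the category step, and it replaces the paper's ontology with a plain dictionary. The attributes (`keyword`, `has_link`), the categories, the user names, and the `PREFERENCES` table are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on attribute `attr`."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Build an ID3 tree.

    A leaf is a category label (a string); an inner node is a tuple
    (attribute, {value: subtree}, majority_label_for_unseen_values).
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    # ID3: split on the attribute with the highest information gain.
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    branches = {}
    for value in {r[best] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = id3([rows[i] for i in idx],
                              [labels[i] for i in idx],
                              [a for a in attrs if a != best])
    return (best, branches, majority)

def classify(tree, row):
    """Walk the tree; fall back to the node's majority label for unseen values."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        if row.get(attr) not in branches:
            return default
        tree = branches[row[attr]]
    return tree

# Hypothetical training data: categorical attributes -> email category.
rows = [
    {"keyword": "sale", "has_link": "yes"},
    {"keyword": "sale", "has_link": "no"},
    {"keyword": "loan", "has_link": "yes"},
    {"keyword": "loan", "has_link": "no"},
]
labels = ["shopping", "shopping", "finance", "finance"]
tree = id3(rows, labels, ["keyword", "has_link"])

# Hypothetical per-user preferences standing in for the paper's ontology:
# category -> True if this user regards mail of that category as spam.
PREFERENCES = {
    "alice": {"shopping": True, "finance": False},
    "bob": {"shopping": False, "finance": True},
}

def is_spam(user, email):
    """Classify the email's category, then apply the user's own preference."""
    category = classify(tree, email)
    # Assumption: unknown users or categories default to "not spam".
    return PREFERENCES.get(user, {}).get(category, False)

email = {"keyword": "loan", "has_link": "yes"}
print(classify(tree, email))      # finance
print(is_spam("alice", email))    # False
print(is_spam("bob", email))      # True
```

The verdict is looked up per user, so the same message (classified as `finance`) is spam for `bob` but not for `alice`. A shared, non-personalized filter would have to give both users the same answer.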
