http://dx.doi.org/10.7232/JKIIE.2017.43.3.192

Facebook Spam Post Filtering based on Instagram-based Transfer Learning and Meta Information of Posts  

Kim, Junhong (School of Industrial Management Engineering, Korea University)
Seo, Deokseong (School of Industrial Management Engineering, Korea University)
Kim, Haedong (School of Industrial Management Engineering, Korea University)
Kang, Pilsung (School of Industrial Management Engineering, Korea University)
Publication Information
Journal of Korean Institute of Industrial Engineers / v.43, no.3, 2017, pp. 192-202
Abstract
This study develops a text spam filtering system for Facebook based on two categories of variables: keywords learned from Instagram and meta-information of Facebook posts. Since there are no explicit labels for spam/ham posts, we use Instagram hashtags to train classification models. In addition, the filtering accuracy is enhanced by considering the meta-information of Facebook posts. To verify the proposed filtering system, we conduct an empirical experiment on a total of 1,795,067 Facebook documents and 761,861 Instagram documents. Employing random forest as the base classification algorithm, the experimental results show that the proposed filtering system achieves a filtering accuracy of 99% and an F1-measure of 98%. We expect that the proposed filtering scheme can be applied to other web services that suffer from massive numbers of spam posts but have no explicit spam labels.
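The core idea, using Instagram hashtags as surrogate labels and then applying the learned keywords to unlabeled Facebook posts, can be sketched as below. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the spam hashtag list (`SPAM_TAGS`) and all helper names are hypothetical, a multinomial naive Bayes classifier stands in for the random forest used in the study, and the Facebook meta-information features are omitted. Accuracy and the F1-measure (F1 = 2PR / (P + R), with P = precision and R = recall on the spam class) are computed the standard way.

```python
import math
import re
from collections import Counter

# Hypothetical spam hashtags: the study's actual hashtag list is not reproduced here.
SPAM_TAGS = {"#followforfollow", "#f4f", "#like4like", "#sale"}
HASHTAG = re.compile(r"#\w+")


def weak_label(text):
    """Return 1 (spam) if an Instagram post carries any listed spam hashtag, else 0 (ham)."""
    return int(bool(set(HASHTAG.findall(text.lower())) & SPAM_TAGS))


def tokenize(text):
    """Lower-cased word tokens with hashtags stripped, so the model learns keywords, not the labeling tags."""
    return re.findall(r"\w+", HASHTAG.sub(" ", text.lower()))


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model (a stand-in for the study's random forest)."""
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(tokenize(doc))
    vocab = set(counts[0]) | set(counts[1])
    priors = Counter(labels)
    return counts, priors, vocab, alpha


def predict_nb(model, doc):
    """Return the class (0 = ham, 1 = spam) with the highest log posterior; unseen words are ignored."""
    counts, priors, vocab, alpha = model
    n = sum(priors.values())
    scores = {}
    for y in (0, 1):
        total = sum(counts[y].values())
        # Laplace-smoothed prior keeps log() defined even if a class is absent.
        score = math.log((priors[y] + alpha) / (n + 2 * alpha))
        for tok in tokenize(doc):
            if tok in vocab:
                score += math.log((counts[y][tok] + alpha) / (total + alpha * len(vocab)))
        scores[y] = score
    return max(scores, key=scores.get)


def accuracy_f1(y_true, y_pred):
    """Accuracy over all posts and F1-measure for the spam class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1


# Toy data: Instagram posts are labeled only through their hashtags.
instagram = [
    "free coupons click the link in bio #f4f #sale",
    "cheap followers dm now #followforfollow",
    "lovely sunset at the beach with friends",
    "homemade pasta for dinner tonight",
]
model = train_nb(instagram, [weak_label(p) for p in instagram])

# Unlabeled Facebook posts; the hand labels below are used only for evaluation.
facebook = ["click the link for free coupons", "dinner with friends at the beach"]
facebook_truth = [1, 0]
preds = [predict_nb(model, p) for p in facebook]
print(preds, accuracy_f1(facebook_truth, preds))  # prints [1, 0] (1.0, 1.0)
```

Stripping the hashtags before tokenizing matters in this sketch: otherwise the classifier could simply memorize the labeling tags, which rarely appear in Facebook posts, instead of transferable keywords.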
Keywords
Spam Filtering; Facebook; Instagram; Hash Tag; Random Forest; Transfer Learning
Citations & Related Records
Times Cited By KSCI: 2