Facebook Spam Post Filtering based on Instagram-based Transfer Learning and Meta Information of Posts

Kim, Junhong; Seo, Deokseong; Kim, Haedong; Kang, Pilsung

doi:10.7232/JKIIE.2017.43.3.192

Journal of Korean Institute of Industrial Engineers (대한산업공학회지)

Volume 43 Issue 3 / Pages 192-202 / 2017 / 1225-0988 (pISSN) / 2234-6457 (eISSN)

Korean Institute of Industrial Engineers (대한산업공학회)

인스타그램 기반의 전이학습과 게시글 메타 정보를 활용한 페이스북 스팸 게시글 판별

Kim, Junhong (School of Industrial Management Engineering, Korea University) ;
Seo, Deokseong (School of Industrial Management Engineering, Korea University) ;
Kim, Haedong (School of Industrial Management Engineering, Korea University) ;
Kang, Pilsung (School of Industrial Management Engineering, Korea University)

김준홍 (고려대학교 산업경영공학부) ;
서덕성 (고려대학교 산업경영공학부) ;
김해동 (고려대학교 산업경영공학부) ;
강필성 (고려대학교 산업경영공학부)

Received : 2016.08.13
Accepted : 2017.02.18
Published : 2017.06.15

https://doi.org/10.7232/JKIIE.2017.43.3.192

Abstract

This study develops a text spam filtering system for Facebook based on two categories of variables: keywords learned from Instagram and meta-information of Facebook posts. Since there are no explicit labels for spam/ham posts, we use Instagram hashtags to train the classification models. In addition, filtering accuracy is enhanced by considering meta-information of Facebook posts. To verify the proposed filtering system, we conduct an empirical experiment on 1,795,067 Facebook documents and 761,861 Instagram documents. With random forest as the base classification algorithm, the experimental results show that the proposed filtering system yields 99% filtering accuracy and a 98% F1-measure. We expect that the proposed filtering scheme can be applied to other web services that suffer from massive spam posts but have no explicit spam labels available.
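
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The hashtag set `SPAM_TAGS`, the thresholds `min_ratio` and `min_count`, the URL-count meta feature, and all function names are assumptions introduced for illustration. The sketch weakly labels Instagram posts by hashtag, keeps words that occur mostly in the weakly labeled spam posts, builds a small feature vector for a Facebook post, and computes accuracy and F1 from raw counts. The random forest classifier itself is omitted; the feature vectors would be passed to whichever classifier is used.

```python
from collections import Counter

# Hypothetical spam-indicating hashtags; the source does not list the ones it used.
SPAM_TAGS = {"#ad", "#promo"}


def weak_label(post):
    """Label an Instagram post 1 (spam) if it carries an assumed spam hashtag, else 0."""
    return int(any(tok in SPAM_TAGS for tok in post.lower().split()))


def learn_spam_keywords(instagram_posts, min_ratio=0.8, min_count=2):
    """Collect non-hashtag words that occur mostly in weakly labeled spam posts."""
    spam_counts, total_counts = Counter(), Counter()
    for post in instagram_posts:
        label = weak_label(post)
        # Count each word once per post and skip the hashtags themselves.
        words = {w for w in post.lower().split() if not w.startswith("#")}
        for w in words:
            total_counts[w] += 1
            spam_counts[w] += label
    return {
        w for w, n in total_counts.items()
        if n >= min_count and spam_counts[w] / n >= min_ratio
    }


def facebook_features(post, keywords):
    """Build a tiny feature vector: keyword hits plus one illustrative meta feature."""
    words = post.lower().split()
    keyword_hits = sum(w in keywords for w in words)
    url_count = sum(w.startswith("http") for w in words)  # assumed meta feature
    return [keyword_hits, url_count]


def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and F1 for the positive (spam) class from raw counts."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1


# Toy Instagram posts, invented for this example only.
instagram = [
    "cheap loans call now #ad",
    "cheap loans today #promo",
    "sunset at the beach #travel",
    "coffee with friends today #cafe",
]
keywords = learn_spam_keywords(instagram)
features = facebook_features("cheap loans here http://x.example", keywords)
accuracy, f1 = accuracy_and_f1([1, 0, 1, 0], [1, 0, 0, 0])
```

Learning the keywords only from the hashtag-labeled Instagram posts is what makes this a transfer setting: no Facebook label is needed to build the keyword features, and the meta feature is computed from the Facebook post alone.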
