http://dx.doi.org/10.3837/tiis.2019.11.018

Detecting Malicious Social Robots with Generative Adversarial Networks

Wu, Bin (School of Cyberspace Security, Beijing University of Posts and Telecommunications)
Liu, Le (School of Cyberspace Security, Beijing University of Posts and Telecommunications)
Dai, Zhengge (Telecommunication Engineering with Management, Beijing University of Posts and Telecommunications)
Wang, Xiujuan (Faculty of Information Technology, Beijing University of Technology)
Zheng, Kangfeng (School of Cyberspace Security, Beijing University of Posts and Telecommunications)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS) / v.13, no.11, 2019, pp. 5594-5615

Abstract

Malicious social robots disseminate malicious information on social networks and seriously affect information security and the network environment. Detecting them is an active research topic and a significant concern for researchers. Classification-based methods are widely used for social robot detection, but they are limited by unbalanced data sets in which legitimate (negative) samples outnumber malicious-robot (positive) samples, which leads to unsatisfactory detection results. This paper proposes using generative adversarial networks (GANs) to extend the unbalanced data sets before training classifiers, in order to improve the detection of social robots. Five popular oversampling algorithms were compared in the experiments, and the effects of the imbalance degree and of the expansion ratio of the original data on oversampling were studied. The experimental results showed that the proposed method achieved better detection performance than the other algorithms in terms of the F1 measure. The GAN method also performed well when the imbalance degree was smaller than 15%.
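As a rough illustration of the workflow the abstract describes (enlarge the minority class, then judge detection quality by the F1 measure), the following is a minimal sketch and not the authors' implementation. It assumes a simple random-pair interpolation as a stand-in for the generator (a simplification in the spirit of SMOTE, not a GAN), and it assumes that "imbalance degree" can be read as the share of positive samples in the data set; the abstract does not define the term. The names `minority_share`, `interpolate_oversample`, and `f1_measure` are hypothetical.

```python
import random


def minority_share(labels, positive=1):
    """Share of positive (malicious) samples; an ASSUMED reading of 'imbalance degree'."""
    return sum(1 for y in labels if y == positive) / len(labels)


def interpolate_oversample(minority, n_new, rng):
    """Create n_new synthetic minority samples by interpolating random pairs.

    This is a stand-in for the GAN generator, NOT the method from the paper:
    each synthetic point lies on the segment between two real minority points.
    """
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


def f1_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 85 legitimate samples (label 0) and 15 malicious ones (label 1).
    labels = [0] * 85 + [1] * 15
    minority = [[rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)] for _ in range(15)]
    print("minority share before:", minority_share(labels))

    # Expand the minority class until both classes have 85 samples.
    extra = interpolate_oversample(minority, 85 - 15, rng)
    labels_balanced = labels + [1] * len(extra)
    print("minority share after:", minority_share(labels_balanced))
```

In a pipeline of this kind, the synthetic samples would be appended to the training set only, and the F1 measure would be computed on untouched test data so that the score reflects real accounts rather than generated ones.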

Keywords

malicious robots; social robots detection; generative adversarial networks; supervised classification; unbalanced data
