Comparison of Machine Learning Techniques for Cyberbullying Detection on YouTube Arabic Comments

  • Alsubait, Tahani (College of Computer and Information Systems, Umm Al-Qura University)
  • Alfageh, Danyah (College of Computer and Information Systems, Umm Al-Qura University)
  • Received : 2021.01.05
  • Published : 2021.01.30

Abstract

Cyberbullying is a problem faced in many cultures. Because of their popularity and interactive nature, social media platforms are a common venue for it, and social media users from Arab countries have also reported being targets of cyberbullying. Machine learning has been a prominent approach among researchers for detecting and combating this phenomenon. In this paper, we compare the performance of different machine learning algorithms for cyberbullying detection on a labeled dataset of Arabic YouTube comments. Three machine learning models are considered: Multinomial Naïve Bayes (MNB), Complement Naïve Bayes (CNB), and Logistic Regression (LR). In addition, we experiment with two feature extraction methods: Count Vectorizer and Tfidf Vectorizer. Our results show that, with Count Vectorizer feature extraction, the Logistic Regression model outperforms both the Multinomial and Complement Naïve Bayes models. With Tfidf Vectorizer feature extraction, however, the Complement Naïve Bayes model outperforms the other two models.
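The comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn (the abstract does not name a library, although `CountVectorizer`, `TfidfVectorizer`, `MultinomialNB`, `ComplementNB`, and `LogisticRegression` are scikit-learn classes), it uses a tiny hypothetical English toy dataset in place of the labeled Arabic YouTube comments, and it scores with cross-validated accuracy, which is an assumption since the abstract does not state the evaluation metric.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import ComplementNB, MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data (NOT the paper's dataset): 1 = bullying, 0 = not bullying.
comments = [
    "you are stupid and ugly", "nobody likes you loser", "shut up idiot",
    "you are a worthless loser", "ugly stupid idiot", "go away nobody likes you",
    "great video thank you", "i love this song", "very helpful explanation",
    "thank you for sharing", "nice work keep going", "this song is great",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# The two feature extraction methods and three models named in the abstract.
vectorizers = {"count": CountVectorizer, "tfidf": TfidfVectorizer}
models = {
    "MNB": MultinomialNB,
    "CNB": ComplementNB,
    "LR": lambda: LogisticRegression(max_iter=1000),
}


def compare(texts, y, folds=3):
    """Return mean cross-validated accuracy for every (vectorizer, model) pair."""
    results = {}
    for vec_name, make_vec in vectorizers.items():
        for model_name, make_model in models.items():
            # Fitting the vectorizer inside the pipeline keeps each
            # held-out fold unseen during feature extraction.
            pipeline = make_pipeline(make_vec(), make_model())
            scores = cross_val_score(pipeline, texts, y, cv=folds, scoring="accuracy")
            results[(vec_name, model_name)] = scores.mean()
    return results


results = compare(comments, labels)
for (vec_name, model_name), acc in sorted(results.items()):
    print(f"{vec_name:5s} {model_name:3s} mean accuracy = {acc:.2f}")
```

Scores on a toy set this small say nothing about the paper's findings; the sketch only shows how the six vectorizer and model combinations could be evaluated side by side.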
