http://dx.doi.org/10.6109/jkiice.2020.24.5.576

Ensemble Machine Learning Model Based YouTube Spam Comment Detection  

Jeong, Min Chul (Department of Digital Media, Ajou University)
Lee, Jihyeon (Department of English Language and Literature, Ajou University)
Oh, Hayoung (Global Convergence, Sungkyunkwan University)
Abstract
This paper proposes a technique for detecting spam comments on YouTube, a platform that has recently seen tremendous growth. Because YouTube content can be monetized through advertising, spammers have appeared who promote their own channels or videos in the comments of popular videos, or who leave comments unrelated to the video. YouTube operates its own spam-blocking system, but it still fails to block such comments properly and efficiently. We therefore reviewed related studies on YouTube spam comment screening and conducted classification experiments with six machine learning techniques (decision tree, logistic regression, Bernoulli naive Bayes, random forest, support vector machine with a linear kernel, and support vector machine with a Gaussian kernel) and with an ensemble model combining these techniques, using comment data from popular music videos by Psy, Katy Perry, LMFAO, Eminem, and Shakira.
Keywords
Data analysis; Classification; Spam comment; Ensemble machine learning; YouTube comment
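
The abstract does not say how the six classifiers are combined into the ensemble. One common choice is hard (majority) voting, and the following minimal sketch illustrates that idea under stated assumptions: labels are encoded as 1 = spam and 0 = ham, the per-model predictions are made-up values for four hypothetical comments, and a 3-3 tie is resolved as ham. The function name `hard_vote` and the tie policy are illustrative and are not taken from the paper.

```python
from collections import Counter

# Assumed label encoding: 1 = spam, 0 = ham.
SPAM, HAM = 1, 0


def hard_vote(per_model_predictions, tie_label=HAM):
    """Combine label predictions from several classifiers by majority vote.

    per_model_predictions: list of equal-length lists, one list per classifier.
    tie_label: label used when spam and ham votes are equal (an assumed policy;
               with six voters a 3-3 split is possible).
    """
    if not per_model_predictions:
        raise ValueError("need predictions from at least one model")
    n_comments = len(per_model_predictions[0])
    if any(len(p) != n_comments for p in per_model_predictions):
        raise ValueError("all models must predict the same number of comments")

    combined = []
    # zip(*...) regroups the predictions so each tuple holds all votes for one comment.
    for votes in zip(*per_model_predictions):
        counts = Counter(votes)
        if counts[SPAM] > counts[HAM]:
            combined.append(SPAM)
        elif counts[HAM] > counts[SPAM]:
            combined.append(HAM)
        else:
            combined.append(tie_label)
    return combined


# Hypothetical outputs of the six base classifiers on four comments.
predictions = {
    "decision_tree":       [1, 0, 1, 0],
    "logistic_regression": [1, 0, 0, 0],
    "bernoulli_nb":        [1, 0, 1, 1],
    "random_forest":       [1, 0, 1, 0],
    "svm_linear":          [1, 1, 0, 0],
    "svm_gaussian":        [1, 0, 0, 1],
}

ensemble_labels = hard_vote(list(predictions.values()))
print(ensemble_labels)
```

In this toy input the third comment receives three spam and three ham votes, so its final label depends entirely on the tie policy. Passing `tie_label=SPAM` would flag it instead, trading more false positives for fewer missed spam comments.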