[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.6109/jkiice.2020.24.12.1595

Bias & Hate Speech Detection Using Deep Learning: Multi-channel CNN Modeling with Attention

Lee, Wonseok (Department of MIS, Keimyung University)
Lee, Hyunsang (School of Business, Kyungpook National University)

Publication Information

Journal of the Korea Institute of Information and Communication Engineering / v.24, no.12, 2020, pp. 1595-1603

Abstract

Online defamation incidents arising from Internet news comments on portal sites, SNS, and community sites have been increasing in recent years. Bias and hate expressions threaten online service users in various forms, such as invasion of privacy, personal attacks, and defamation. In the past few years, academia and industry have taken various approaches to this problem. The purpose of this study is to build a dataset and experiment with deep learning classification models for detecting various bias expressions as well as hate expressions. The dataset was annotated with 7 labels and cross-checked by 10 people. Each of the 7 classes in the dataset of about 137,111 Korean Internet news comments is treated as a binary classification task and analyzed with deep learning techniques. The proposed technique is a multi-channel CNN model with attention. In the experiments, the model achieved a weighted average F1 score of 70.32%.
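The abstract reports a weighted average F1 score but does not say how the weighting is done or how the 7 binary tasks are aggregated. As a rough illustration only, the sketch below computes the common support-weighted F1 for a single binary task: per-class F1 is averaged with weights proportional to the number of true instances of each class. The function names and the toy labels are assumptions for illustration, not the authors' evaluation code or data.

```python
from collections import Counter


def f1_for_class(y_true, y_pred, cls):
    """F1 score of one class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0 when the class never occurs or is never predicted.
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighted by each class's share of the true labels."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(f1_for_class(y_true, y_pred, c) * n / total
               for c, n in support.items())


# Toy labels (hypothetical): 1 = comment contains the expression, 0 = it does not.
y_true = [1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0]
score = weighted_f1(y_true, y_pred)
print(f"weighted F1 = {score:.4f}")
```

Because the weights follow class frequency, a heavily imbalanced label (few positive comments) lets the majority class dominate the score, which is worth keeping in mind when reading a single weighted figure.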

Keywords

Bias and hate expressions; CNN; Multi-channel; Attention; Internet news comments

Citations & Related Records

Times Cited By KSCI: 4

References

1	N. D. Gitari, Z. Zuping, H. Damien, and J. Long, "A Lexicon-Based Approach for Hate Speech Detection," International Journal of Multimedia and Ubiquitous Engineering, vol. 10, no. 4, pp. 215-230, 2015.
2	W. Warner and J. Hirschberg, "Detecting Hate Speech on the World Wide Web," Paper presented at the Proceedings of the second workshop on language in social media, 2012.
3	R. Kshirsagar, T. Cukuvac, K. McKeown, and S. McGregor, "Predictive Embeddings for Hate Speech Detection on Twitter," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, pp. 1532-1543, 2018.
4	Z. Zhang, D. Robinson, and J. Tepper, "Detecting Hate Speech on Twitter Using a Convolution-GRU Based Deep Neural Network," Paper presented at the European semantic web conference, 2018.
5	P. Kapil, A. Ekbal, and D. Das, "Investigating Deep Learning Approaches for Hate Speech Detection in Social Media," in Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing, La Rochelle: LR, 2020.
6	S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
7	Z. Liu, H. Huang, C. Lu, and S. Lyu, "Multichannel CNN with Attention for Text Classification," arXiv preprint arXiv:2006.16174, 2020.
8	A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention Is All You Need," Paper presented at the Advances in Neural Information Processing Systems, 2017.
9	Z. Lin, M. Feng, C. N. D. Santos, M. Yu, B. Xiang, B. Zhou, and Y. Bengio, "A Structured Self-Attentive Sentence Embedding," in Proceedings of the 5th International Conference on Learning Representations, Toulon, 2017.
10	Y. Kim, "Convolutional Neural Networks for Sentence Classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, pp. 1746-1751, 2014.
11	R. K. Srivastava, K. Greff, and J. Schmidhuber, "Highway Networks," in Proceedings of the 32nd International Conference on Machine Learning, Lille, 2015.
12	R. J. Boeckmann and C. T. Petrosino, "Understanding the Harm of Hate Crime," Journal of Social Issues, vol. 58, no. 2, pp. 207-225, 2002.
13	SKTBrain, "Korean BERT pre-trained cased (KoBERT)," [Internet]. Available: https://github.com/SKTBrain/KoBERT.
14	H. J. Kim, Y. M. Yoon, and B. M. Lee, "Prediction System for Abusive Postings using Enhanced FFP," Journal of Advanced Information Technology and Convergence, vol. 9, no. 1, pp. 207-216, 2011.
15	Korean Ministry of Science and ICT, "2019 the Survey on Internet Use," 2020.
16	Korean National Police Agency. (2020) Total Cyber Crime Occurrence and Arrest Status [Internet]. Available: https://www.police.go.kr/www/open/publice/publice0204.jsp.
17	Reuters Institute, "Digital News Report 2020," 2020.
18	H. G. Kim, "The History of the Internet Real Name System in Korea," The Journal of constitutional precedents, vol. 14, pp. 157-192, 2013.
19	Hankook Research, "Toxic Comments, is it okay?," [Internet]. Available: https://hrcopinion.co.kr/archives/14589, 2020.
20	J. J. Hong, S. H. Kim, J. W. Park, and J. H. Choi, "A Malicious Comments Detection Technique on the Internet using Sentiment Analysis and SVM," Korea Institute of Information and Communication Engineering, vol. 20, no. 2, pp. 260-267, 2016.
21	J. H. Moon, W. I. Cho, and J. B. Lee, "BEEP! Korean Corpus of Online News Comments for Toxic Speech Detection," in Proceedings of the 8th International Workshop on Natural Language Processing for Social Media, Taipei, 2020.
22	P. Badjatiya, S. Gupta, M. Gupta, and V. Varma, "Deep Learning for Hate Speech Detection in Tweets," Paper presented at the Proceedings of the 26th International Conference on World Wide Web Companion, 2017.
23	T. Davidson, D. Warmsley, M. Macy, and I. Weber, "Automated Hate Speech Detection and the Problem of Offensive Language," in Proceedings of the 11th International AAAI Conference on Web and Social Media, Montreal, pp. 512-515, 2017.
24	D. S. Park and J. W. Cha, "Semi-Supervised Learning for Detecting of Abusive Sentence on Twitter using Deep Neural Network with Fuzzy Category Representation," The Korean Institute of Information Scientists and Engineers, vol. 45, no. 11, pp. 1185-1192, 2018.
