Detection of Incivility Based on Attention Embedding and Multi-channel CNN

  • Park, Youn-Jung (Global Convergence Contents Research Center, Sungkyunkwan University)
  • Lee, Se-Young (Department of Media Communication, Sungkyunkwan University)
  • Keum, Hee-Jo (Department of Media Communication, Sungkyunkwan University)
  • Received : 2022.09.30
  • Accepted : 2022.10.13
  • Published : 2022.12.31

Abstract

Online portal platforms publish news articles together with user comments, but the anonymity of commenting encourages uncivil expression, which has come to be regarded as a social problem. Although incivility detection has been widely studied for other languages, in-depth research has been limited in Korea because no Korean-language dataset labeled with fine-grained incivility criteria has been available. In this study, we labeled comments for incivility across 13 categories and summarized the uncivil expressions in each comment. We then applied an attention algorithm twice, once to each comment and once to its summary, to extract embedding vectors, and classified the incivility categories with a 2-D CNN. The results show that the proposed algorithm is useful for detecting types of incivility such as name-calling and offensive tone. By detecting uncivil comments that hinder democratic discourse, this study is expected to contribute to a healthier online comment culture.
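The pipeline described in the abstract (attention over a comment and over its summary, the two results stacked as channels, then a 2-D CNN with one output per incivility category) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scaled dot-product self-attention without learned projections as a stand-in for the attention step, assumes the comment and summary are padded to the same length, and uses random weights. All names (`self_attention`, `conv2d_multichannel`, `predict`, `SEQ_LEN`, `EMB_DIM`, `N_FILTERS`) are hypothetical; only the 13 output labels come from the abstract.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    tokens: [seq_len][emb_dim] -> attended embeddings of the same shape.
    (Assumption: a stand-in for the attention step named in the abstract.)
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def conv2d_multichannel(channels, kernel):
    """Valid 2-D cross-correlation summed over input channels, then ReLU.

    channels: [C][H][W], kernel: [C][kh][kw] -> [H-kh+1][W-kw+1]
    """
    n_ch, height, width = len(channels), len(channels[0]), len(channels[0][0])
    kh, kw = len(kernel[0]), len(kernel[0][0])
    out = []
    for i in range(height - kh + 1):
        row = []
        for j in range(width - kw + 1):
            s = 0.0
            for c in range(n_ch):
                for di in range(kh):
                    for dj in range(kw):
                        s += channels[c][i + di][j + dj] * kernel[c][di][dj]
            row.append(max(0.0, s))
        out.append(row)
    return out


def predict(comment_emb, summary_emb, kernels, dense_w, dense_b):
    """Return one sigmoid probability per incivility category."""
    # Channel 0: attended comment; channel 1: attended summary.
    channels = [self_attention(comment_emb), self_attention(summary_emb)]
    # Global max pooling over each filter's feature map.
    pooled = [max(max(r) for r in conv2d_multichannel(channels, k)) for k in kernels]
    logits = [sum(w * p for w, p in zip(ws, pooled)) + b
              for ws, b in zip(dense_w, dense_b)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
SEQ_LEN, EMB_DIM, N_FILTERS, N_LABELS = 6, 8, 4, 13  # only 13 is from the abstract

comment = rand_matrix(SEQ_LEN, EMB_DIM)   # placeholder token embeddings
summary = rand_matrix(SEQ_LEN, EMB_DIM)   # padded to the same length (assumption)
kernels = [[rand_matrix(3, EMB_DIM) for _ in range(2)] for _ in range(N_FILTERS)]
dense_w = rand_matrix(N_LABELS, N_FILTERS)
dense_b = [0.0] * N_LABELS

probs = predict(comment, summary, kernels, dense_w, dense_b)
print(len(probs))
```

Sigmoid outputs are used here because a single comment can plausibly fall under several incivility categories at once. Whether the original model treats the task as multi-label or multi-class is not stated in the abstract.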

Acknowledgement

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021S1A5C2A02088387).
