Blurring of Swear Words in Negative Comments through Convolutional Neural Network

컨볼루션 신경망 모델에 의한 악성 댓글 모자이크처리 방안

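  • 김유민 (Department of Software Engineering, College of Engineering, Chonnam National University)
  • 강효빈 (Department of Software Engineering, College of Engineering, Chonnam National University)
  • 한수현 (Department of Software Engineering, College of Engineering, Chonnam National University)
  • 정희용 (Department of Software Engineering, College of Engineering, Chonnam National University)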
  • Received : 2022.01.12
  • Accepted : 2022.02.28
  • Published : 2022.04.30

Abstract

As online services have grown, negative comments spread further and the harm caused by cyber violence has become more severe. Countermeasures such as forbidden-word filtering and reporting systems are in use, but they have not eradicated negative comments. This study therefore aimed to classify negative comments more accurately with deep learning and to blur the parts that correspond to profanity. To improve accuracy, two models with different numbers of convolutional layers and filters were trained and compared. With 90% of the dataset used for training and 10% for testing, the final model reached an accuracy of 88%. In addition, Grad-CAM was used to show which parts of a comment the model relied on for its decision, which yields the location of the swear words to blur. Classification based on simple forbidden words reached an accuracy of only 56%, compared with 88% for the convolutional neural network, indicating that the deep learning model is the more effective way to handle profanity in negative comments.
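The Grad-CAM step described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the vocabulary, embeddings, filter weights, and dense weights below are hand-picked hypothetical values, the tokens are English placeholders, and a finite-difference gradient stands in for the automatic differentiation a deep learning framework would provide. The sketch only shows the idea of weighting each convolutional feature map by its mean gradient, summing the weighted maps, and masking the positions where the resulting map is high.

```python
from typing import Callable, List

Matrix = List[List[float]]  # feature maps indexed as [filter][token position]

# Hypothetical toy embeddings; "darn" and "heck" stand in for swear words.
EMBED = {"darn": [1.0, 0.2], "heck": [0.9, 0.1]}
DEFAULT = [0.0, 0.5]                   # embedding for every other token
FILTERS = [[1.0, -0.5], [0.2, 0.3]]    # two width-1 convolution filters (hand-set)
DENSE = [2.0, -1.0]                    # dense weights after global average pooling


def conv_relu(tokens: List[str]) -> Matrix:
    """Return ReLU feature maps A[k][t] for filter k and token position t."""
    embs = [EMBED.get(tok, DEFAULT) for tok in tokens]
    return [[max(0.0, sum(w * x for w, x in zip(f, e))) for e in embs]
            for f in FILTERS]


def head(maps: Matrix) -> float:
    """Score for the 'negative' class: global average pooling, then dense."""
    return sum(w * sum(row) / len(row) for w, row in zip(DENSE, maps))


def numeric_grad(f: Callable[[Matrix], float], maps: Matrix,
                 eps: float = 1e-4) -> Matrix:
    """Central finite-difference gradient of f with respect to each A[k][t]."""
    grads = []
    for k, row in enumerate(maps):
        g_row = []
        for t in range(len(row)):
            plus = [r[:] for r in maps]
            minus = [r[:] for r in maps]
            plus[k][t] += eps
            minus[k][t] -= eps
            g_row.append((f(plus) - f(minus)) / (2 * eps))
        grads.append(g_row)
    return grads


def grad_cam(maps: Matrix, grads: Matrix) -> List[float]:
    """Weight each map by its mean gradient, sum over filters, apply ReLU."""
    alphas = [sum(g) / len(g) for g in grads]
    return [max(0.0, sum(a * m[t] for a, m in zip(alphas, maps)))
            for t in range(len(maps[0]))]


def blur_comment(comment: str, threshold: float = 0.5) -> str:
    """Mask tokens whose peak-normalized Grad-CAM value reaches the threshold."""
    tokens = comment.split()
    if not tokens:
        return comment
    maps = conv_relu(tokens)
    cam = grad_cam(maps, numeric_grad(head, maps))
    peak = max(cam)
    if peak <= 0.0:  # no position contributed positively to the score
        return comment
    return " ".join("*" * len(tok) if c / peak >= threshold else tok
                    for tok, c in zip(tokens, cam))


print(blur_comment("you darn fool"))  # prints "you **** fool"
```

In this toy setup a comment with no flagged token, such as "have a nice day", is returned unchanged because every Grad-CAM value is zero. The 0.5 threshold is an arbitrary choice for the sketch; a real system would first classify the comment and would tune the threshold on held-out data.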

Acknowledgement

This paper was excerpted and condensed from the 2021 bachelor's thesis of 김유민, 강효빈, and 한수현.
