A Study on Automatic Classification of Profanity Sentences of Elementary School Students Using BERT

  • Jaekwoun Shim (Center for Gifted Education, Korea University)
  • Received : 2021.03.04
  • Accepted : 2021.05.31
  • Published : 2021.05.30

Abstract

As the time elementary school students spend online has increased due to COVID-19, so has the volume of posts, comments, and chat messages they write, and problems such as hurting others' feelings and using profanity are occurring. Netiquette is taught in elementary school, but instruction time is insufficient and changes in behavior are difficult to expect, so technical support through natural language processing is needed. In this study, an experiment was conducted to automatically filter profanity sentences by applying a pretrained language model to sentences written by elementary school students. Chat logs of 4th-6th graders were collected from an online learning platform, together with the sentences among them that had been reported and confirmed as profanity, and a pretrained language model was fine-tuned on the general and profanity sentences. In the experiment, the model classified profanity sentences with a precision of 75%, showing that if the training data is sufficiently supplemented, the approach can be applied to online platforms used by elementary school students.
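The abstract outlines a standard fine-tuning setup: a pretrained BERT-style encoder with a binary classification head, trained on labeled chat sentences. Below is a minimal sketch of that kind of pipeline using the HuggingFace transformers library, not the authors' actual code; the multilingual checkpoint name, hyperparameters, and toy data are illustrative assumptions.

```python
# Hedged sketch: fine-tuning a pretrained BERT for binary profanity
# classification. Checkpoint, hyperparameters, and data are assumptions,
# not the configuration reported in the paper.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class ChatDataset(Dataset):
    """Holds tokenized chat sentences; label 1 = profanity, 0 = general."""
    def __init__(self, sentences, labels, tokenizer, max_len=64):
        self.enc = tokenizer(sentences, truncation=True,
                             padding="max_length", max_length=max_len,
                             return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

# A multilingual checkpoint that covers Korean; a Korean-specific BERT
# could be substituted.
name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

# Toy examples standing in for the collected 4th-6th grade chat logs.
train = ChatDataset(["ordinary chat sentence", "reported profanity sentence"],
                    [0, 1], tokenizer)
loader = DataLoader(train, batch_size=16, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):                      # a few fine-tuning epochs
    for batch in loader:
        optimizer.zero_grad()
        loss = model(**batch).loss          # cross-entropy over 2 classes
        loss.backward()
        optimizer.step()
```

On a held-out set, the 75% precision reported in the abstract would correspond to the fraction of sentences flagged as profanity that were actually profanity (e.g., computed with sklearn.metrics.precision_score).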

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Education) in 2020 (2020R1I1A1A01058353).
