
http://dx.doi.org/10.3837/tiis.2022.01.017

Profane or Not: Improving Korean Profane Detection using Deep Learning

Woo, Jiyoung (Big Data Engineering Department, Soonchunhyang University)
Park, Sung Hee (Hanwha Investment & Securities Co)
Kim, Huy Kang (Graduate School of Information Security, Korea University)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS) / v.16, no.1, 2022, pp. 305-318

Abstract

Abusive behavior has become a common problem on many online social media platforms, and profanity is a common form of it. Social media platforms typically operate filtering systems based on lists of popular profane words, but this approach has two drawbacks: it can be bypassed with altered word forms, and it can flag normal sentences as profane. Both problems are pronounced in Korean, where a syllable is composed of graphemes and a word is composed of multiple syllables. A word can be decomposed into graphemes without impairing its meaning, and a string that has the form of a profane word can carry a different meaning in a sentence. This work focuses on the problem of filtering systems mis-detecting normal phrases as profane phrases. To address it, we propose a deep learning-based framework that combines word embedding based on grapheme and syllable separation with an appropriate CNN structure. The proposed model was evaluated on chat content from a well-known online game in South Korea and achieved 90.4% accuracy.
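The abstract does not spell out the preprocessing, so the following is only a minimal sketch of what syllable-level and grapheme-level separation of Korean text could look like. It relies on the standard arithmetic layout of precomposed Hangul syllables in Unicode (U+AC00 to U+D7A3). The function names `to_graphemes`, `syllable_tokens`, and `grapheme_tokens` are hypothetical, and leaving non-Hangul characters unchanged is an assumption, not the authors' stated method.

```python
# Sketch only: illustrates syllable vs. grapheme tokenization of Korean text.
# Precomposed Hangul syllables follow:
#   code = 0xAC00 + (lead * 21 + vowel) * 28 + tail
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 medial vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals

HANGUL_BASE = 0xAC00
HANGUL_COUNT = 19 * 21 * 28  # 11172 precomposed syllables


def to_graphemes(char):
    """Split one Hangul syllable into its graphemes (jamo).

    Characters outside the precomposed Hangul range are returned unchanged
    (an assumption made for this sketch).
    """
    offset = ord(char) - HANGUL_BASE
    if not 0 <= offset < HANGUL_COUNT:
        return [char]
    lead, rest = divmod(offset, 21 * 28)
    vowel, tail = divmod(rest, 28)
    parts = [LEADS[lead], VOWELS[vowel]]
    if tail:  # tail index 0 means the syllable has no final consonant
        parts.append(TAILS[tail])
    return parts


def syllable_tokens(text):
    """Syllable-level tokens: one token per character."""
    return list(text)


def grapheme_tokens(text):
    """Grapheme-level tokens: every syllable expanded into its jamo."""
    tokens = []
    for char in text:
        tokens.extend(to_graphemes(char))
    return tokens
```

Either token sequence could then be mapped to embedding indices and fed to a convolutional classifier. Working at the grapheme level means a word whose syllables have been split apart to evade a word list still yields a token stream similar to that of the original word.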

Keywords

Profanity; deep learning; convolutional neural network; text mining; natural language processing

Citations & Related Records

Times Cited By KSCI: 3
