Search | Korea Science

Token Classification for Detecting Modified Profanity (변형된 비속어 탐지를 위한 토큰 분류)

Sung-Min Ko;Youhyun Shin
- Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.498-499 / 2023
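The approaches most commonly used for profanity detection, matching against a profanity database or classifying a whole sentence as hateful or non-hateful, have difficulty detecting modified profanity. Drawing on named entity recognition, one of the tasks in natural language processing, this paper proposes a profanity detection method based on sequence labeling. We construct a dataset of malicious Korean comments in which the profane parts are labeled, run experiments on it, and obtain an F1-score of about 0.88.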
https://doi.org/10.3745/PKIPS.y2023m11a.498
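
To make the sequence-labeling idea concrete, the following is a minimal sketch of how BIO tags (the scheme commonly used in named entity recognition) can mark a profane span, how spans are recovered from tags, and how a span-level F1 can be computed. The tag names, the character-level granularity, the placeholder string, and all helper names are illustrative assumptions; the abstract does not specify the paper's tagging scheme, tokenizer, or exact F1 definition.

```python
# Illustrative sketch only: BIO tags over characters with a placeholder
# "profane" span. Tag set and helper names are assumptions, not the
# paper's actual scheme.

def bio_tags(text, spans):
    """Return one tag per character: B-PROF, I-PROF, or O.

    `spans` is a list of (start, end) character offsets, end exclusive.
    """
    tags = ["O"] * len(text)
    for start, end in spans:
        tags[start] = "B-PROF"
        for i in range(start + 1, end):
            tags[i] = "I-PROF"
    return tags


def decode_spans(tags):
    """Recover (start, end) spans from a BIO tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B-PROF":
            if start is not None:      # close a span that ran into a new B-
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
        # I-PROF simply continues an open span.
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def span_f1(gold, pred):
    """Exact-match span-level F1 between gold and predicted spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


text = "you are a b@dw0rd today"   # "b@dw0rd" stands in for modified profanity
gold = [(10, 17)]
tags = bio_tags(text, gold)
print(decode_spans(tags) == gold)
```

Because the label is attached to individual positions rather than to the whole sentence, a model trained on such tags can point at the offending span even when its spelling has been altered.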

Usage Analysis of Swearing Words on Web Board and Proposal of Problems Resolution Method (웹 게시판에서 비속어사용실태와 문제 해결 방안의 제시)

조동욱
- The Journal of the Korea Contents Association / v.3 no.4 / pp.1-10 / 2003
The use of swear words on web boards has recently become one of the most typical negative side effects of the Internet. To address it, this paper proposes a technical method for blocking swear words and sentences, based on an analysis of the types and patterns of swear word usage. The work consists of three steps. First, swear words on web boards are surveyed and analyzed, and an algorithm for blocking them is proposed. Second, a sufficiently broad and concrete opinion survey across all generations is carried out to measure how severe each expression is perceived to be. Finally, the method is implemented on a web board. This paper covers the first step: the usage analysis of swear words and the development of an algorithm for blocking them.

Design and Implementation of a Slang Remover Program on Web board (웹 게시판 비속어 처리 프로그램의 설계 및 구현)

Cho, Ah-Young;Ock, Cheol-Young
- Proceedings of the Korea Information Processing Society Conference / 2001.10b / pp.1075-1078 / 2001
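Until now, profanity handling programs for bulletin boards have been blocking programs that refuse the input once profanity is found. Such programs restrict freedom of expression online. In some cases a word that is not profanity is blocked as well, so the post is rejected outright. What is needed is a program that handles profanity without blocking it, makes newly coined profanity easy to handle, and offers flexibility in detection. In this paper, we design and implement a non-blocking, flexible profanity extraction program for a bulletin board implemented on a database.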

Swearword Detection Method Considering Meaning of Words and Sentences (단어와 문장의 의미를 고려한 비속어 판별 방법)

Yi, Moung Ho;Lim, Myung Jin;Shin, Ju Hyun
- Smart Media Journal / v.9 no.3 / pp.98-106 / 2020
As the number of Internet users grows, the use of swear words is increasing indiscriminately. As a result, cyber violence among teenagers is becoming a serious problem, and verbal cyber violence is its most serious form. Swear word detection has been studied as a way to eradicate verbal cyber violence, but methods that detect swear words by considering both the meaning of words and the flow of context remain insufficient. In this paper, we therefore propose a swear word detection method that uses a FastText model and an LSTM model, so that deliberately modified swear words and standard language can be accurately detected by looking at the flow of context.
https://doi.org/10.30693/SMJ.2020.9.3.98

Design and Implementation of a Swearing Remover Program on Web board (웹 게시판 비속어 처리 프로그램의 설계 및 구현)

조아영
- Journal of the Korea Computer Industry Society / v.2 no.10 / pp.1317-1328 / 2001
Existing swear word removal programs, being input-blocking by design, could not catch even slightly altered swear words. To overcome this defect, this paper implements a supervising program that analyzes and removes or replaces swear words on web boards. To this end, we first classified the patterns of swear words found on web boards and then implemented a tokenizer that can analyze those patterns. The module that tokenizes and removes or replaces swear words on each web board was implemented as a thread so that the boards could be processed in parallel. Running the program on several web boards, we found that it detected most swear words, with a recall of 91.9%, but it did not sufficiently meet our goal for morphologically transformed swear words and swear words that depend on context. Further work will therefore address morphologically ambiguous words, semantically ambiguous words, and swear words in context, drawing on this program's manual mode. We expect that this program can guide users toward proper word usage and replace the manual work of web board managers in schools, public bodies, broadcasting stations, and similar organizations.

Developing a Vulgarity Filtering System for Online Games using SVM (SVM을 이용한 온라인게임 비속어 필터링 시스템)

Park, Kyo-Hyeon;Lee, Jee-Hyong
- Proceedings of the Korean Information Science Society Conference / 2006.10b / pp.260-263 / 2006
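As the online game industry has grown in recent years, the number of users playing these games has surged. Online games generally support user names, which players use to tell each other apart, and chat for communicating with one another. As the number of users grows, the volume of conversation increases, leading to problems with sexually explicit and violent language. This is especially important when building games that are open to players aged 18 and under. However, most games offer only word-matching profanity filtering based on a list of banned words. This approach blocks normal chat that happens to contain a banned word, and it also fails to catch profanity in which some syllables have been replaced with other symbols. If it cannot adequately handle such modified words, a profanity filtering system is simply powerless and useless. In this paper, we propose a trainable profanity filtering system using SVM. With SVM, a wider variety of profanity can be filtered effectively without harming user convenience.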
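
The two failure modes of list-based matching described in this abstract can be reproduced in a few lines. The banned list, the chat strings, and the `naive_filter` helper below are made-up placeholders, and plain substring matching is an assumption about how such filters typically work, not a description of any particular game's filter.

```python
# Hypothetical banned-word list; a mild word is used as a stand-in.
BANNED = {"hell"}


def naive_filter(message, banned=BANNED):
    """Return True if any banned word occurs as a substring of the message."""
    lowered = message.lower()
    return any(word in lowered for word in banned)


# False positive: a normal greeting contains the banned string.
print(naive_filter("hello team"))
# False negative: one letter swapped for a digit slips through.
print(naive_filter("go to h3ll"))
```

A learned classifier such as the SVM proposed in the abstract is meant to address both cases by generalizing from labeled examples instead of relying on exact string matches.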

Token-Based Classification and Dataset Construction for Detecting Modified Profanity (변형된 비속어 탐지를 위한 토큰 기반의 분류 및 데이터셋)

Sungmin Ko;Youhyun Shin
- The Transactions of the Korea Information Processing Society / v.13 no.4 / pp.181-188 / 2024
Traditional profanity detection methods have limitations in identifying intentionally altered profanities. This paper introduces a new method based on Named Entity Recognition, a subfield of Natural Language Processing. We developed a profanity detection technique using sequence labeling, for which we constructed a dataset by labeling some profanities in Korean malicious comments and conducted experiments. Additionally, to enhance the model's performance, we augmented the dataset by labeling parts of a Korean hate speech dataset using one of the large language models, ChatGPT, and conducted training. During this process, we confirmed that human filtering of the dataset created by the large language model could, on its own, improve performance. This suggests that human oversight is still necessary in the dataset augmentation process.
https://doi.org/10.3745/TKIPS.2024.13.4.181

Swear Word Detection and Unknown Word Classification for Automatic English Writing Assessment (영작문 자동평가를 위한 비속어 검출과 미등록어 분류)

Lee, Gyoung;Kim, Sung Gwon;Lee, Kong Joo
- KIPS Transactions on Software and Data Engineering / v.3 no.9 / pp.381-388 / 2014
In this paper, we deal with implementation issues of an unknown word classifier for a middle-school level English writing test. We define the types of unknown words that occur in English text and discuss the detection process for unknown words. We also define the types of swear words that occur in students' English writing and suggest how to handle them. We implement an unknown word classifier with a swear word detection module as part of developing an automatic English writing scoring system. Through experiments with actual test data, we evaluate the accuracy of both the unknown word classifier and the swear word detection module.
https://doi.org/10.3745/KTSDE.2014.3.9.381

Design and Implementation of Profanity Filtering Chat Program Based on Deep Learning (딥러닝 기반 비속어 필터링 채팅 프로그램 설계 및 구현)

Lee, Geon-Hwan;Park, Joo-Chan;Choi, Dong-won;Lee, Yeon-Gyeong;Choi, Ho-Bin;Han, Youn-Hee
- Proceedings of the Korea Information Processing Society Conference / 2019.10a / pp.998-1001 / 2019
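Profanity filtering in games and chat programs is currently run on the basis of banned-word lists. Because banned-word-based programs show several limitations, this paper proposes a profanity filtering program based on a deep learning technique using 'Text-CNN'. The data features are preprocessed at the 'jamo' level for training, and the 'LIME algorithm' is used to detect which parts are profanity and mask them. With this filtering chat program, we aim to encourage good language habits among its users and, beyond that, to help foster a healthy Internet culture.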
https://doi.org/10.3745/PKIPS.y2019m10a.998
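
As an illustration of jamo-level preprocessing, the sketch below splits precomposed Hangul syllables into their component jamo using the standard Unicode composition arithmetic (syllable = 0xAC00 + (lead × 21 + vowel) × 28 + tail). The function name `to_jamo` and the choice to pass non-Hangul characters through unchanged are assumptions; the abstract does not describe the authors' actual preprocessing code, and the Text-CNN and LIME steps are not shown here.

```python
# Jamo tables in Unicode order: 19 leading consonants, 21 vowels,
# and 28 trailing consonants (index 0 means "no trailing consonant").
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

HANGUL_BASE = 0xAC00   # first precomposed syllable, "가"
HANGUL_LAST = 0xD7A3   # last precomposed syllable, "힣"


def to_jamo(text):
    """Split each precomposed Hangul syllable into its jamo.

    Characters outside the Hangul syllable block are passed through as-is.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            offset = code - HANGUL_BASE
            lead, rest = divmod(offset, 21 * 28)
            vowel, tail = divmod(rest, 28)
            out.append(LEADS[lead])
            out.append(VOWELS[vowel])
            if tail:
                out.append(TAILS[tail])
        else:
            out.append(ch)
    return out


print(to_jamo("한글"))
```

Working at the jamo level gives a model access to sub-syllable units, which is one plausible reason for this preprocessing choice when users alter only part of a syllable.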

A Swearword Filter System for Online Game Chatting (온라인게임 채팅에서의 비속어 차단시스템)

Lee, Song-Wook
- Journal of the Korea Institute of Information and Communication Engineering / v.15 no.7 / pp.1531-1536 / 2011
We propose an automatic swearword filter system for online game chatting using Support Vector Machines (SVM). We collected chat sentences from online games and tagged each as either a normal sentence or a sentence containing a swearword. We use syllable n-grams and the lexical/part-of-speech (POS) tags of words as features, and select useful features with the chi-square statistic. Each selected feature is represented as a binary weight and used to train the SVM, which classifies each chat sentence as containing a swearword or not. In our experiments, we obtained an overall F1 score of 90.4%.
https://doi.org/10.6109/jkiice.2011.15.7.1531
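
The feature pipeline described in this abstract (n-gram features, chi-square selection, binary weights) can be sketched with the standard library alone. The toy sentences, the placeholder token "xx", the bigram size, and all function names are assumptions for illustration; character bigrams stand in for syllable n-grams, the POS-tag features are omitted, and the SVM training step itself is not shown.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Set of character n-grams (in Hangul text, one character is one syllable)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table.

    a: positive docs with the feature,    b: negative docs with the feature,
    c: positive docs without the feature, d: negative docs without it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, k):
    """Rank n-grams by chi-square against the binary label; keep the top k."""
    pos = sum(labels)
    neg = len(labels) - pos
    with_pos, with_neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (with_pos if label else with_neg).update(char_ngrams(doc))
    scores = {}
    for gram in set(with_pos) | set(with_neg):
        a, b = with_pos[gram], with_neg[gram]
        scores[gram] = chi_square(a, b, pos - a, neg - b)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


def binary_vector(text, vocab):
    """Binary weights: 1 if the selected n-gram occurs in the text, else 0."""
    grams = char_ngrams(text)
    return [1 if gram in grams else 0 for gram in vocab]


# Toy data: label 1 = contains a (placeholder) swearword, 0 = normal.
docs = ["you xx fool", "xx off", "good game", "nice game"]
labels = [1, 1, 0, 0]
vocab = select_features(docs, labels, k=6)
print(binary_vector("xx again", vocab))
```

Vectors produced by `binary_vector` are the kind of input that would then be passed to an SVM trainer. Note that chi-square scores association with either class, so n-grams typical of normal sentences can be selected alongside those typical of swearwords.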
