• Title/Summary/Keyword: 욕설 (profanity)

Search Result 46, Processing Time 0.028 seconds

Swear Word Detection through Convolutional Neural Network (딥러닝 기반 욕설 탐지)

  • Kim, Yumin;Gang, Hyobin;Han, Suhyeun;Jeong, Hieyong
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.685-686 / 2021
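  • As individuals become more active on social media, a growing number of users exploit anonymity to hurl profanity at others without hesitation. This study aims to determine whether a convolutional network, trained on a dataset built by crawling profanity from chat windows where swearing is rampant, can detect profanity and locate the detected words within the full sentence so that they can be blurred. For preprocessing, everything except Hangul and whitespace was removed, the text was tokenized into morphemes, stop words were removed, and the sequences were padded. The model used one-dimensional convolution, with 80% of the collected data used for training and the remaining 20% for testing. Compared with a simple keyword-based classifier, the proposed model improved accuracy by about 14%. Tests also confirmed that, when a sentence contains profanity, the model reliably recovers the profanity and its position.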
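
The preprocessing steps and the masking step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the stop-word list, `MAX_LEN`, and all function names are hypothetical, whitespace splitting stands in for a real morphological analyzer, and the `flagged` argument stands in for whatever the trained network would detect.

```python
import re

# Hypothetical settings; the abstract does not give the actual values.
STOPWORDS = {"은", "는", "이", "가"}
PAD = "<pad>"
MAX_LEN = 6


def preprocess(sentence, max_len=MAX_LEN):
    """Keep Hangul and whitespace, tokenize, drop stop words, pad."""
    # Remove everything except Hangul syllables, compatibility jamo, whitespace.
    cleaned = re.sub(r"[^가-힣ㄱ-ㅎㅏ-ㅣ\s]", "", sentence)
    # Assumption: whitespace split replaces morpheme-level tokenization.
    tokens = [t for t in cleaned.split() if t not in STOPWORDS]
    tokens = tokens[:max_len]
    return tokens + [PAD] * (max_len - len(tokens))


def split_train_test(items, train_ratio=0.8):
    """Use the first 80% of items for training and the rest for testing."""
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]


def mask_tokens(sentence, flagged):
    """Mask each flagged word with asterisks and return its character spans."""
    spans = []
    for word in flagged:
        for match in re.finditer(re.escape(word), sentence):
            spans.append((match.start(), match.end()))
    chars = list(sentence)
    for start, end in spans:
        chars[start:end] = "*" * (end - start)
    return "".join(chars), sorted(spans)


if __name__ == "__main__":
    print(preprocess("오늘 날씨 good! 123 좋다"))
    print(mask_tokens("you are a badword ok", ["badword"]))
```

In a real pipeline the padded token lists would be mapped to integer indices before being fed to the one-dimensional convolution.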

A Study on Automatic Classification of Profanity Sentences of Elementary School Students Using BERT (BERT를 활용한 초등학교 고학년의 욕설문장 자동 분류방안 연구)

  • Shim, Jaekwoun
    • Journal of Creative Information Culture / v.7 no.2 / pp.91-98 / 2021
  • As elementary school students spent more time online because of COVID-19, the volume of posts, comments, and chat messages they wrote increased, and problems such as hurting others' feelings or using swear words have emerged. Netiquette is taught in elementary school, but instruction time is insufficient, and changes in student behavior are difficult to expect. Technical support through natural language processing is therefore needed. In this study, an experiment was conducted to automatically filter profanity sentences by applying a pre-trained language model to sentences written by elementary school students. Chat logs of students in grades 4 to 6 were collected from an online learning platform, and a pre-trained language model was trained on general and profanity sentences. The experiment showed a precision of 75% in classifying profanity sentences. The results suggest that, if the training data are sufficiently supplemented, the approach can be applied to online platforms used by elementary school students.

Analyzing the phenomenon of misogyny in online community (온라인 커뮤니티상에 나타난 여성혐오 현상 분석)

  • Lee, Ji-hyun;Woo, JiYoung
    • Proceedings of the Korean Society of Computer Information Conference / 2019.07a / pp.27-28 / 2019
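  • This paper analyzes the profanity and misogyny found in posts on the Internet community site 'Ilgan Best' (일간 베스트, Ilbe), which has shocked Korean society with its distinctive violence and sensationalism. The data consist of 2,000 posts collected from the Ilbe board by web crawling; each post was tagged for profanity and for terms referring to women, based on a list of words banned in games and a dictionary of terms referring to women. The tagged data showed that posts containing profanity were common among posts using terms referring to women, at 60.52% of the total, and that even posts without profanity contained many negative words such as 'crime' (범행), 'murder' (살해), and 'kimchi-nyeo' (김치녀).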
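
The dictionary-based tagging described in this abstract can be illustrated with a small sketch. Everything here is an assumption for illustration: the function names, the placeholder lexicons, and the reading of the reported percentage as the share of profane posts among posts that contain a term referring to women.

```python
def tag_post(text, profanity_terms, female_terms):
    """Tag a post by simple substring lookup against two lexicons."""
    return {
        "profane": any(term in text for term in profanity_terms),
        "female_ref": any(term in text for term in female_terms),
    }


def profanity_share_among_female_refs(posts, profanity_terms, female_terms):
    """Share of profane posts among posts that mention a female-referring term."""
    tags = [tag_post(p, profanity_terms, female_terms) for p in posts]
    with_ref = [t for t in tags if t["female_ref"]]
    if not with_ref:
        return 0.0
    return sum(t["profane"] for t in with_ref) / len(with_ref)


if __name__ == "__main__":
    # Placeholder lexicons and posts; the paper's lists are not reproduced here.
    profanity = {"BADWORD"}
    female = {"she"}
    posts = ["she is BADWORD", "she went home", "nice weather", "BADWORD game"]
    print(profanity_share_among_female_refs(posts, profanity, female))
```

Substring lookup is the simplest possible matcher; it misses altered spellings, which is the limitation other entries in this list try to address.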

Abusive Sentence Detection using Deep Learning in Online Game (딥러닝를 사용한 온라인 게임에서의 욕설 탐지)

  • Park, Sunghee;Kim, Huy Kang;Woo, Jiyoung
    • Proceedings of the Korean Society of Computer Information Conference / 2019.07a / pp.13-14 / 2019
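  • Profanity is one of the most unpleasant elements of games. Filtering based on banned words has been used to prevent profanity by game users, but it is ineffective because Korean allows many ways to circumvent it, such as altering a word or inserting digits in the middle. This paper therefore builds a model that detects profanity with a convolutional neural network, a deep learning technique, using chat data collected from the online game 'Archeage'. When Hangul consonants and vowels were separated, the model achieved an accuracy of 87%. Splitting the text into individual characters gave slightly better accuracy, but given that the vocabulary became more than 10 times larger than with consonant-vowel (jamo) separation, jamo separation is the more efficient choice.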
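
The difference between syllable-level and jamo-level input can be sketched with Unicode normalization. This is only one possible way to separate consonants and vowels, assumed here for illustration; the abstract does not say how the authors did it. NFD normalization splits each precomposed Hangul syllable into its conjoining jamo, and NFC recomposes them. The function names are hypothetical.

```python
import unicodedata


def to_jamo(text):
    """Decompose precomposed Hangul syllables into conjoining jamo (NFD)."""
    return unicodedata.normalize("NFD", text)


def from_jamo(text):
    """Recompose conjoining jamo into precomposed syllables (NFC)."""
    return unicodedata.normalize("NFC", text)


def vocabulary(texts, unit="syllable"):
    """Collect the distinct symbols seen at syllable or jamo granularity."""
    symbols = set()
    for text in texts:
        symbols.update(to_jamo(text) if unit == "jamo" else text)
    return symbols


if __name__ == "__main__":
    word = "한글"
    print(len(word), len(to_jamo(word)))  # number of syllables vs. number of jamo
    print(from_jamo(to_jamo(word)) == word)
```

Precomposed Hangul covers 11,172 possible syllables, whereas the set of jamo is only a few dozen symbols, which is why a jamo-level vocabulary stays much smaller.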

The Influence of Mother's and Father's Conflict Resolution Styles on Adolescents' Use of Swear Words: The Mediating Role of Aggression (부와 모의 갈등해결양식이 청소년의 욕설사용에 미치는 영향: 공격성의 매개역할)

  • Lee, Bohyun;Lee, Eunhee
    • The Journal of the Convergence on Culture Technology / v.4 no.2 / pp.107-114 / 2018
  • This study examines the influence of mothers' and fathers' conflict resolution styles (aggressive and compromising) on adolescents' use of swear words. It also investigates whether aggression mediates the relationship between parents' conflict resolution styles and their children's use of swear words. To this end, a self-report questionnaire was administered to 570 students attending six middle schools in Gyeongnam Province. After excluding incomplete and insincere answers, 477 responses were used as the data for the study. The results are summarized as follows. First, an aggressive conflict resolution style with mothers is positively correlated with students' use of swear words: as the conflict resolution style with mothers becomes more aggressive, children's use of swear words increases. Second, aggression was confirmed to mediate the effect of both mothers' and fathers' aggressive conflict resolution styles on adolescents' use of swear words. Therefore, if conflict between children and parents is not resolved appropriately, children's aggression accumulates and their use of swear words increases.

The Online Game Coined Profanity Filtering System by using Semi-Global Alignment (반 전역 정렬을 이용한 온라인 게임 변형 욕설 필터링 시스템)

  • Yoon, Tai-Jin;Cho, Hwan-Gue
    • The Journal of the Korea Contents Association / v.9 no.12 / pp.113-120 / 2009
  • Verbal abuse in text messages in online games is currently a serious problem, yet no effective policy or technical tool exists. To cope with it, online game service providers have so far accumulated a set of forbidden words and applied this list to the text used in online games, an approach called a 'swear filter'. However, young online game players easily evade this filter by coining new words that are not on the list. Korean in particular makes it very easy to create new variations of a vulgar word. In this paper, we propose a smart filtering algorithm that identifies newly coined profanities. Important features of our method include the canonical form transformation of coined profanities and semi-global alignment at the level of consonant and vowel units. For the experiment, we collected more than 1,000 newly coined vulgar words from online gaming sites and tested them against our method; our system successfully filtered more than 90% of them.
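
Semi-global alignment, as named in this abstract, can be sketched with a standard dynamic program in which the forbidden word must be aligned in full while unmatched text before and after it costs nothing. The scoring values, the threshold, and the function names below are assumptions for illustration, not the paper's parameters, and the sketch works on any symbol sequence rather than specifically on consonant and vowel units.

```python
def semi_global_score(pattern, text, match=1, mismatch=-1, gap=-1):
    """Best score for aligning all of `pattern` to some stretch of `text`.

    Skipping a prefix or a suffix of `text` is free; gaps inside the
    alignment and mismatches are penalized.
    """
    n = len(text)
    prev = [0] * (n + 1)  # row 0: any prefix of text may be skipped for free
    for i in range(1, len(pattern) + 1):
        curr = [i * gap] + [0] * n  # column 0: pattern symbols left unmatched
        for j in range(1, n + 1):
            same = pattern[i - 1] == text[j - 1]
            diagonal = prev[j - 1] + (match if same else mismatch)
            curr[j] = max(diagonal, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return max(prev)  # the alignment may end anywhere in text


def is_variant(pattern, text, threshold=0.6):
    """Flag `text` if it contains a close variant of `pattern` (hypothetical rule)."""
    return semi_global_score(pattern, text) / len(pattern) >= threshold


if __name__ == "__main__":
    # Placeholder strings: "abc" plays the forbidden word, "a1bc" a coined variant.
    print(semi_global_score("abc", "xxabcxx"))
    print(semi_global_score("abc", "xxa1bcxx"))
    print(is_variant("abc", "hello world"))
```

Because an inserted digit only costs one gap, a variant such as "a1bc" still scores close to the exact match, which a plain word-list lookup would miss.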

A Transfer Learning Method for Solving Imbalance Data of Abusive Sentence Classification (욕설문장 분류의 불균형 데이터 해결을 위한 전이학습 방법)

  • Seo, Suin;Cho, Sung-Bae
    • Journal of KIISE / v.44 no.12 / pp.1275-1281 / 2017
  • The supervised learning approach is suitable for classifying insulting sentences, but it requires training sentences prepared in advance. A character-level convolutional neural network is robust at the character level and is therefore appropriate for classifying abusive sentences, but it has the drawback of demanding many training sentences. In this paper, we propose a transfer learning method that first lets the filters learn the characteristics of offensive words from generated pairs of abusive and normal sentences, and then reuses the trained filters in the actual classification process. The method improves classifier performance by reducing the effects of data shortage and class imbalance. In experiments and evaluations on three datasets, applying transfer learning yielded a higher F1-score for the character-level CNN classifier on all datasets.
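
The abstract reports F1-score rather than accuracy, which matters when classes are imbalanced. The sketch below is not taken from the paper and uses made-up labels; it shows how a classifier that never predicts the abusive class can reach high accuracy while its F1-score is zero.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up imbalanced data: 5 abusive (1) and 95 normal (0) sentences.
    y_true = [1] * 5 + [0] * 95
    always_normal = [0] * 100
    print(accuracy(y_true, always_normal), f1_score(y_true, always_normal))
```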

Abusive Detection Using Bidirectional Long Short-Term Memory Networks (양방향 장단기 메모리 신경망을 이용한 욕설 검출)

  • Na, In-Seop;Lee, Sin-Woo;Lee, Jae-Hak;Koh, Jin-Gwang
    • The Journal of Bigdata / v.4 no.2 / pp.35-45 / 2019
  • Recently, the damage and social cost caused by malicious comments have been increasing, as seen in news of celebrities committing suicide under the influence of malicious comments. The damage from malicious comments, including abusive language and slang, is growing and spreading in various forms throughout society. In this paper, we propose a technique for detecting abusive language using a bidirectional long short-term memory (LSTM) neural network model. We collected comments from the web with a web crawler and treated unused characters, such as the English alphabet and special characters, as stop words. For the comments processed in this way, a bidirectional LSTM model that considers the words before and after each word in a sentence was used to detect abusive language. To use the bidirectional LSTM, the comments were subjected to morphological analysis and vectorization, and each word was labeled as abusive or not. Experimental results showed a performance of 88.79% on a total of 9,288 comments screened and collected.

Unethical Expressions in Messenger Talks for Interactive Artificial Intelligence (대화형 인공지능을 위한 메신저 대화의 비윤리적 표현 연구)

  • Yelin Go;Kilim Nam;Hyunju Song
    • Annual Conference on Human and Language Technology / 2022.10a / pp.22-25 / 2022
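  • As a basic study for preventing conversational AI from learning or generating unethical expressions, this study collected unethical expressions at the word level and at the phrase level and above that appear in messenger conversations and analyzed their characteristics. Unethical expressions comprise profanity, hate and discriminatory expressions, aggressive expressions, and sexual expressions. Profanity accounted for the largest share of unethical expressions in messenger conversations; it includes not only nonstandard forms but also cases, such as '존-' and '미치다', that must be judged in context. Checking the type-token ratio (TTR) of the most frequent profanity groups, the '존나' group, the '씨발' group, and the '새끼' group, showed that the '새끼' group had the highest TTR. Next, hate and discriminatory expressions made up a larger share of messenger conversations than aggressive or sexual expressions, with those related to nationality/race and gender especially frequent. Hate and discriminatory expressions occurred more often as phrase-level or longer expressions than as single words, and they tended to run across the whole conversation rather than being confined to individual sentences. The study therefore confirmed that detecting hate and discriminatory expressions requires detecting expressions at the phrase level and above rather than at the word level.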
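
The type-token ratio (TTR) mentioned in this abstract is the number of distinct forms (types) divided by the total number of occurrences (tokens). A minimal sketch follows, with a hypothetical function name and placeholder tokens rather than the study's data.

```python
def type_token_ratio(tokens):
    """Distinct forms divided by total occurrences; 0.0 for an empty list."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    # Placeholder variant spellings of one word group.
    group = ["form1", "form2", "form1", "form3"]
    print(type_token_ratio(group))
```

A higher TTR for a word group means its occurrences are spread over more distinct spellings or inflections, so an exact-match word list covers a smaller share of them.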

Exploring Types of Verbal Violence Through Speech Analysis on Non-facing Channels (비대면 채널에서의 음성분석을 통한 언어폭력 유형 탐색)

  • Kim, Jongseon;Ahn, Seongjin
    • The Journal of Korean Association of Computer Education / v.23 no.3 / pp.71-79 / 2020
  • This study investigates the growing problem of verbal violence on non-face-to-face channels. Focus group interviews (FGI) were conducted to examine real-life cases of verbal violence experienced during emotional labor. In addition, the distribution of verbal violence in conversations was confirmed through speech analysis (SA), a new big data technology. The findings highlight two points. First, verbal violence occurring in calls is classified into personal insult, swearing/verbal abuse, unreasonable demand, (sexual) harassment, and intimidation/threat. Second, speech analysis showed that the most frequent types of verbal violence were personal insult and swearing/verbal abuse. Informal language and disrespectful speech had the highest rate in the personal insult category, and general cursing had the highest rate in the swearing/verbal abuse category. In particular, cursing had the highest rate across all cases of verbal violence. This study summarizes the types of verbal violence that occur on non-face-to-face channels and suggests the need for further investigation of how verbal stress affects the working environment of emotional laborers.