• Title/Summary/Keyword: Natural language process

Search Result 240, Processing Time 0.036 seconds

Token-Based Classification and Dataset Construction for Detecting Modified Profanity (변형된 비속어 탐지를 위한 토큰 기반의 분류 및 데이터셋)

  • Sungmin Ko;Youhyun Shin
    • The Transactions of the Korea Information Processing Society
    • /
    • v.13 no.4
    • /
    • pp.181-188
    • /
    • 2024
  • Traditional profanity detection methods have limitations in identifying intentionally altered profanities. This paper introduces a new method based on Named Entity Recognition, a subfield of Natural Language Processing. We developed a profanity detection technique using sequence labeling, for which we constructed a dataset by labeling some profanities in Korean malicious comments and conducted experiments. Additionally, to enhance the model's performance, we augmented the dataset by labeling parts of a Korean hate speech dataset using one of the large language models, ChatGPT, and conducted training. During this process, we confirmed that filtering the dataset created by the large language model by humans alone could improve performance. This suggests that human oversight is still necessary in the dataset augmentation process.

The Theory of Linguistic Semantic Interpretation Rule using Fuzzy Definition (퍼지 논리를 이용한 컴퓨터 언어해석 구현 규칙의 이용법)

  • 진현수
    • Proceedings of the IEEK Conference
    • /
    • 2003.11b
    • /
    • pp.227-230
    • /
    • 2003
  • We can not distinguish semantism of the feature of the current language “big”, “small”, “beautiful”. But we study artificial linguistic interface work and convert natural language to digital binary linguistic theory, we should define the basical conversion process. When we utilize the sum of product fuzzy theory and the visible numerical value, we can establish reasoning rule of input language. Fuzzy theory should be converted to general resulting rule.

  • PDF

An Application of RASA Technology to Design an AI Virtual Assistant: A Case of Learning Finance and Banking Terms in Vietnamese

  • PHAM, Thi My Ni;PHAM, Thi Ngoc Thao;NGUYEN, Ha Phuong Truc;LY, Bao Tuyen;NGUYEN, Truc Linh;LE, Hoanh Su
    • The Journal of Asian Finance, Economics and Business
    • /
    • v.9 no.5
    • /
    • pp.273-283
    • /
    • 2022
  • Banking and finance is a broad term that incorporates a variety of smaller, more specialized subjects such as corporate finance, tax finance, and insurance finance. A virtual assistant that assists users in searching for information about banking and finance terms might be an extremely beneficial tool for users. In this study, we explored the process of searching for information, seeking opportunities, and developing a virtual assistant in the first stages of starting learning and understanding Vietnamese to increase effectiveness and save time, which is also an innovative business practice in Use-case Vietnam. We built the FIBA2020 dataset and proposed a pipeline that used Natural Language Processing (NLP) inclusive of Natural Language Understanding (NLU) algorithms to build chatbot applications. The open-source framework RASA is used to implement the system in our study. We aim to improve our model performance by replacing parts of RASA's default tokenizers with Vietnamese tokenizers and experimenting with various language models. The best accuracy we achieved is 86.48% and 70.04% in the ideal condition and worst condition, respectively. Finally, we put our findings into practice by creating an Android virtual assistant application using the model trained using Whitespace tokenizer and the pre-trained language m-BERT.

A Process-Centered Knowledge Model for Analysis of Technology Innovation Procedures

  • Chun, Seungsu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.3
    • /
    • pp.1442-1453
    • /
    • 2016
  • Now, there are prodigiously expanding worldwide economic networks in the information society, which require their social structural changes through technology innovations. This paper so tries to formally define a process-centered knowledge model to be used to analyze policy-making procedures on technology innovations. The eventual goal of the proposed knowledge model is to apply itself to analyze a topic network based upon composite keywords from a document written in a natural language format during the technology innovation procedures. Knowledge model is created to topic network that compositing driven keyword through text mining from natural language in document. And we show that the way of analyzing knowledge model and automatically generating feature keyword and relation properties into topic networks.

Recognizing Emotional Content of Emails as a byproduct of Natural Language Processing-based Metadata Extraction (이메일에 포함된 감성정보 관련 메타데이터 추출에 관한 연구)

  • Paik, Woo-Jin
    • Journal of the Korean Society for information Management
    • /
    • v.23 no.2
    • /
    • pp.167-183
    • /
    • 2006
  • This paper describes a metadata extraction technique based on natural language processing (NLP) which extracts personalized information from email communications between financial analysts and their clients. Personalized means connecting users with content in a personally meaningful way to create, grow, and retain online relationships. Personalization often results in the creation of user profiles that store individuals' preferences regarding goods or services offered by various e-commerce merchants. We developed an automatic metadata extraction system designed to process textual data such as emails, discussion group postings, or chat group transcriptions. The focus of this paper is the recognition of emotional contents such as mood and urgency, which are embedded in the business communications, as metadata.

Automatic Grading System for Subjective Questions Through Analyzing Question Type (질의문 유형 분석을 통한 서답형 자동 채점 시스템)

  • Kang, Won-Seog
    • The Journal of the Korea Contents Association
    • /
    • v.11 no.2
    • /
    • pp.13-21
    • /
    • 2011
  • It is not easy to develop the system as the subjective-type evaluation has the difficulty in natural language processing. This thesis designs and implements the automatic evaluation system with natural language processing technique. To solve the degradation of general evaluation system, we define the question type and improve the performance of evaluation through the adaptive process for each question type. To evaluate the system, we analyze the correlation between human evaluation and term-based evaluation, and between human evaluation and this system evaluation. We got the better result than term-based evaluation. It needs to expand the question type and improve the adaptive processing technique for each type.

Customized Information Analysis System Using National Defense News Data (국방 기사 데이터를 이용한 맞춤형 정보 분석 시스템)

  • Choi, Jung-Whoan;Lim, Chea-O
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.12
    • /
    • pp.457-465
    • /
    • 2010
  • Customized information analysis system is a software system that can help to extract useful information from non-structured natural language data, process the information to customized form, and provide future forecast and reasoning information. To implement the information analysis system, we need natural language processing technology to analyze natural language, information extraction technology to detect necessary entity and its relationship from text, and data mining technology to discover new and unknown information from extracting data. This paper suggest virtual customized information analysis system processing national defense news data and introduce base technologies for information analysis.

Analyze the Open data for Natural Language Processing of Learning Counseling (학습 상담 내용의 자연어 처리를 위한 오픈 데이터 현황 분석)

  • Kim, Yu-Doo
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2019.05a
    • /
    • pp.500-501
    • /
    • 2019
  • In the $4^{th}$ generation industry, self-directed learning is very important than Injection learning. Therefore many educational institutions has developed method of self-directed learning. In order for self-directed learning to be effective, it is more important for faculty to manage the overall process of learning rather than being directly involved in the student's academic work. Therefore, learning counseling is an important way to effectively carry out self-directed learning. In this paper, we analyze the status of open data for natural language processing that can implement the learning consultation contents so that various applications can be done through natural language processing.

  • PDF

Building a Korean Sentiment Lexicon Using Collective Intelligence (집단지성을 이용한 한글 감성어 사전 구축)

  • An, Jungkook;Kim, Hee-Woong
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.49-67
    • /
    • 2015
  • Recently, emerging the notion of big data and social media has led us to enter data's big bang. Social networking services are widely used by people around the world, and they have become a part of major communication tools for all ages. Over the last decade, as online social networking sites become increasingly popular, companies tend to focus on advanced social media analysis for their marketing strategies. In addition to social media analysis, companies are mainly concerned about propagating of negative opinions on social networking sites such as Facebook and Twitter, as well as e-commerce sites. The effect of online word of mouth (WOM) such as product rating, product review, and product recommendations is very influential, and negative opinions have significant impact on product sales. This trend has increased researchers' attention to a natural language processing, such as a sentiment analysis. A sentiment analysis, also refers to as an opinion mining, is a process of identifying the polarity of subjective information and has been applied to various research and practical fields. However, there are obstacles lies when Korean language (Hangul) is used in a natural language processing because it is an agglutinative language with rich morphology pose problems. Therefore, there is a lack of Korean natural language processing resources such as a sentiment lexicon, and this has resulted in significant limitations for researchers and practitioners who are considering sentiment analysis. Our study builds a Korean sentiment lexicon with collective intelligence, and provides API (Application Programming Interface) service to open and share a sentiment lexicon data with the public (www.openhangul.com). For the pre-processing, we have created a Korean lexicon database with over 517,178 words and classified them into sentiment and non-sentiment words. In order to classify them, we first identified stop words which often quite likely to play a negative role in sentiment analysis and excluded them from our sentiment scoring. In general, sentiment words are nouns, adjectives, verbs, adverbs as they have sentimental expressions such as positive, neutral, and negative. On the other hands, non-sentiment words are interjection, determiner, numeral, postposition, etc. as they generally have no sentimental expressions. To build a reliable sentiment lexicon, we have adopted a concept of collective intelligence as a model for crowdsourcing. In addition, a concept of folksonomy has been implemented in the process of taxonomy to help collective intelligence. In order to make up for an inherent weakness of folksonomy, we have adopted a majority rule by building a voting system. Participants, as voters were offered three voting options to choose from positivity, negativity, and neutrality, and the voting have been conducted on one of the largest social networking sites for college students in Korea. More than 35,000 votes have been made by college students in Korea, and we keep this voting system open by maintaining the project as a perpetual study. Besides, any change in the sentiment score of words can be an important observation because it enables us to keep track of temporal changes in Korean language as a natural language. Lastly, our study offers a RESTful, JSON based API service through a web platform to make easier support for users such as researchers, companies, and developers. Finally, our study makes important contributions to both research and practice. In terms of research, our Korean sentiment lexicon plays an important role as a resource for Korean natural language processing. In terms of practice, practitioners such as managers and marketers can implement sentiment analysis effectively by using Korean sentiment lexicon we built. Moreover, our study sheds new light on the value of folksonomy by combining collective intelligence, and we also expect to give a new direction and a new start to the development of Korean natural language processing.

Sequence Labeling-based Multiple Causal Relations Extraction using Pre-trained Language Model for Maritime Accident Prevention (해양사고 예방을 위한 사전학습 언어모델의 순차적 레이블링 기반 복수 인과관계 추출)

  • Ki-Yeong Moon;Do-Hyun Kim;Tae-Hoon Yang;Sang-Duck Lee
    • Journal of the Korean Society of Safety
    • /
    • v.38 no.5
    • /
    • pp.51-57
    • /
    • 2023
  • Numerous studies have been conducted to analyze the causal relationships of maritime accidents using natural language processing techniques. However, when multiple causes and effects are associated with a single accident, the effectiveness of extracting these causal relations diminishes. To address this challenge, we compiled a dataset using verdicts from maritime accident cases in this study, analyzed their causal relations, and applied labeling considering the association information of various causes and effects. In addition, to validate the efficacy of our proposed methodology, we fine-tuned the KoELECTRA Korean language model. The results of our validation process demonstrated the ability of our approach to successfully extract multiple causal relationships from maritime accident cases.