• Title/Summary/Keyword: KoNLP

Search results: 31

Exploiting Korean Language Model to Improve Korean Voice Phishing Detection (한국어 언어 모델을 활용한 보이스피싱 탐지 기능 개선)

  • Boussougou, Milandu Keith Moussavou;Park, Dong-Joo
    • KIPS Transactions on Software and Data Engineering, v.11 no.10, pp.437-446, 2022
  • Text classification, a Natural Language Processing (NLP) task, combined with state-of-the-art (SOTA) Machine Learning (ML) and Deep Learning (DL) algorithms as the core engine, is widely used to detect and classify voice phishing call transcripts. Although numerous studies on classifying voice phishing call transcripts have been conducted and have demonstrated good performance, the increase in non-face-to-face financial transactions means there is still a need for improvement using the latest NLP technologies. This paper benchmarks the Korean voice phishing detection performance of the pre-trained Korean language model KoBERT against multiple other SOTA algorithms, based on classifying transcripts from KorCCVi, a labeled Korean voice phishing dataset. The experimental results show that KoBERT outperforms all other models on the test set, with a classification accuracy of 99.60%.

A.I voice phishing detection solution using NLP Algorithms (NLP 알고리즘을 활용한 A.I 보이스피싱 탐지 솔루션)

  • Tae-Kyung Kim;Eun-Ju Park;Ji-Won Park;A-Lim Han
    • Proceedings of the Korea Information Processing Society Conference, 2023.11a, pp.1045-1046, 2023
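  • This paper proposes a voice phishing prevention solution designed with the digitally marginalized and the socially vulnerable in mind. Call audio is converted to text with speech-to-text (STT) using AWS Transcribe, and NLP algorithms assess the voice phishing risk in real time and deliver the result to the user. For the NLP component, the KoBIGBIRD and DeBERTa models were each customized and fine-tuned for voice phishing detection. After comparing their performance and inference, voice phishing detection is performed with KoBIGBIRD, which performed better.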
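The real-time flow described above (transcribed call segments in, an updated risk level out) can be sketched as a rolling score over the most recent segments. This is a minimal illustration under stated assumptions, not the authors' system: the AWS Transcribe stage is replaced by a list of already-transcribed segments, `score_segment` is a hypothetical placeholder for the fine-tuned KoBIGBIRD classifier's phishing probability, and the window size, threshold, and averaging rule are assumed values.

```python
from collections import deque
from typing import Callable, Iterable, Iterator


def stream_risk(
    segments: Iterable[str],
    score_segment: Callable[[str], float],
    window: int = 3,
    threshold: float = 0.8,
) -> Iterator[tuple[int, float, bool]]:
    """Yield (segment index, rolling risk, alert flag) as each transcribed segment arrives.

    Assumption: call-level risk is the mean phishing probability of the most
    recent `window` segments, and an alert is raised once it reaches `threshold`.
    """
    recent: deque = deque(maxlen=window)  # the oldest score drops out automatically
    for i, text in enumerate(segments):
        recent.append(score_segment(text))
        risk = sum(recent) / len(recent)
        yield i, risk, risk >= threshold


# Hypothetical stand-in for the fine-tuned classifier: fixed probabilities
# keyed by segment text, for demonstration only.
fake_probs = {
    "hello, is this Mr. Kim?": 0.1,
    "this is the prosecutor's office": 0.9,
    "transfer your savings to a safe account": 0.95,
}

for idx, risk, alert in stream_risk(fake_probs, lambda s: fake_probs[s], window=2):
    print(idx, f"{risk:.2f}", alert)
```

Averaging over a short window keeps a single suspicious sentence from triggering an alert on its own while still reacting within a few segments; a deployed system could use any other aggregation.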

Research on Natural Language Processing Package using Open Source Software (오픈소스 소프트웨어를 활용한 자연어 처리 패키지 제작에 관한 연구)

  • Lee, Jong-Hwa;Lee, Hyun-Kyu
    • The Journal of Information Systems, v.25 no.4, pp.121-139, 2016
  • Purpose: In this study, we propose a special-purpose R package, "new_Noun()", to process the nonstandard texts that appear in various social networks. As interest in Big Data grows, R, an open-source analysis tool, is also attracting more attention in many fields. Design/methodology/approach: With more than 9,000 packages, R provides user-friendly functions for a variety of data mining, social network analysis, and simulation tasks, such as statistical analysis, classification, prediction, clustering, and association analysis. In particular, "KoNLP", a natural language processing package for Korean, has reduced the time and effort of many researchers. However, as social data increases, informal expressions in Hangeul (the Korean script), such as emoticons, informal terms, and symbols, make natural language processing increasingly difficult. Findings: To address these difficulties, this study researched algorithms that upgrade an existing open-source natural language processing package. By using the "KoNLP" package and analyzing the main functions behind its noun-extraction command, we developed a new integrated noun-processing function, "new_Noun()", whose noun extraction improves by more than 29.1% over the existing package.
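Informal tokens such as emoticons and runs of standalone jamo are commonly stripped in a normalization pass before noun extraction. The sketch below illustrates that general idea only; it is not the paper's `new_Noun()` algorithm, it is written in Python rather than R, and the patterns (standalone Hangul compatibility jamo such as ㅋㅋ or ㅠㅠ, a few ASCII emoticons, and leftover symbols) are hypothetical choices.

```python
import re

# Standalone Hangul compatibility jamo (U+3131-U+318E), e.g. "ㅋㅋㅋ" or "ㅠㅠ".
JAMO_RUN = re.compile(r"[\u3131-\u318E]+")
# A few common ASCII emoticons; a real list would be much longer (assumption).
EMOTICON = re.compile(r"(?::\)|:\(|:D|\^\^|T_T|-_-)")
# Anything that is not a Hangul syllable, Latin letter, digit, or whitespace.
SYMBOL = re.compile(r"[^\uAC00-\uD7A3A-Za-z0-9\s]")


def normalize(text: str) -> str:
    """Strip informal tokens so that a noun extractor sees cleaner input."""
    text = EMOTICON.sub(" ", text)  # emoticons first: they contain symbol characters
    text = JAMO_RUN.sub(" ", text)
    text = SYMBOL.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


print(normalize("오늘 날씨 최고ㅋㅋㅋ!! ^^ 강추♥"))  # → 오늘 날씨 최고 강추
```

The order of the substitutions matters: removing symbols first would break `^^` or `T_T` into fragments that no longer match the emoticon pattern.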

Intelligent Wordcloud Using Text Mining (텍스트 마이닝을 이용한 지능적 워드클라우드)

  • Kim, Yeongchang;Ji, Sangsu;Park, Dongseo;Lee, Choong Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2019.05a, pp.325-326, 2019
  • This paper proposes an intelligent word cloud that improves on the existing approach, which builds a word cloud from noun frequencies obtained with text mining. We propose a method to visually present word clouds that also focus on other parts of speech, such as verbs, by effectively adding newly coined words and similar terms to the dictionary used for noun extraction in text mining. In the experiment, the KoNLP package was used to extract the frequencies of existing nouns, and 80 new words that it did not support were identified by examining their frequency and added manually.
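The effect of adding newly coined words to the noun dictionary can be shown with a toy dictionary-based extractor. This is a sketch under stated assumptions, not KoNLP's analyzer: each whitespace-separated eojeol is matched against the dictionary by longest prefix (so a trailing particle such as 이 or 을 is ignored), and both dictionaries and the sample sentences are hypothetical.

```python
from collections import Counter


def extract_nouns(text: str, dictionary: set[str]) -> list[str]:
    """Return the longest dictionary entry that prefixes each eojeol, if any."""
    nouns = []
    for eojeol in text.split():
        # Try the longest prefix first so "보이스피싱" wins over "보이스".
        for end in range(len(eojeol), 0, -1):
            if eojeol[:end] in dictionary:
                nouns.append(eojeol[:end])
                break
    return nouns


base_dict = {"날씨", "영화", "보이스"}
user_words = {"보이스피싱", "갓생"}  # hypothetical newly coined words added by hand

docs = ["갓생 영화를 봤다", "보이스피싱이 늘었다", "갓생은 날씨가 좋아야"]
text = " ".join(docs)

print(Counter(extract_nouns(text, base_dict)).most_common())
print(Counter(extract_nouns(text, base_dict | user_words)).most_common())
```

With the base dictionary alone, 갓생 is never counted and 보이스피싱이 is truncated to 보이스; after the user words are added, 갓생 becomes the most frequent term, which would change what a word cloud emphasizes.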


Analysis of the Korean Tokenizing Library Module (한글 토크나이징 라이브러리 모듈 분석)

  • Lee, Jae-kyung;Seo, Jin-beom;Cho, Young-bok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2021.05a, pp.78-80, 2021
  • Research on natural language processing (NLP) is currently evolving rapidly. NLP is a technology that allows computers to analyze the meaning of the languages used in everyday life, and it is used in various fields such as speech recognition, spell checking, and text classification. The most commonly used NLP library today is NLTK, which is English-based and therefore has disadvantages for Korean language processing. This paper therefore introduces KoNLPy and soynlp, two Korean tokenizing libraries, analyzes their morphological analysis and processing techniques, compares the soynlp modules that complement KoNLPy's shortcomings, and applies them as natural language processing models.
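Unlike the dictionary-based analyzers that KoNLPy wraps, soynlp learns word boundaries from corpus statistics, and one of the statistics its word extractor uses is a cohesion score for the left part of an eojeol. The sketch below computes that score in plain Python to show the idea; it is not soynlp's implementation, and the toy corpus is hypothetical. For a substring c_1…c_n, cohesion is the geometric mean of P(c_1…c_{i+1} | c_1…c_i), which telescopes to (count(c_1…c_n) / count(c_1))^(1/(n-1)).

```python
from collections import Counter


def prefix_counts(corpus: list[str], max_len: int = 6) -> Counter:
    """Count every left substring (prefix) of every eojeol, up to max_len characters."""
    counts: Counter = Counter()
    for sentence in corpus:
        for eojeol in sentence.split():
            for end in range(1, min(len(eojeol), max_len) + 1):
                counts[eojeol[:end]] += 1
    return counts


def cohesion(word: str, counts: Counter) -> float:
    """Geometric mean of P(prefix of length i+1 | prefix of length i)."""
    if len(word) < 2 or counts[word[0]] == 0:
        return 0.0
    return (counts[word] / counts[word[0]]) ** (1 / (len(word) - 1))


# Hypothetical corpus in which "아이오아이" (a name) recurs with different particles.
corpus = ["아이오아이가 공연했다", "아이오아이는 신곡을", "아이오아이의 노래", "아이들이 논다", "아침을 먹었다"]
counts = prefix_counts(corpus)
for candidate in ["아이", "아이오", "아이오아이", "아이오아이가"]:
    print(candidate, round(cohesion(candidate, counts), 3))
```

The score peaks at 아이오아이 and drops once a particle is attached (아이오아이가), which is the signal an unsupervised tokenizer can use to place the word boundary without a dictionary.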


Linguistic Analysis of Bumwoo KIM Chi Young's Cogitation on Mathematics (범우 김치영선생의 수학에 대한 사유의 언어적 분석)

  • Lee, Kang Sup;Lee, Hyun Soo
    • Communications of Mathematical Education, v.32 no.2, pp.207-223, 2018
  • In this study, we examined Bumwoo KIM Chi Young's cogitation on mathematics and analyzed three of his representative essays on mathematics using KoNLP. Approximately 80% of Bumwoo's sentences consist of fewer than 30 words. His writing became clearer over the years, as shown by the decreasing mean and standard deviation of the number of words per sentence. Bumwoo emphasized structure in mathematics and strongly advocated the importance of axioms, topology, and categories as characteristics of modern mathematics. In particular, the relations among 'mathematics', 'axiom', 'structure', 'Euclid', 'axiomatic system', and 'set' appear to have been his main topics.
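The sentence-length measures used above (the share of short sentences, and the mean and standard deviation of words per sentence) are straightforward to compute. The sketch below assumes whitespace-delimited words (eojeol) and splits sentences on `.`, `?`, and `!`; the paper's KoNLP-based procedure may segment differently, and the sample text is hypothetical.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Number of whitespace-delimited words in each sentence."""
    sentences = [s for s in re.split(r"[.?!]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def length_profile(text: str, limit: int = 30) -> dict[str, float]:
    """Mean, population standard deviation, and share of sentences shorter than `limit`."""
    lengths = sentence_lengths(text)
    return {
        "mean": statistics.mean(lengths),
        "stdev": statistics.pstdev(lengths),  # population SD over all sentences (assumption)
        "share_under_limit": sum(n < limit for n in lengths) / len(lengths),
    }


essay = "수학은 구조의 학문이다. 공리는 출발점이다. 유클리드 이후 공리적 체계는 수학의 기초가 되었다."
print(sentence_lengths(essay))  # → [3, 2, 7]
print(length_profile(essay))
```

Computing the profile separately for essays from different years and comparing the means and standard deviations is one way to check a trend like the one reported above.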

Development and Evaluation of a Korean Treebank and its Application to NLP

  • Han, Chung-Hye;Han, Na-Rae;Ko, Eon-Suk;Martha Palmer
    • Language and Information, v.6 no.1, pp.123-138, 2002
  • This paper discusses issues in building a 54-thousand-word Korean Treebank with phrase structure annotation, along with developing annotation guidelines based on the morpho-syntactic phenomena represented in the corpus. The various methods employed for quality control are presented. An evaluation of the Treebank's quality and some NLP applications under development that use the Treebank are also presented.
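Phrase-structure treebanks are commonly stored as bracketed trees. Assuming Penn-Treebank-style bracketing (the labels and the example tree below are hypothetical and are not taken from the Korean Treebank's guidelines), a minimal reader that parses one tree and counts its constituent labels, the kind of tally useful in quality control, could look like this:

```python
import re
from collections import Counter

# A tree is a tuple (label, child, ...); a leaf child is a plain string.
Tree = tuple


def parse_tree(s: str) -> Tree:
    """Parse one bracketed tree such as "(S (NP-SBJ w1) (VP w2))" into nested tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])  # terminal (word)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, *children)

    return parse()


def count_labels(tree: Tree) -> Counter:
    """Count constituent labels (nonterminals) in a parsed tree."""
    counts = Counter([tree[0]])
    for child in tree[1:]:
        if isinstance(child, tuple):
            counts += count_labels(child)
    return counts


tree = parse_tree("(S (NP-SBJ 그가) (VP (NP-OBJ 책을) 읽었다))")
print(tree)
print(count_labels(tree))
```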


A Study on Auto-Classification of Aviation Safety Data using NLP Algorithm (자연어처리 알고리즘을 이용한 위험기반 항공안전데이터 자동분류 방안 연구)

  • Sung-Hoon Yang;Young Choi;So-young Jung;Joo-hyun Ahn
    • Journal of Advanced Navigation Technology, v.26 no.6, pp.528-535, 2022
  • Although the domestic aviation industry has made rapid progress with the development of aircraft manufacturing and transportation technologies, aviation safety accidents continue to occur. The supervisory agency classifies hazards and risks based on risk-based aviation safety data, identifies safety trends for each air transport operator, and conducts pre-inspections to prevent incidents and accidents. However, when humans classify data written in natural language, the results vary with the classifier's knowledge, experience, and inclinations, and understanding and classifying the content takes considerable time. In this paper, therefore, a fine-tuned KoBERT model was trained on more than 5,000 records to predict the classification of new data, achieving 79.2% accuracy. In addition, some of the failed predictions for similar events were traced to human classification errors.

A Study on Trend of Overseas Expansion Strategy Research (기업의 해외 진출 전략 연구 동향)

  • Seo, Dong-Pil;Kim, Beom-Seok
    • Journal of the Korea Convergence Society, v.11 no.1, pp.279-284, 2020
  • Advances in technology are bringing about significant change. Owing to Korea's economic growth, domestic companies are expanding overseas, unlike in the past. Recently, many small and medium-sized companies, in addition to large companies, have been making inroads into Southeast Asian markets thanks to the Korean Wave. This study used the international academic database Scopus to identify research trends in companies' overseas expansion strategies. A title search for the term "overseas expansion strategy" retrieved a total of 153 papers. The abstracts of these papers were refined and then analyzed using the KoNLP package, yielding 10 important keywords. Through these results, the study identifies research trends on companies' entry into overseas markets and provides a guideline for future research on overseas expansion strategies.
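The keyword step described above (refining the abstracts, extracting terms, and keeping the most important ones) can be sketched as a frequency ranking. The study used the KoNLP package; the Python sketch below instead assumes a simple lowercase letter-run tokenizer with a small hypothetical stopword list, and ranks terms by document frequency (the number of abstracts that mention a term) so that one long abstract cannot dominate. Both choices are assumptions, not the paper's method, and the sample abstracts are made up.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real study would use a much larger one.
STOPWORDS = {"the", "of", "and", "in", "a", "an", "to", "on", "for", "this", "study"}


def top_keywords(abstracts: list[str], k: int = 10) -> list[tuple[str, int]]:
    """Rank terms by document frequency; ties are broken alphabetically."""
    df: Counter = Counter()
    for abstract in abstracts:
        terms = set(re.findall(r"[a-z]+", abstract.lower())) - STOPWORDS
        df.update(terms)  # each term counts at most once per abstract
    return sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


abstracts = [
    "Entry strategy of SMEs in Southeast Asia",
    "Market entry and internationalization strategy",
    "Korean Wave and export performance of SMEs",
]
print(top_keywords(abstracts, k=3))  # → [('entry', 2), ('smes', 2), ('strategy', 2)]
```

With the default `k=10`, the same function returns a ten-keyword list of the kind reported above.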

SimKoR: A Sentence Similarity Dataset based on Korean Review Data and Its Application to Contrastive Learning for NLP (SimKoR: 한국어 리뷰 데이터를 활용한 문장 유사도 데이터셋 제안 및 대조학습에서의 활용 방안)

  • Jaemin Kim;Yohan Na;Kangmin Kim;Sang Rak Lee;Dong-Kyu Chae
    • Annual Conference on Human and Language Technology, 2022.10a, pp.245-248, 2022
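  • Research on contrastive learning for capturing contextual meaning has recently been active in natural language processing. High-quality training and validation data are important for contrastive learning. For Korean, however, almost all available datasets were produced by machine-translating English data into Korean and then reviewing it, so they depend on the quality of the machine translation. This paper proposes a simple method for building, from Korean review data, a validation dataset that measures how well embeddings reflect meaning, and presents SimKoR (Similarity Korean Review dataset), a dataset built with this method. We perform contrastive learning using the proposed validation dataset and demonstrate its effectiveness.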
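Contrastive learning of sentence embeddings is often trained with an InfoNCE-style objective, as in SimCSE: each anchor sentence should be closer, by cosine similarity, to its paired positive than to the other sentences in the batch. The abstract does not specify the loss used with SimKoR, so the pure-Python sketch below is a generic illustration with made-up 2-dimensional embeddings and an assumed temperature of 0.05.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors: list[list[float]], positives: list[list[float]], tau: float = 0.05) -> float:
    """Mean InfoNCE loss with in-batch negatives.

    For anchor i, positives[i] is the positive and every positives[j] with j != i
    serves as a negative; the loss is cross-entropy over the row of similarities.
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / tau for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each positive points the same way as its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]  # positives mismatched with their anchors
print(info_nce(anchors, aligned), info_nce(anchors, swapped))
```

The loss is near zero when every anchor is most similar to its own positive and large when the pairs are mismatched; a validation set such as SimKoR would then check whether the trained embeddings' similarities track human judgments.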
