• Title/Summary/Keyword: KoBERT model

Search Result 45, Processing Time 0.017 seconds

Transformer-based Text Summarization Using Pre-trained Language Model (사전학습 언어 모델을 활용한 트랜스포머 기반 텍스트 요약)

  • Song, Eui-Seok;Kim, Museong;Lee, Yu-Rin;Ahn, Hyunchul;Kim, Namgyu
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.07a
    • /
    • pp.395-398
    • /
    • 2021
  • 최근 방대한 양의 텍스트 정보가 인터넷에 유통되면서 정보의 핵심 내용을 파악하기가 더욱 어려워졌으며, 이로 인해 자동으로 텍스트를 요약하려는 연구가 활발하게 이루어지고 있다. 텍스트 자동 요약을 위한 다양한 기법 중 특히 트랜스포머(Transformer) 기반의 모델은 추상 요약(Abstractive Summarization) 과제에서 매우 우수한 성능을 보이며, 해당 분야의 SOTA(State of the Art)를 달성하고 있다. 하지만 트랜스포머 모델은 매우 많은 수의 매개변수들(Parameters)로 구성되어 있어서, 충분한 양의 데이터가 확보되지 않으면 이들 매개변수에 대한 충분한 학습이 이루어지지 않아서 양질의 요약문을 생성하기 어렵다는 한계를 갖는다. 이러한 한계를 극복하기 위해 본 연구는 소량의 데이터가 주어진 환경에서도 양질의 요약문을 생성할 수 있는 문서 요약 방법론을 제안한다. 구체적으로 제안 방법론은 한국어 사전학습 언어 모델인 KoBERT의 임베딩 행렬을 트랜스포머 모델에 적용하는 방식으로 문서 요약을 수행하며, 제안 방법론의 우수성은 Dacon 한국어 문서 생성 요약 데이터셋에 대한 실험을 통해 ROUGE 지표를 기준으로 평가하였다.

  • PDF

A Novel Image Captioning based Risk Assessment Model (이미지 캡셔닝 기반의 새로운 위험도 측정 모델)

  • Jeon, Min Seong;Ko, Jae Pil;Cheoi, Kyung Joo
    • The Journal of Information Systems
    • /
    • v.32 no.4
    • /
    • pp.119-136
    • /
    • 2023
  • Purpose We introduce a groundbreaking surveillance system explicitly designed to overcome the limitations typically associated with conventional surveillance systems, which often focus primarily on object-centric behavior analysis. Design/methodology/approach The study introduces an innovative approach to risk assessment in surveillance, employing image captioning to generate descriptive captions that effectively encapsulate the interactions among objects, actions, and spatial elements within observed scenes. To support our methodology, we developed a distinctive dataset comprising pairs of [image-caption-danger score] for training purposes. We fine-tuned the BLIP-2 model using this dataset and utilized BERT to decipher the semantic content of the generated captions for assessing risk levels. Findings In a series of experiments conducted with our self-constructed datasets, we illustrate that these datasets offer a wealth of information for risk assessment and display outstanding performance in this area. In comparison to models pre-trained on established datasets, our generated captions thoroughly encompass the necessary object attributes, behaviors, and spatial context crucial for the surveillance system. Additionally, they showcase adaptability to novel sentence structures, ensuring their versatility across a range of contexts.

Artificial Intelligence for Assistance of Facial Expression Practice Using Emotion Classification (감정 분류를 이용한 표정 연습 보조 인공지능)

  • Dong-Kyu, Kim;So Hwa, Lee;Jae Hwan, Bong
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.17 no.6
    • /
    • pp.1137-1144
    • /
    • 2022
  • In this study, an artificial intelligence(AI) was developed to help with facial expression practice in order to express emotions. The developed AI used multimodal inputs consisting of sentences and facial images for deep neural networks (DNNs). The DNNs calculated similarities between the emotions predicted by the sentences and the emotions predicted by facial images. The user practiced facial expressions based on the situation given by sentences, and the AI provided the user with numerical feedback based on the similarity between the emotion predicted by sentence and the emotion predicted by facial expression. ResNet34 structure was trained on FER2013 public data to predict emotions from facial images. To predict emotions in sentences, KoBERT model was trained in transfer learning manner using the conversational speech dataset for emotion classification opened to the public by AIHub. The DNN that predicts emotions from the facial images demonstrated 65% accuracy, which is comparable to human emotional classification ability. The DNN that predicts emotions from the sentences achieved 90% accuracy. The performance of the developed AI was evaluated through experiments with changing facial expressions in which an ordinary person was participated.

Topic Modeling Insomnia Social Media Corpus using BERTopic and Building Automatic Deep Learning Classification Model (BERTopic을 활용한 불면증 소셜 데이터 토픽 모델링 및 불면증 경향 문헌 딥러닝 자동분류 모델 구축)

  • Ko, Young Soo;Lee, Soobin;Cha, Minjung;Kim, Seongdeok;Lee, Juhee;Han, Ji Yeong;Song, Min
    • Journal of the Korean Society for information Management
    • /
    • v.39 no.2
    • /
    • pp.111-129
    • /
    • 2022
  • Insomnia is a chronic disease in modern society, with the number of new patients increasing by more than 20% in the last 5 years. Insomnia is a serious disease that requires diagnosis and treatment because the individual and social problems that occur when there is a lack of sleep are serious and the triggers of insomnia are complex. This study collected 5,699 data from 'insomnia', a community on 'Reddit', a social media that freely expresses opinions. Based on the International Classification of Sleep Disorders ICSD-3 standard and the guidelines with the help of experts, the insomnia corpus was constructed by tagging them as insomnia tendency documents and non-insomnia tendency documents. Five deep learning language models (BERT, RoBERTa, ALBERT, ELECTRA, XLNet) were trained using the constructed insomnia corpus as training data. As a result of performance evaluation, RoBERTa showed the highest performance with an accuracy of 81.33%. In order to in-depth analysis of insomnia social data, topic modeling was performed using the newly emerged BERTopic method by supplementing the weaknesses of LDA, which is widely used in the past. As a result of the analysis, 8 subject groups ('Negative emotions', 'Advice and help and gratitude', 'Insomnia-related diseases', 'Sleeping pills', 'Exercise and eating habits', 'Physical characteristics', 'Activity characteristics', 'Environmental characteristics') could be confirmed. Users expressed negative emotions and sought help and advice from the Reddit insomnia community. In addition, they mentioned diseases related to insomnia, shared discourse on the use of sleeping pills, and expressed interest in exercise and eating habits. As insomnia-related characteristics, we found physical characteristics such as breathing, pregnancy, and heart, active characteristics such as zombies, hypnic jerk, and groggy, and environmental characteristics such as sunlight, blankets, temperature, and naps.

A Study on Automated Fake News Detection Using Verification Articles (검증 자료를 활용한 가짜뉴스 탐지 자동화 연구)

  • Han, Yoon-Jin;Kim, Geun-Hyung
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.12
    • /
    • pp.569-578
    • /
    • 2021
  • Thanks to web development today, we can easily access online news via various media. As much as it is easy to access online news, we often face fake news pretending to be true. As fake news items have become a global problem, fact-checking services are provided domestically, too. However, these are based on expert-based manual detection, and research to provide technologies that automate the detection of fake news is being actively conducted. As for the existing research, detection is made available based on contextual characteristics of an article and the comparison of a title and the main article. However, there is a limit to such an attempt making detection difficult when manipulation precision has become high. Therefore, this study suggests using a verifying article to decide whether a news item is genuine or not to be affected by article manipulation. Also, to improve the precision of fake news detection, the study added a process to summarize a subject article and a verifying article through the summarization model. In order to verify the suggested algorithm, this study conducted verification for summarization method of documents, verification for search method of verification articles, and verification for the precision of fake news detection in the finally suggested algorithm. The algorithm suggested in this study can be helpful to identify the truth of an article before it is applied to media sources and made available online via various media sources.