• Title/Summary/Keyword: Text Model learning

Search Results: 436

A Study on Brand Image Analysis of Gaming Business Corporation using KoBERT and Twitter Data

  • Kim, Hyunji
    • Journal of Korea Game Society / v.21 no.6 / pp.75-86 / 2021
  • Brand image refers to how customers, stakeholders, and the market perceive and recognize a brand. A positive brand image leads to repeat purchases, while a negative one directly affects consumer buying behavior, for example by halting purchases, so companies need to identify it quickly and accurately. Current methods of investigating brand image, such as surveys and SNS polls, suffer from limited sample sizes and are time-consuming and costly. In this study, we therefore perform sentiment analysis of social media text data using the machine-learning-based KoBERT model, propose how it can be applied to brand image analysis for game companies, and verify its performance. Compared with BRI Korea's brand reputation ranking, the results produced the same ranking across five brands, demonstrating a degree of practical usability.
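The abstract above ranks brands from per-tweet sentiment labels. As a minimal sketch (all brand names and labels are invented, and the KoBERT classifier is assumed to have already produced the labels), the ranking step could look like this:

```python
# Hypothetical sketch: the paper fine-tunes KoBERT for sentiment; here we assume
# a classifier has already returned a positive/negative label per tweet and show
# how a brand-level ranking could be derived from those labels.

def brand_ranking(labeled_tweets):
    """labeled_tweets: list of (brand, label) with label in {"pos", "neg"}.
    Returns brands sorted by share of positive tweets, highest first."""
    counts = {}
    for brand, label in labeled_tweets:
        pos, total = counts.get(brand, (0, 0))
        counts[brand] = (pos + (label == "pos"), total + 1)
    return sorted(counts, key=lambda b: counts[b][0] / counts[b][1], reverse=True)

tweets = [
    ("BrandA", "pos"), ("BrandA", "pos"), ("BrandA", "neg"),
    ("BrandB", "pos"), ("BrandB", "neg"), ("BrandB", "neg"),
]
ranking = brand_ranking(tweets)  # BrandA (2/3 positive) ranks above BrandB (1/3)
```

A ranking derived this way can then be compared against an external reputation ranking, as the study does with BRI Korea's.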

Comparison of Sentiment Classification Performance of RNN and Transformer-Based Models on Korean Reviews (RNN과 트랜스포머 기반 모델들의 한국어 리뷰 감성분류 비교)

  • Jae-Hong Lee
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.18 no.4 / pp.693-700 / 2023
  • Sentiment analysis, a branch of natural language processing that classifies and identifies subjective opinions and emotions in text documents as positive or negative, can be used for various promotions and services through customer preference analysis. To this end, recent research has been conducted utilizing various techniques in machine learning and deep learning. In this study, we propose an optimal language model by comparing the accuracy of sentiment analysis for movie, product, and game reviews using existing RNN-based models and recent Transformer-based language models. In our experiments, LMKorBERT and GPT3 showed relatively good accuracy among the models pre-trained on the Korean corpus.
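The comparison described above boils down to scoring each model's predictions against the same labeled reviews. A toy illustration (all labels and predictions invented, model names only loosely echoing the paper's RNN-vs-Transformer setup):

```python
# Illustrative only: accuracy comparison for hypothetical model outputs on a
# shared labeled review set, mirroring the paper's evaluation protocol.

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

labels     = [1, 0, 1, 1, 0, 1]   # 1 = positive review, 0 = negative
rnn_preds  = [1, 0, 0, 1, 1, 1]   # toy RNN-style model output
bert_preds = [1, 0, 1, 1, 0, 0]   # toy Transformer-style model output

scores = {"rnn": accuracy(rnn_preds, labels), "bert": accuracy(bert_preds, labels)}
best = max(scores, key=scores.get)
```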

Meta Learning-based Global Relation Extraction Trained on Traditional Korean Data (전통 문화 데이터를 이용한 메타 러닝 기반 전역 관계 추출)

  • Kim, Kuekyeng;Kim, Gyeongmin;Jo, Jaechoon;Lim, Heuiseok
    • Journal of the Korea Convergence Society / v.9 no.11 / pp.23-28 / 2018
  • Most recent approaches to relation extraction are limited to mention-level extraction. While such methods achieve high performance, they can only extract relations confined to a single sentence or so, and the inability to capture relations beyond that scope represents a substantial loss of information. To address this problem, this paper presents an augmented external memory neural network model that enables global relation extraction. The proposed model performs global relation extraction by first gathering mention-level extraction results in the augmented external memory and then analyzing them. The model also performs well on Korean because it can take the frequently omitted subjects and objects into consideration.
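Stripped of its neural components, the aggregation idea above is: write mention-level triples into a memory keyed by entity pair, then read back document-level relations. A bare sketch (the triples are invented examples; the paper's learned memory is replaced by a plain dictionary):

```python
# Sketch of the aggregation step only: per-sentence mention-level triples are
# written into an external "memory" keyed by entity pair, so relations spanning
# the whole document can be read back out.

from collections import defaultdict

def build_global_relations(mention_triples):
    """mention_triples: iterable of (subject, relation, object) per sentence."""
    memory = defaultdict(set)          # external memory: (subj, obj) -> relations
    for subj, rel, obj in mention_triples:
        memory[(subj, obj)].add(rel)
    return dict(memory)

triples = [
    ("Sejong", "created", "Hangul"),      # sentence 1
    ("Sejong", "ruled", "Joseon"),        # sentence 2
    ("Sejong", "promulgated", "Hangul"),  # sentence 5, same entity pair
]
graph = build_global_relations(triples)
```

The point of the exercise: the same entity pair accumulates evidence from sentences far apart, which a single-sentence extractor cannot do.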

A Study on Recognition of Citation Metadata using Bidirectional GRU-CRF Model based on Pre-trained Language Model (사전학습 된 언어 모델 기반의 양방향 게이트 순환 유닛 모델과 조건부 랜덤 필드 모델을 이용한 참고문헌 메타데이터 인식 연구)

  • Ji, Seon-yeong;Choi, Sung-pil
    • Journal of the Korean Society for Information Management / v.38 no.1 / pp.221-242 / 2021
  • This study applies reference metadata recognition using a bidirectional GRU-CRF model based on a pre-trained language model. To construct the experiment set, 161,315 references were extracted by rule-based methods from 53,562 academic documents in PDF format collected from 40 journals published in 2018. The study automatically extracts references from academic literature in PDF format. The language model with the highest performance was identified, and additional experiments on that model compared recognition performance according to the size of the training set. Finally, the performance for each metadata field was confirmed.
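The target of the task above is to segment a reference string into metadata fields. As a stand-in for the BiGRU-CRF tagger (whose output format this merely illustrates), here is a naive regex splitter; the field names and the assumed reference layout are my own invention, not the paper's:

```python
# The paper labels reference strings with a BiGRU-CRF over a pre-trained LM;
# this sketch only shows the kind of output such a labeler targets, using a
# crude regex over one assumed format: 'Author (Year). Title. Journal, vol(no), pages.'

import re

def parse_reference(ref):
    """Very rough field splitter for a single assumed citation format."""
    m = re.match(r"(?P<author>.+?) \((?P<year>\d{4})\)\. (?P<title>.+?)\. "
                 r"(?P<journal>.+?), (?P<volume>[\d()]+), (?P<pages>[\d-]+)\.", ref)
    return m.groupdict() if m else None

ref = "Ji, S. (2021). Recognition of citation metadata. Journal of Info Mgmt, 38(1), 221-242."
fields = parse_reference(ref)
```

Rule-based splitters like this break on format variation, which is exactly why the study trains a sequence labeler instead.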

Development of SVM-based Construction Project Document Classification Model to Derive Construction Risk (건설 리스크 도출을 위한 SVM 기반의 건설프로젝트 문서 분류 모델 개발)

  • Kang, Donguk;Cho, Mingeon;Cha, Gichun;Park, Seunghee
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.6 / pp.841-849 / 2023
  • Construction projects carry risks from various factors such as delays and accidents. Given these risks, construction schedules are typically estimated through subjective judgment based on supervisor experience. Moreover, unreasonably compressing work to meet schedules that have slipped due to delays and disasters leads to negative consequences such as poor construction quality, and delayed schedules cause economic losses through the absence of infrastructure. Data-driven, scientific approaches and statistical analysis are needed to manage these risks. Because data collected in actual construction projects is stored as unstructured text, applying data-based risk management requires substantial manpower and cost for pre-processing, so basic data produced by a text-mining classification model is required. In this study, therefore, a document classification model for risk management was developed by collecting construction project documents and applying a Support Vector Machine (SVM) classifier with text mining. Through quantitative analysis in future research, the results are expected to serve as efficient, objective basic data for construction project process management, enabling risk management.
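To make the SVM step concrete, here is a minimal linear SVM trained with the Pegasos sub-gradient method on toy bag-of-words vectors. This is a sketch under heavy assumptions: the features, documents, and labels are invented, and the paper would more likely use a library implementation than hand-rolled Pegasos.

```python
# Minimal linear SVM (Pegasos sub-gradient descent) on toy bag-of-words
# vectors, standing in for the paper's SVM document classifier.

import random

def pegasos_train(xs, ys, lam=0.01, epochs=2000, seed=0):
    """xs: list of feature vectors, ys: labels in {-1, +1}. Returns weights."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(xs)), len(xs)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = ys[i] * sum(wj * xj for wj, xj in zip(w, xs[i]))
            w = [(1 - eta * lam) * wj for wj in w]   # regularization shrink
            if margin < 1:                           # hinge-loss sub-gradient step
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w

# toy "documents": [count('delay'), count('accident'), count('schedule')]
xs = [[2, 1, 0], [1, 2, 1], [0, 0, 2], [0, 1, 3]]
ys = [+1, +1, -1, -1]                                # +1 = risk document
w = pegasos_train(xs, ys)
predict = lambda x: 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

In practice the term counts would come from a text-mining pipeline (tokenization, TF-IDF weighting) over the collected project documents.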

Object detection in financial reporting documents for subsequent recognition

  • Sokerin, Petr;Volkova, Alla;Kushnarev, Kirill
    • International Journal of Advanced Smart Convergence / v.10 no.1 / pp.1-11 / 2021
  • Document page segmentation is an important step in building a quality optical character recognition module. This study reviewed existing work on page segmentation and focused on developing a segmentation model with greater practical value for use in an organization, as well as broad capabilities for controlling model quality. The main problems of document segmentation were highlighted, including complex backgrounds of intersecting objects. As detection classes, not only the classic text, table, and figure were selected, but also additional types such as signature, logo, and table without borders (or with partially missing borders), posing the non-trivial task of detecting non-standard document elements. The authors compared existing neural network architectures for object detection based on published research and found RetinaNet the most suitable. To enable quality control of the model, a method based on neural network modeling with the RetinaNet architecture is proposed. Several models were built and evaluated on a test sample using the mean Average Precision (mAP) metric. The best result was achieved by an ensemble of four neural networks: the first focused on detecting tables and borderless tables, the second on seals and signatures, the third on pictures and logos, and the fourth on text. The analysis revealed that this four-network approach produced the best results for most detection classes on the test sample. The method proposed in the article can also be used to recognize other objects. A promising direction for further work is table segmentation, with functionally distinct table areas as classes: heading, name cell, data cell, and empty cell.
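Merging detections from four specialized networks over the same page requires suppressing duplicate boxes. A standard ingredient for such a merge is IoU-based non-maximum suppression, sketched below with invented boxes (the paper does not specify its exact merge procedure; NMS is simply the conventional choice):

```python
# IoU-based non-maximum suppression, a typical way to combine overlapping
# detections. Boxes are (x1, y1, x2, y2, score, label); values are made up.

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def nms(boxes, thr=0.5):
    """Keep highest-scoring boxes, dropping overlaps above the IoU threshold."""
    kept = []
    for box in sorted(boxes, key=lambda b: -b[4]):
        if all(iou(box, k) < thr for k in kept):
            kept.append(box)
    return kept

dets = [
    (0, 0, 10, 10, 0.9, "table"),
    (1, 1, 10, 10, 0.8, "table"),   # overlaps the first box, gets suppressed
    (20, 20, 30, 30, 0.7, "logo"),
]
merged = nms(dets)
```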

Reputation Analysis of Document Using Probabilistic Latent Semantic Analysis Based on Weighting Distinctions (가중치 기반 PLSA를 이용한 문서 평가 분석)

  • Cho, Shi-Won;Lee, Dong-Wook
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.3 / pp.632-638 / 2009
  • Probabilistic Latent Semantic Analysis (PLSA) has many applications in information retrieval and filtering, natural language processing, machine learning from text, and related areas. In this paper, we propose an algorithm using a weighted PLSA model to find contextual phrases and opinions in documents. Traditional keyword search cannot identify the semantic relations of phrases; overcoming this obstacle requires techniques for automatically classifying them. Experiments show that the proposed algorithm discovers the semantic relations of phrases well and represents them in the vector-space model. The algorithm supports a variety of analyses, including document classification, online reputation analysis, and collaborative recommendation.
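For readers unfamiliar with the model family the paper builds on, here are plain PLSA EM updates on a toy term-document count matrix. This sketch omits the paper's weighting scheme entirely; the data and topic count are invented.

```python
# Tiny plain-PLSA sketch: EM updates for P(z|d) and P(w|z) on a toy
# term-document count matrix. The paper's weighting scheme is NOT included.

import random

def plsa(counts, n_topics=2, iters=50, seed=0):
    """counts[d][w]: term frequency. Returns (p_z_d, p_w_z)."""
    rng = random.Random(seed)
    D, W = len(counts), len(counts[0])
    norm = lambda v: [x / sum(v) for x in v]
    p_z_d = [norm([rng.random() for _ in range(n_topics)]) for _ in range(D)]
    p_w_z = [norm([rng.random() for _ in range(W)]) for _ in range(n_topics)]
    for _ in range(iters):
        new_zd = [[1e-12] * n_topics for _ in range(D)]
        new_wz = [[1e-12] * W for _ in range(n_topics)]
        for d in range(D):
            for w in range(W):
                if not counts[d][w]:
                    continue
                # E-step: posterior P(z | d, w) under current parameters
                post = norm([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                # M-step accumulators, weighted by the observed count
                for z in range(n_topics):
                    new_zd[d][z] += counts[d][w] * post[z]
                    new_wz[z][w] += counts[d][w] * post[z]
        p_z_d = [norm(row) for row in new_zd]
        p_w_z = [norm(row) for row in new_wz]
    return p_z_d, p_w_z

counts = [[4, 3, 0, 0], [3, 4, 0, 0], [0, 0, 4, 3]]  # two clearly separated topics
p_z_d, p_w_z = plsa(counts)
```

The paper's contribution would slot into the M-step, replacing the raw `counts[d][w]` factor with a distinction-based weight.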

Video Captioning with Visual and Semantic Features

  • Lee, Sujin;Kim, Incheol
    • Journal of Information Processing Systems / v.14 no.6 / pp.1318-1330 / 2018
  • Video captioning refers to extracting features from a video and generating captions from the extracted features. This paper introduces a deep neural network model and its learning method for effective video captioning. In this study, semantic features that effectively express the video are used alongside visual features. The visual features are extracted using convolutional neural networks such as C3D and ResNet, while the semantic features are extracted using a semantic feature extraction network proposed in this paper. Further, an attention-based caption generation network is proposed for effective generation of video captions from the extracted features. The performance and effectiveness of the proposed model are verified through various experiments on two large-scale video benchmarks, the Microsoft Video Description (MSVD) and Microsoft Research Video-to-Text (MSR-VTT) datasets.
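The attention mechanism mentioned above weights per-frame features by their relevance to the decoder's current state. A bare dot-product attention sketch (the 3-dimensional "frame features" and query are made up; the paper's actual attention network is not specified here):

```python
# Dot-product attention over per-frame features: softmax the query-feature
# scores, then take the weighted average as the context vector.

import math

def attention(query, frame_feats):
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in frame_feats]
    m = max(scores)                         # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]
    context = [sum(w * feat[i] for w, feat in zip(weights, frame_feats))
               for i in range(len(query))]
    return weights, context

frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
weights, context = attention([1.0, 0.0, 0.0], frames)
```

Frames similar to the query receive larger weights, so the caption decoder can focus on different frames at each generation step.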

Fine-tuning BERT Models for Keyphrase Extraction in Scientific Articles

  • Lim, Yeonsoo;Seo, Deokjin;Jung, Yuchul
    • Journal of Advanced Information Technology and Convergence / v.10 no.1 / pp.45-56 / 2020
  • Despite extensive research, performance enhancement of keyphrase (KP) extraction remains a challenging problem in modern informatics. Recently, deep learning-based supervised approaches have exhibited state-of-the-art accuracies on this problem, and several of the previously proposed methods utilize Bidirectional Encoder Representations from Transformers (BERT)-based language models. However, few studies have investigated the effective application of BERT-based fine-tuning techniques to KP extraction. In this paper, we consider the problem in the context of scientific articles by investigating the fine-tuning characteristics of two distinct BERT models: BERT (the base model released by Google) and SciBERT (a BERT model trained on scientific text). Three datasets from the computer science domain (WWW, KDD, and Inspec) are used to compare the KP-extraction results obtained by fine-tuning BERT and SciBERT.
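A common way to frame KP extraction for BERT fine-tuning is token-level BIO tagging; the sketch below shows only the decoding step from BIO labels back to phrases, with hand-written tags standing in for model output (the label scheme is an assumption, not a detail confirmed by the abstract):

```python
# Decoding BIO tags into keyphrases; the tags here are hand-written stand-ins
# for what a fine-tuned token-classification model would emit.

def decode_bio(tokens, tags):
    """Collect token spans labeled B-KP / I-KP into keyphrases."""
    phrases, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B-KP":
            if current:
                phrases.append(" ".join(current))
            current = [tok]
        elif tag == "I-KP" and current:
            current.append(tok)
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

tokens = ["We", "study", "keyphrase", "extraction", "with", "BERT", "."]
tags   = ["O",  "O",     "B-KP",      "I-KP",       "O",    "B-KP", "O"]
phrases = decode_bio(tokens, tags)
```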

Sign Language Dataset Built from S. Korean Government Briefing on COVID-19 (대한민국 정부의 코로나 19 브리핑을 기반으로 구축된 수어 데이터셋 연구)

  • Sim, Hohyun;Sung, Horyeol;Lee, Seungjae;Cho, Hyeonjoong
    • KIPS Transactions on Software and Data Engineering / v.11 no.8 / pp.325-330 / 2022
  • This paper describes the collection and evaluation of a dataset for deep learning research on Korean sign language, covering tasks such as sign language recognition, translation, and segmentation. Deep learning research on sign language faces two difficulties. First, sign languages are hard to recognize because they involve multiple modalities, including hand movements, hand orientations, and facial expressions. Second, training data is scarce: currently, the KETI dataset is the only known Korean sign language dataset for deep learning. Sign language datasets for deep learning are classified into two categories, isolated and continuous sign language, and although several foreign sign language datasets have been collected over time, they too are insufficient for deep learning research. We therefore collected a large-scale Korean sign language dataset and evaluated it using TSPNet, a baseline model with state-of-the-art performance in sign language translation. The collected dataset consists of 11,402 image-text pairs in total. Our experiment with the baseline model on this dataset yields a BLEU-4 score of 3.63, which can serve as the baseline performance for the Korean sign language dataset. We hope our experience collecting a Korean sign language dataset helps facilitate further research on Korean sign language.
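BLEU-4, the metric reported above, combines modified n-gram precisions (n = 1..4) with a brevity penalty. A bare-bones sentence-level sketch, without the smoothing that standard toolkits such as sacreBLEU apply:

```python
# Minimal sentence-level BLEU-4: geometric mean of modified n-gram precisions
# times a brevity penalty. Unsmoothed, so any empty n-gram level zeroes it.

import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, reference):
    precisions = []
    for n in range(1, 5):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0
        precisions.append(overlap / total)
    bp = 1.0 if len(candidate) > len(reference) else \
         math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)

ref = "the cat sat on the mat".split()
```

In corpus-level evaluation, as used for the 3.63 score above, the n-gram statistics are accumulated over all sentence pairs before taking the geometric mean.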