• Title/Summary/Keyword: BERT

Search Result 396, Processing Time 0.021 seconds

Multicriteria Movie Recommendation Model Combining Aspect-based Sentiment Classification Using BERT

  • Lee, Yurin;Ahn, Hyunchul
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.3
    • /
    • pp.201-207
    • /
    • 2022
  • In this paper, we propose a movie recommendation model that uses the users' ratings as well as their reviews. To understand the user's preference from multicriteria perspectives, the proposed model is designed to apply attribute-based sentiment analysis to the reviews. For doing this, it divides the reviews left by customers into multicriteria components according to its implicit attributes, and applies BERT-based sentiment analysis to each of them. After that, our model selectively combines the attributes that each user considers important to CF to generate recommendation results. To validate usefulness of the proposed model, we applied it to the real-world movie recommendation case. Experimental results showed that the accuracy of the proposed model was improved compared to the traditional CF. This study has academic and practical significance since it presents a new approach to select and use models in consideration of individual characteristics, and to derive various attributes from a review instead of evaluating each of them.

Document Classification Methodology Using Autoencoder-based Keywords Embedding

  • Seobin Yoon;Namgyu Kim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.9
    • /
    • pp.35-46
    • /
    • 2023
  • In this study, we propose a Dual Approach methodology to enhance the accuracy of document classifiers by utilizing both contextual and keyword information. Firstly, contextual information is extracted using Google's BERT, a pre-trained language model known for its outstanding performance in various natural language understanding tasks. Specifically, we employ KoBERT, a pre-trained model on the Korean corpus, to extract contextual information in the form of the CLS token. Secondly, keyword information is generated for each document by encoding the set of keywords into a single vector using an Autoencoder. We applied the proposed approach to 40,130 documents related to healthcare and medicine from the National R&D Projects database of the National Science and Technology Information Service (NTIS). The experimental results demonstrate that the proposed methodology outperforms existing methods that rely solely on document or word information in terms of accuracy for document classification.

Analyzing Effective Poll Prediction Model Using Social Media (SNS) Data Augmentation (소셜 미디어(SNS) 데이터 증강을 활용한 효과적인 여론조사 예측 모델 분석)

  • Hwang, Sunik;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.12
    • /
    • pp.1800-1808
    • /
    • 2022
  • During the election period, many polling agencies survey and distribute the approval ratings for each candidate. In the past, public opinion was expressed through the Internet, mobile SNS, or community, although in the past, people had no choice but to survey the approval rating by relying on opinion polls. Therefore, if the public opinion expressed on the Internet is understood through natural language analysis, it is possible to determine the candidate's approval rate as accurately as the result of the opinion poll. Therefore, this paper proposes a method of inferring the approval rate of candidates during the election period by synthesizing the political comments of users through internet community posting data. In order to analyze the approval rate in the post, I would like to suggest a method for generating the model that has the highest correlation with the actual opinion poll by using the KoBert, KcBert, and KoELECTRA models.

Intrusion Detection System based on Packet Payload Analysis using Transformer

  • Woo-Seung Park;Gun-Nam Kim;Soo-Jin Lee
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.11
    • /
    • pp.81-87
    • /
    • 2023
  • Intrusion detection systems that learn metadata of network packets have been proposed recently. However these approaches require time to analyze packets to generate metadata for model learning, and time to pre-process metadata before learning. In addition, models that have learned specific metadata cannot detect intrusion by using original packets flowing into the network as they are. To address the problem, this paper propose a natural language processing-based intrusion detection system that detects intrusions by learning the packet payload as a single sentence without an additional conversion process. To verify the performance of our approach, we utilized the UNSW-NB15 and Transformer models. First, the PCAP files of the dataset were labeled, and then two Transformer (BERT, DistilBERT) models were trained directly in the form of sentences to analyze the detection performance. The experimental results showed that the binary classification accuracy was 99.03% and 99.05%, respectively, which is similar or superior to the detection performance of the techniques proposed in previous studies. Multi-class classification showed better performance with 86.63% and 86.36%, respectively.

Recommender system using BERT sentiment analysis (BERT 기반 감성분석을 이용한 추천시스템)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.2
    • /
    • pp.1-15
    • /
    • 2021
  • If it is difficult for us to make decisions, we ask for advice from friends or people around us. When we decide to buy products online, we read anonymous reviews and buy them. With the advent of the Data-driven era, IT technology's development is spilling out many data from individuals to objects. Companies or individuals have accumulated, processed, and analyzed such a large amount of data that they can now make decisions or execute directly using data that used to depend on experts. Nowadays, the recommender system plays a vital role in determining the user's preferences to purchase goods and uses a recommender system to induce clicks on web services (Facebook, Amazon, Netflix, Youtube). For example, Youtube's recommender system, which is used by 1 billion people worldwide every month, includes videos that users like, "like" and videos they watched. Recommended system research is deeply linked to practical business. Therefore, many researchers are interested in building better solutions. Recommender systems use the information obtained from their users to generate recommendations because the development of the provided recommender systems requires information on items that are likely to be preferred by the user. We began to trust patterns and rules derived from data rather than empirical intuition through the recommender systems. The capacity and development of data have led machine learning to develop deep learning. However, such recommender systems are not all solutions. Proceeding with the recommender systems, there should be no scarcity in all data and a sufficient amount. Also, it requires detailed information about the individual. The recommender systems work correctly when these conditions operate. The recommender systems become a complex problem for both consumers and sellers when the interaction log is insufficient. Because the seller's perspective needs to make recommendations at a personal level to the consumer and receive appropriate recommendations with reliable data from the consumer's perspective. In this paper, to improve the accuracy problem for "appropriate recommendation" to consumers, the recommender systems are proposed in combination with context-based deep learning. This research is to combine user-based data to create hybrid Recommender Systems. The hybrid approach developed is not a collaborative type of Recommender Systems, but a collaborative extension that integrates user data with deep learning. Customer review data were used for the data set. Consumers buy products in online shopping malls and then evaluate product reviews. Rating reviews are based on reviews from buyers who have already purchased, giving users confidence before purchasing the product. However, the recommendation system mainly uses scores or ratings rather than reviews to suggest items purchased by many users. In fact, consumer reviews include product opinions and user sentiment that will be spent on evaluation. By incorporating these parts into the study, this paper aims to improve the recommendation system. This study is an algorithm used when individuals have difficulty in selecting an item. Consumer reviews and record patterns made it possible to rely on recommendations appropriately. The algorithm implements a recommendation system through collaborative filtering. This study's predictive accuracy is measured by Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Netflix is strategically using the referral system in its programs through competitions that reduce RMSE every year, making fair use of predictive accuracy. Research on hybrid recommender systems combining the NLP approach for personalization recommender systems, deep learning base, etc. has been increasing. Among NLP studies, sentiment analysis began to take shape in the mid-2000s as user review data increased. Sentiment analysis is a text classification task based on machine learning. The machine learning-based sentiment analysis has a disadvantage in that it is difficult to identify the review's information expression because it is challenging to consider the text's characteristics. In this study, we propose a deep learning recommender system that utilizes BERT's sentiment analysis by minimizing the disadvantages of machine learning. This study offers a deep learning recommender system that uses BERT's sentiment analysis by reducing the disadvantages of machine learning. The comparison model was performed through a recommender system based on Naive-CF(collaborative filtering), SVD(singular value decomposition)-CF, MF(matrix factorization)-CF, BPR-MF(Bayesian personalized ranking matrix factorization)-CF, LSTM, CNN-LSTM, GRU(Gated Recurrent Units). As a result of the experiment, the recommender system based on BERT was the best.

Sentiment analysis of Korean movie reviews using XLM-R

  • Shin, Noo Ri;Kim, TaeHyeon;Yun, Dai Yeol;Moon, Seok-Jae;Hwang, Chi-gon
    • International Journal of Advanced Culture Technology
    • /
    • v.9 no.2
    • /
    • pp.86-90
    • /
    • 2021
  • Sentiment refers to a person's thoughts, opinions, and feelings toward an object. Sentiment analysis is a process of collecting opinions on a specific target and classifying them according to their emotions, and applies to opinion mining that analyzes product reviews and reviews on the web. Companies and users can grasp the opinions of public opinion and come up with a way to do so. Recently, natural language processing models using the Transformer structure have appeared, and Google's BERT is a representative example. Afterwards, various models came out by remodeling the BERT. Among them, the Facebook AI team unveiled the XLM-R (XLM-RoBERTa), an upgraded XLM model. XLM-R solved the data limitation and the curse of multilinguality by training XLM with 2TB or more refined CC (CommonCrawl), not Wikipedia data. This model showed that the multilingual model has similar performance to the single language model when it is trained by adjusting the size of the model and the data required for training. Therefore, in this paper, we study the improvement of Korean sentiment analysis performed using a pre-trained XLM-R model that solved curse of multilinguality and improved performance.

A Study on Brand Image Analysis of Gaming Business Corporation using KoBERT and Twitter Data

  • Kim, Hyunji
    • Journal of Korea Game Society
    • /
    • v.21 no.6
    • /
    • pp.75-86
    • /
    • 2021
  • Brand image refers to how customers, stakeholders and the market see and recognize the brand. A positive brand image leads to continuous purchases, but a negative brand image is directly linked to consumers' buying behavior, such as stopping purchases, so from the corporate perspective, it needs to be quickly and accurately identified. Currently, methods of investigating brand images include surveys and SNS surveys, which have limited number of samples and are time-consuming and costly. Therefore, in this study, we are going to conduct an emotional analysis of text data on social media by utilizing the machine learning based KoBERT model, and then suggest how to use it for game corporate brand image analysis and verify its performance. The result has proved some degree of usability showing the same ranking within five brands when compared with the BRI Korea's brand reputation ranking.

Development of Tourism Information Named Entity Recognition Datasets for the Fine-tune KoBERT-CRF Model

  • Jwa, Myeong-Cheol;Jwa, Jeong-Woo
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.14 no.2
    • /
    • pp.55-62
    • /
    • 2022
  • A smart tourism chatbot is needed as a user interface to efficiently provide smart tourism services such as recommended travel products, tourist information, my travel itinerary, and tour guide service to tourists. We have been developed a smart tourism app and a smart tourism information system that provide smart tourism services to tourists. We also developed a smart tourism chatbot service consisting of khaiii morpheme analyzer, rule-based intention classification, and tourism information knowledge base using Neo4j graph database. In this paper, we develop the Korean and English smart tourism Name Entity (NE) datasets required for the development of the NER model using the pre-trained language models (PLMs) for the smart tourism chatbot system. We create the tourism information NER datasets by collecting source data through smart tourism app, visitJeju web of Jeju Tourism Organization (JTO), and web search, and preprocessing it using Korean and English tourism information Name Entity dictionaries. We perform training on the KoBERT-CRF NER model using the developed Korean and English tourism information NER datasets. The weight-averaged precision, recall, and f1 scores are 0.94, 0.92 and 0.94 on Korean and English tourism information NER datasets.

Weibo Disaster Rumor Recognition Method Based on Adversarial Training and Stacked Structure

  • Diao, Lei;Tang, Zhan;Guo, Xuchao;Bai, Zhao;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.10
    • /
    • pp.3211-3229
    • /
    • 2022
  • To solve the problems existing in the process of Weibo disaster rumor recognition, such as lack of corpus, poor text standardization, difficult to learn semantic information, and simple semantic features of disaster rumor text, this paper takes Sina Weibo as the data source, constructs a dataset for Weibo disaster rumor recognition, and proposes a deep learning model BERT_AT_Stacked LSTM for Weibo disaster rumor recognition. First, add adversarial disturbance to the embedding vector of each word to generate adversarial samples to enhance the features of rumor text, and carry out adversarial training to solve the problem that the text features of disaster rumors are relatively single. Second, the BERT part obtains the word-level semantic information of each Weibo text and generates a hidden vector containing sentence-level feature information. Finally, the hidden complex semantic information of poorly-regulated Weibo texts is learned using a Stacked Long Short-Term Memory (Stacked LSTM) structure. The experimental results show that, compared with other comparative models, the model in this paper has more advantages in recognizing disaster rumors on Weibo, with an F1_Socre of 97.48%, and has been tested on an open general domain dataset, with an F1_Score of 94.59%, indicating that the model has better generalization.

A Study on the Construction of an Emotion Corpus Using a Pre-trained Language Model (사전 학습 언어 모델을 활용한 감정 말뭉치 구축 연구 )

  • Yeonji Jang;Fei Li;Yejee Kang;Hyerin Kang;Seoyoon Park;Hansaem Kim
    • Annual Conference on Human and Language Technology
    • /
    • 2022.10a
    • /
    • pp.238-244
    • /
    • 2022
  • 감정 분석은 텍스트에 표현된 인간의 감정을 인식하여 다양한 감정 유형으로 분류하는 것이다. 섬세한 인간의 감정을 보다 정확히 분류하기 위해서는 감정 유형의 분류가 무엇보다 중요하다. 본 연구에서는 사전 학습 언어 모델을 활용하여 우리말샘의 감정 어휘와 용례를 바탕으로 기쁨, 슬픔, 공포, 분노, 혐오, 놀람, 흥미, 지루함, 통증의 감정 유형으로 분류된 감정 말뭉치를 구축하였다. 감정 말뭉치를 구축한 후 성능 평가를 위해 대표적인 트랜스포머 기반 사전 학습 모델 중 RoBERTa, MultiDistilBert, MultiBert, KcBert, KcELECTRA. KoELECTRA를 활용하여 보다 넓은 범위에서 객관적으로 모델 간의 성능을 평가하고 각 감정 유형별 정확도를 바탕으로 감정 유형의 특성을 알아보았다. 그 결과 각 모델의 학습 구조가 다중 분류 말뭉치에 어떤 영향을 주는지 구체적으로 파악할 수 있었으며, ELECTRA가 상대적으로 우수한 성능을 보여주고 있음을 확인하였다. 또한 감정 유형별 성능을 비교를 통해 다양한 감정 유형 중 기쁨, 슬픔, 공포에 대한 성능이 우수하다는 것을 알 수 있었다.

  • PDF