Sentiment analysis of Korean movie reviews using XLM-R

  • Shin, Noo Ri (Graduate School of Smart Convergence, Kwangwoon University) ;
  • Kim, TaeHyeon (Department of Plasma Bioscience and Display, Kwangwoon University) ;
  • Yun, Dai Yeol (Department of Information and Communication Engineering, Institute of Information Technology, Kwangwoon University) ;
  • Moon, Seok-Jae (Institute of Information Technology, Kwangwoon University) ;
  • Hwang, Chi-gon (Department of Computer Engineering, Institute of Information Technology, Kwangwoon University)
  • Received : 2021.04.19
  • Accepted : 2021.05.20
  • Published : 2021.06.30

Abstract

Sentiment refers to a person's thoughts, opinions, and feelings toward an object. Sentiment analysis is the process of collecting opinions about a specific target and classifying them by the emotion they express; it underlies opinion mining, which analyzes product reviews and other reviews on the web. With it, companies and users can gauge public opinion and plan responses accordingly. Recently, natural language processing models based on the Transformer architecture have appeared, with Google's BERT as a representative example, and many subsequent models were built by modifying BERT. Among them, the Facebook AI team released XLM-R (XLM-RoBERTa), an improved version of the XLM model. XLM-R addressed the data limitation and the curse of multilinguality by training XLM on more than 2 TB of filtered CommonCrawl (CC) data rather than on Wikipedia data. The model showed that a multilingual model can perform comparably to monolingual models when the model size and the amount of training data are scaled appropriately. In this paper, we therefore study how much a pre-trained XLM-R model, which mitigates the curse of multilinguality, improves the performance of Korean sentiment analysis.
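The approach described above, fine-tuning a pre-trained XLM-R model as a sentiment classifier, can be sketched with the Hugging Face `transformers` library. This is not the authors' code: the library choice, the binary label scheme (0 = negative, 1 = positive), the learning rate, and the helper names `make_classifier`, `train_step`, and `predict` are illustrative assumptions. By default the sketch builds a tiny, randomly initialised model with the XLM-R architecture so that it runs offline; passing `tiny=False` loads the pre-trained `xlm-roberta-base` checkpoint instead.

```python
import torch
from transformers import XLMRobertaConfig, XLMRobertaForSequenceClassification

NUM_LABELS = 2  # assumed binary task: 0 = negative, 1 = positive


def make_classifier(tiny: bool = True) -> XLMRobertaForSequenceClassification:
    """Return an XLM-R encoder topped with a 2-way classification head.

    tiny=True builds a small, randomly initialised model so the sketch runs
    offline; tiny=False downloads the pre-trained xlm-roberta-base weights.
    """
    if not tiny:
        return XLMRobertaForSequenceClassification.from_pretrained(
            "xlm-roberta-base", num_labels=NUM_LABELS
        )
    config = XLMRobertaConfig(
        vocab_size=1000,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=2,
        intermediate_size=64,
        max_position_embeddings=64,
        num_labels=NUM_LABELS,
    )
    return XLMRobertaForSequenceClassification(config)


def train_step(model, optimizer, input_ids, attention_mask, labels) -> float:
    """One fine-tuning step; passing labels makes the model return cross-entropy loss."""
    model.train()
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()


@torch.no_grad()
def predict(model, input_ids, attention_mask) -> torch.Tensor:
    """Return the predicted label (0 or 1) for each sequence in the batch."""
    model.eval()
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    return logits.argmax(dim=-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = make_classifier(tiny=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumed learning rate
    # Stand-in token ids (3..999 avoids the <s>, <pad>, </s> ids 0, 1, 2);
    # real inputs would come from the XLM-R SentencePiece tokenizer.
    input_ids = torch.randint(3, 1000, (4, 16))
    attention_mask = torch.ones_like(input_ids)
    labels = torch.tensor([0, 1, 1, 0])
    loss = train_step(model, optimizer, input_ids, attention_mask, labels)
    print(f"loss={loss:.4f}", predict(model, input_ids, attention_mask).tolist())
```

With real reviews, the random `input_ids` and `attention_mask` would instead come from `AutoTokenizer.from_pretrained("xlm-roberta-base")`. Its SentencePiece vocabulary is shared across the pre-training languages, including Korean, so the sketch needs no Korean-specific tokenizer.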
