Fine-tuning BERT-based NLP Models for Sentiment Analysis of Korean Reviews: Optimizing the sequence length


  • Sunga Hwang (Yonsei Graduate School of Information) ;
  • Seyeon Park (Yonsei Graduate School of Information) ;
  • Beakcheol Jang (Yonsei Graduate School of Information)
  • Received : 2024.07.05
  • Accepted : 2024.08.06
  • Published : 2024.08.31

Abstract

This paper proposes a method for fine-tuning BERT-based natural language processing models to perform sentiment analysis on Korean review data. By varying the input sequence length during fine-tuning and comparing the resulting performance, we explore the optimal input sequence length. For this purpose, text reviews collected by web scraping from the clothing shopping platform M were used. During preprocessing, the positive and negative satisfaction-score labels were recalibrated to improve the accuracy of the analysis: the GPT-4 API was used to reassign labels that reflect the actual sentiment of the review texts, and class imbalance was addressed by adjusting the data to a 6:4 ratio. Reviews on the platform averaged about 12 tokens in length; to find the model best suited to such short inputs, five BERT-based pre-trained models were compared in the modeling stage, focusing on input sequence length and memory usage. The experimental results indicate that an input sequence length of 64 generally offered the best balance of performance and memory usage. In particular, the KcELECTRA model performed best at this length, achieving over 92% accuracy in sentiment analysis of Korean review data. Furthermore, using BERTopic, we present a Korean review sentiment analysis process that classifies newly incoming reviews by category and extracts a sentiment score for each category with the final model.

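The GPT-4 relabeling step can be illustrated with a short sketch. This is an assumption-laden illustration, not the authors' published code: the prompt wording, the `gpt-4` model identifier, and the `relabel` helper are hypothetical, and the call uses the `openai` Python client's chat-completions API.

```python
# A hedged sketch of the GPT-4 relabeling step described in the abstract.
# The prompt text, the "gpt-4" model identifier, and the relabel() helper
# are illustrative assumptions; the paper does not publish its prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def relabel(review_text: str) -> int:
    """Return 1 if GPT-4 judges the Korean review positive, else 0."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "다음 한국어 리뷰의 감성을 '긍정' 또는 '부정'으로만 답하세요."},
            {"role": "user", "content": review_text},
        ],
    )
    return 1 if "긍정" in resp.choices[0].message.content else 0
```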
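The fine-tuning setup with a fixed input sequence length might look like the following sketch using the Hugging Face `transformers` Trainer. The checkpoint name `beomi/KcELECTRA-base`, the hyperparameters, and the toy dataset are assumptions; the paper reports only that a sequence length of 64 was generally optimal.

```python
# A minimal fine-tuning sketch with a fixed input sequence length, using the
# Hugging Face Trainer. The checkpoint name "beomi/KcELECTRA-base", the
# hyperparameters, and the two toy reviews are assumptions, not the paper's.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MAX_LEN = 64  # the sequence length the paper reports as generally optimal

tokenizer = AutoTokenizer.from_pretrained("beomi/KcELECTRA-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "beomi/KcELECTRA-base", num_labels=2)  # binary positive/negative

def tokenize(batch):
    # Pad or truncate every review to the fixed sequence length under study
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=MAX_LEN)

# Stand-in for the relabeled, 6:4-balanced review data (text, 0/1 label)
reviews = Dataset.from_dict({"text": ["배송 빠르고 품질 좋아요", "사이즈가 너무 작아요"],
                             "label": [1, 0]})

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=reviews.map(tokenize, batched=True),
)
trainer.train()
```

Repeating this run while varying only `MAX_LEN` (e.g. 32, 64, 128) reproduces the kind of sequence-length comparison the paper describes.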
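Finally, the downstream category-level scoring process could be sketched as follows: BERTopic clusters incoming reviews into topics, and the fine-tuned classifier scores each topic. The `predict_sentiment` callable and the share-of-positive-reviews aggregation are assumptions about how the per-category sentiment score is computed.

```python
# A sketch of the downstream pipeline: BERTopic groups incoming reviews into
# categories, then the fine-tuned classifier scores each category. The
# predict_sentiment callable and the positive-share aggregation are assumptions.
from collections import defaultdict
from bertopic import BERTopic

def category_sentiment(reviews, predict_sentiment):
    """reviews: list of Korean review strings.
    predict_sentiment: callable mapping a review to a 0/1 label, e.g. the
    fine-tuned KcELECTRA classifier from the previous sketch."""
    topic_model = BERTopic(language="multilingual")
    topics, _ = topic_model.fit_transform(reviews)  # one topic id per review

    by_topic = defaultdict(list)
    for review, topic in zip(reviews, topics):
        by_topic[topic].append(predict_sentiment(review))

    # Sentiment score per category = share of positive reviews in that topic
    return {t: sum(labels) / len(labels) for t, labels in by_topic.items()}
```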


Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF), funded by the Korean Government, under Grant RS-2023-00273751.
