
Analyzing Effective Poll Prediction Model Using Social Media (SNS) Data Augmentation


  • Hwang, Sunik (Data Science, Sungkyunkwan University)
  • Oh, Hayoung (College of Computing and Informatics, Sungkyunkwan University)
  • Received : 2022.11.09
  • Accepted : 2022.12.01
  • Published : 2022.12.31

Abstract

During an election period, many polling agencies survey and publish the approval rating of each candidate. In the past, approval ratings could only be measured by relying on such opinion polls, but today public opinion is also expressed on the Internet, through mobile social networking services (SNS), and in online communities. If the public opinion expressed online can be captured through natural language analysis, a candidate's approval rating can be estimated as accurately as an opinion poll result. This paper therefore proposes a method of inferring candidates' approval ratings during an election period by aggregating users' political comments in Internet community posts. To analyze approval ratings from these posts, we use the KoBERT, KcBERT, and KoELECTRA models and propose a method for building the model whose output correlates most highly with actual opinion polls.
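The abstract does not state how post-level predictions become an approval rating or how correlation with the polls is measured. The sketch below is one possible reading, under loudly stated assumptions: each fine-tuned classifier (KoBERT, KcBERT, or KoELECTRA) is assumed to assign every post to the candidate it supports, a day's approval rating for a candidate is assumed to be the fraction of that day's posts assigned to them, and models are assumed to be compared by Pearson correlation against published poll figures. The functions `daily_share`, `pearson`, `best_model`, and `posts`, the input format, and all numbers are hypothetical; the classifiers themselves are not shown.

```python
from collections import Counter
from math import sqrt

# Hypothetical input format (an assumption): each model yields
# (day, predicted_candidate) for every community post it classified.
Prediction = tuple[str, str]


def daily_share(predictions: list[Prediction], candidate: str) -> dict[str, float]:
    """Fraction of each day's posts that a model attributes to `candidate`."""
    totals = Counter(day for day, _ in predictions)
    hits = Counter(day for day, label in predictions if label == candidate)
    return {day: hits[day] / totals[day] for day in sorted(totals)}


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def best_model(model_predictions, poll, candidate):
    """Score each model by correlation with the poll series; return the best one."""
    scores = {}
    for name, preds in model_predictions.items():
        share = daily_share(preds, candidate)
        days = [d for d in sorted(poll) if d in share]  # days covered by both
        scores[name] = pearson([share[d] for d in days], [poll[d] for d in days])
    return max(scores, key=scores.get), scores


# Toy data for illustration only; these are not results from the paper.
def posts(day: str, n_a: int, n_b: int) -> list[Prediction]:
    """Build n_a posts labeled for candidate A and n_b for candidate B."""
    return [(day, "A")] * n_a + [(day, "B")] * n_b


poll_a = {"2022-02-01": 0.40, "2022-02-02": 0.45, "2022-02-03": 0.50}
models = {
    "model_x": posts("2022-02-01", 2, 3) + posts("2022-02-02", 3, 3) + posts("2022-02-03", 3, 2),
    "model_y": posts("2022-02-01", 3, 2) + posts("2022-02-02", 3, 3) + posts("2022-02-03", 2, 3),
}
best, scores = best_model(models, poll_a, "A")
print(best, scores)  # model_x scores close to 1.0, model_y close to -1.0
```

Pearson correlation is chosen here only because the abstract speaks of correlation with actual polls; a rank correlation such as Spearman's would be an equally plausible interpretation.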



Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2022R1F1A1074696).
