Korean Ironic Expression Detector

  • Seung-Ju Bang (Department of Radio and Information Communications Engineering, Chungnam National University) ;
  • Yohan Park (Department of Radio and Information Communications Engineering, Chungnam National University) ;
  • Ji-Eun Kim (Department of ELLT, Hankuk University of Foreign Studies) ;
  • Kong-Joo Lee (Department of Radio and Information Communications Engineering, Chungnam National University)
  • Received : 2024.01.24
  • Accepted : 2024.02.29
  • Published : 2024.03.31

Abstract

Despite the increasing importance of irony and sarcasm detection in the field of natural language processing, research on the Korean language is relatively scarce compared to other languages. This study aims to experiment with various models for irony detection in Korean text. We conducted irony detection experiments using KoBERT, a BERT-based model, and ChatGPT. For KoBERT, two methods of additional training on sentiment data were applied: transfer learning and multi-task learning. For ChatGPT, few-shot learning was applied by increasing the number of example sentences provided in the prompt. The experiments showed that the transfer learning and multi-task learning models, which were trained on additional sentiment data, outperformed the baseline model trained without it. ChatGPT, on the other hand, exhibited significantly lower performance than KoBERT, and increasing the number of example sentences did not lead to a noticeable improvement in performance. In conclusion, this study suggests that a KoBERT-based model is more suitable for irony detection than ChatGPT, and it highlights the potential contribution of additional training on sentiment data to improving irony detection performance.
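
The multi-task variant described above can be pictured as a shared KoBERT encoder feeding two classification heads, one for irony and one for sentiment, trained jointly. The Python sketch below illustrates one way to wire this up with the HuggingFace transformers library; the checkpoint name, the binary label sets, and the 0.5 sentiment-loss weight are illustrative assumptions, not details taken from the paper.

    # Minimal multi-task sketch: shared encoder, two heads (assumptions noted above).
    import torch.nn as nn
    from transformers import AutoModel

    class MultiTaskKoBERT(nn.Module):
        def __init__(self, encoder_name="skt/kobert-base-v1", sentiment_weight=0.5):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(encoder_name)  # shared KoBERT encoder
            hidden = self.encoder.config.hidden_size
            self.irony_head = nn.Linear(hidden, 2)      # ironic / not ironic
            self.sentiment_head = nn.Linear(hidden, 2)  # positive / negative (assumed binary)
            self.sentiment_weight = sentiment_weight
            self.loss_fn = nn.CrossEntropyLoss()

        def forward(self, input_ids, attention_mask,
                    irony_labels=None, sentiment_labels=None):
            out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
            cls = out.last_hidden_state[:, 0]           # [CLS] token representation
            irony_logits = self.irony_head(cls)
            loss = None
            # A batch may carry labels for one task or both; combine whatever is present.
            if irony_labels is not None:
                loss = self.loss_fn(irony_logits, irony_labels)
            if sentiment_labels is not None:
                s_loss = self.sentiment_weight * self.loss_fn(
                    self.sentiment_head(cls), sentiment_labels)
                loss = s_loss if loss is None else loss + s_loss
            return loss, irony_logits

The transfer learning variant would instead fine-tune the same encoder on the sentiment data first, with a single head, and then continue fine-tuning on the irony data.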
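
Likewise, the few-shot setup for ChatGPT amounts to placing a number of labeled example sentences in the prompt ahead of the sentence to classify. A minimal sketch using the openai Python client follows; the prompt wording, the example sentences, and the model name are illustrative assumptions rather than the paper's actual configuration.

    # Few-shot irony classification sketch (illustrative prompt and examples).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    FEW_SHOT_EXAMPLES = [  # hypothetical labeled pairs; the paper varies their count
        ("와, 또 월요일이라니 정말 신난다.", "ironic"),
        ("오늘 날씨가 맑고 따뜻해서 산책하기 좋다.", "not ironic"),
    ]

    def classify_irony(sentence: str, k: int = 2) -> str:
        messages = [{"role": "system",
                     "content": "Classify the Korean sentence as 'ironic' or 'not ironic'."}]
        for example, label in FEW_SHOT_EXAMPLES[:k]:   # k = number of shots
            messages.append({"role": "user", "content": example})
            messages.append({"role": "assistant", "content": label})
        messages.append({"role": "user", "content": sentence})
        response = client.chat.completions.create(model="gpt-3.5-turbo",
                                                  messages=messages)
        return response.choices[0].message.content.strip()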


Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00241142).

References

  1. H. Kil, "How to realize rhetorical irony in Korean," Studies in Humanities, Vol.13, pp.1-35, 2005. 
  2. C. Turban and U. Kruschwitz, "Tackling irony detection using ensemble classifiers," Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022. 
  3. J. Sarzynska-Wawer et al., "Detecting formal thought disorder by deep contextualized word representations," Psychiatry Research, Vol.304, Article 114135, 2021. 
  4. J. Devlin, M. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018. 
  5. A. Radford and K. Narasimhan, "Improving Language Understanding by Generative Pre-Training," 2018. 
  6. A. Vaswani et al., "Attention is all you need," Advances in Neural Information Processing Systems, Vol.30, 2017. 
  7. M. Kowsher, A. A. Sami, N. J. Prottasha, M. S. Arefin, P. K. Dhar, and T. Koshiba, "Bangla-BERT: transformer-based efficient model for transfer learning and language understanding," IEEE Access, Vol.10, pp.91855-91870, 2022.  https://doi.org/10.1109/ACCESS.2022.3197662
  8. A. Arnold, R. Nallapati, and W. W. Cohen, "A comparative study of methods for transductive transfer learning," Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007). IEEE, 2007. 
  9. O. Habimana, Y. Li, R. Li, X. Gu, and Y. Peng, "A multi-task learning approach to improve sentiment analysis with explicit recommendation," 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020. 
  10. T. B. Brown et al., "Language models are few-shot learners," Advances in Neural Information Processing Systems, Vol.33, pp.1877-1901, 2020. 
  11. L. Ouyang et al., "Training language models to follow instructions with human feedback," Advances in Neural Information Processing Systems, Vol.35, pp.27730-27744, 2022. 
  12. L. Loukas, I. Stogiannidis, P. Malakasiotis, and S. Vassos, "Breaking the bank with ChatGPT: Few-shot text classification for finance," arXiv preprint arXiv:2308.14634, 2023. 
  13. A. Baruah, K. Das, F. Barbhuiya, and K. Dey, "Context-aware sarcasm detection using BERT," Proceedings of the Second Workshop on Figurative Language Processing, 2020. 
  14. P. Golazizian, B. Sabeti, S. A. A. Asli, Z. Majdabadi, O. Momenzadeh, and R. Fahmi, "Irony detection in Persian language: A transfer learning approach using emoji prediction," Proceedings of the Twelfth Language Resources and Evaluation Conference, 2020. 
  15. M. Kosterin, I. Paramonov, and N. Lagutina, "Automatic Irony and Sarcasm Detection in Russian Sentences: Baseline Methods," 2023 33rd Conference of Open Innovations Association (FRUCT). IEEE, 2023. 
  16. Y. Kuratov and M. Arkhipov, "Adaptation of deep bidirectional multilingual transformers for Russian language," arXiv preprint arXiv:1905.07213, 2019. 
  17. A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, "Learning word vectors for sentiment analysis," Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011. 
  18. K. J. Lee, S. Bang, and J. E. Kim, "Korean irony corpus construction," Language and Information, Vol.27, No.1, pp.19-36, 2023.  https://doi.org/10.29403/LI27.1.2
  19. I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," arXiv preprint arXiv:1711.05101, 2017.