Comparison of Korean Classification Models' Korean Essay Score Range Prediction Performance

  • Heeryon Cho (Humanities Research Institute, Chung-Ang University)
  • Hyeonyeol Im (Da Vinci College of General Education, Chung-Ang University)
  • Yumi Yi (Humanities Research Institute, Chung-Ang University)
  • Junwoo Cha (Korean Language Institute, Chung-Ang University)
  • Received: 2021.07.02
  • Accepted: 2021.08.26
  • Published: 2022.03.31

Abstract

We investigate the performance of deep learning-based Korean language models on the task of predicting the score range of Korean essays written by foreign students. We construct a data set of 304 essays on four topics: the criteria for choosing a job ('job'), the conditions of a happy life ('happ'), the relationship between money and happiness ('econ'), and the definition of success ('succ'). Each essay was assigned one of four letter grades (A, B, C, or D) according to its score range, and a total of eleven score range prediction experiments were conducted: five predicting the score range of 'job' essays, five predicting the score range of 'happ' essays, and one predicting the score range of mixed-topic essays. In these experiments, three deep learning-based Korean language models, KoBERT, KcBERT, and KR-BERT, were fine-tuned using various training data, and two traditional probabilistic machine learning classifiers, naive Bayes and logistic regression, were evaluated for comparison. The results show that the deep learning-based Korean language models outperformed the two traditional classifiers: KR-BERT performed best, with an overall average prediction accuracy of 55.83%, followed closely by KcBERT (55.77%) and KoBERT (54.91%). The naive Bayes and logistic regression classifiers achieved 52.52% and 50.28%, respectively. Owing to the scarcity of training data and the imbalance in class distribution, none of the classifiers achieved high prediction accuracy; moreover, the classifiers' vocabularies did not explicitly capture the writing errors that are helpful in grading the essays correctly. We expect score range prediction performance to improve once these two limitations are overcome.
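To make the fine-tuning setup concrete, the following is a minimal sketch, not the authors' code, of fine-tuning a BERT-family Korean language model for four-class score-range classification with Hugging Face transformers. The model identifier, hyperparameters, and toy essays are assumptions for illustration.

```python
# Minimal sketch of fine-tuning a Korean BERT for 4-class score-range
# prediction. NOT the authors' code: the model id, hyperparameters, and
# toy essays below are assumptions for illustration only.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

GRADES = ["A", "B", "C", "D"]  # the four score-range labels

class EssayDataset(Dataset):
    """Tokenized essays paired with 0..3 grade indices."""
    def __init__(self, texts, grades, tokenizer):
        self.enc = tokenizer(texts, truncation=True, max_length=256,
                             padding="max_length")
        self.labels = [GRADES.index(g) for g in grades]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

MODEL = "snunlp/KR-BERT-char16424"  # assumed Hugging Face id for KR-BERT
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=len(GRADES))

# Toy stand-ins for the 'job'/'happ' essays; the real data set is not public.
train_ds = EssayDataset(
    ["직업을 선택할 때 가장 중요한 기준은 ...", "행복한 삶의 조건은 ..."],
    ["B", "A"], tokenizer)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train_ds)
trainer.train()
```

KoBERT and KcBERT can be swapped in by changing the model identifier; the classification head and training loop stay the same.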

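The two traditional baselines can be sketched similarly. Below is a hedged scikit-learn example; the TF-IDF character n-gram features and toy data are assumptions, since the paper does not publish its exact feature pipeline.

```python
# Minimal sketch of the naive Bayes and logistic regression baselines.
# Assumptions: TF-IDF character n-gram features and toy data; the paper's
# actual feature pipeline is not published.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["직업을 선택할 때 ...", "행복한 삶의 조건은 ..."]  # toy stand-ins
train_grades = ["B", "A"]

for clf in (MultinomialNB(), LogisticRegression(max_iter=1000)):
    # Character n-grams avoid the need for a Korean morphological analyzer.
    pipe = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)), clf)
    pipe.fit(train_texts, train_grades)
    print(type(clf).__name__, pipe.predict(["돈과 행복의 관계는 ..."]))
```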

Funding

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea in 2017 (NRF-2017S1A6A3A01078538).
