Analyzing Mathematical Performances of ChatGPT: Focusing on the Solution of National Assessment of Educational Achievement and the College Scholastic Ability Test

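  • 권오남 (Seoul National University)
  • 오세준 (Ewha·Geumnan High School, affiliated with the College of Education, Ewha Womans University)
  • 윤정은 (Incheon Hyoseong High School)
  • 이경원 (Middle School affiliated with the College of Education, Dankook University)
  • 신병철 (Suwon Foreign Language High School)
  • 정원 (Graduate School, Seoul National University)
  • Received: 2023.05.22
  • Reviewed: 2023.06.23
  • Published: 2023.06.30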

Abstract

This foundational study explores how ChatGPT might be used in mathematics education by analyzing its responses to questions from the National Assessment of Educational Achievement (NAEA) and the College Scholastic Ability Test (CSAT). ChatGPT, a generative artificial intelligence model, has attracted attention in many fields, and as its user base grows rapidly, educators are increasingly calling for ways to use it. To the best of our knowledge, however, very few educational studies using ChatGPT have been reported. We analyzed ChatGPT 3.5's responses to three years of NAEA and CSAT questions in terms of the correct-answer rate, the accuracy of the solution process, and the types of errors. ChatGPT's correct-answer rates were 37.1% on the NAEA questions and 15.97% on the CSAT questions. Scored on a 5-point scale, the accuracy of its solution process was 3.44 for the NAEA and 2.49 for the CSAT. The errors ChatGPT made in solving the problems fell into two types: procedural and functional. Procedural errors are mistakes in connecting one expression to the next step or in calculation, while functional errors arise as ChatGPT recognizes, judges, and outputs text. These results suggest that the correct-answer rate alone should not be the criterion for assessing ChatGPT's mathematical performance; the accuracy of the solution process and the types of errors should be considered together.
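The abstract's three measures (a correct-answer rate, a solution-process accuracy score out of 5, and a split of errors into procedural and functional) can be tabulated from item-level grades roughly as sketched below. This is a hypothetical illustration, not the authors' analysis: the `GradedResponse` record, the `summarize` helper, the 0 to 5 rubric range, and the assumption that the reported score is a mean over items are all ours, and the four sample responses are invented toy data rather than results from the study.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical error labels, mirroring the two categories named in the abstract.
ERROR_TYPES = {"procedural", "functional"}


@dataclass
class GradedResponse:
    """One ChatGPT answer to one test item, as scored by a human rater (hypothetical schema)."""
    exam: str                  # e.g. "NAEA" or "CSAT"
    correct: bool              # final answer matches the answer key
    process_score: int         # solution-process accuracy, assumed 0-5 rubric
    error_type: Optional[str]  # "procedural", "functional", or None if no error noted

    def __post_init__(self) -> None:
        # Reject scores outside the assumed rubric and unknown error labels.
        if not 0 <= self.process_score <= 5:
            raise ValueError("process_score must be between 0 and 5")
        if self.error_type is not None and self.error_type not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {self.error_type}")


def summarize(responses: List[GradedResponse], exam: str) -> Dict[str, object]:
    """Correct-answer rate (%), mean process score, and error-type counts for one exam."""
    items = [r for r in responses if r.exam == exam]
    if not items:
        raise ValueError(f"no responses for exam {exam!r}")
    n = len(items)
    correct_rate = 100 * sum(r.correct for r in items) / n  # bools sum as 0/1
    mean_score = sum(r.process_score for r in items) / n
    errors = Counter(r.error_type for r in items if r.error_type is not None)
    return {
        "n": n,
        "correct_rate_pct": round(correct_rate, 2),
        "mean_process_score": round(mean_score, 2),
        "errors": dict(errors),
    }


# Toy data for illustration only; these are NOT responses from the study.
sample = [
    GradedResponse("CSAT", True, 5, None),
    GradedResponse("CSAT", False, 3, "procedural"),
    GradedResponse("CSAT", False, 1, "functional"),
    GradedResponse("CSAT", False, 2, "procedural"),
]

print(summarize(sample, "CSAT"))
```

With the toy data, only one of four answers is correct (25%), yet the mean process score is 2.75 because wrong answers can still earn partial credit for their solution steps. Keeping the two measures separate is what lets that difference show, in line with the abstract's argument that the correct-answer rate alone is not enough.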
