Automatic scoring of mathematics descriptive assessment using random forest algorithm

랜덤 포레스트 알고리즘을 활용한 수학 서술형 자동 채점

  • Received : 2024.01.24
  • Accepted : 2024.04.18
  • Published : 2024.05.31

Abstract

Artificial intelligence-based automated scoring is attracting growing attention as a way to support the introduction of descriptive items in schools and large-scale assessments, yet foundational research on it in mathematics is noticeably scarce compared with other subjects. This study developed automated scoring models for two descriptive items in first-year middle school mathematics using the Random Forest algorithm, evaluated their performance, and explored ways to improve it. The accuracy of the final models ranged from 0.95 to 1.00 for the first item and from 0.73 to 0.89 for the second, which is relatively high compared with automated scoring models in other subjects. We found that choosing the number of evaluation categories in light of the amount of available data is crucial for developing an effective automated scoring model. Text preprocessing by mathematics education experts improved both the performance and the interpretability of the models, and selecting a vectorization method that matches the characteristics of the items and data further enhanced performance. We also confirmed that oversampling is a useful way to compensate for lost performance when practical constraints prevent balanced data collection. To increase educational utility, further research is needed on how to use the feature importance derived from Random Forest-based scoring models to generate information useful for teaching and learning, such as feedback. This study is significant as foundational research on the automated scoring of mathematics descriptive items, and a range of follow-up studies is needed through close collaboration between AI experts and mathematics education experts.

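Although artificial intelligence-based automated scoring is drawing attention as one way to support the introduction of descriptive items in schools and large-scale assessments, foundational research on it in mathematics is lacking compared with other subjects. This study therefore developed automated scoring models for two first-year middle school mathematics descriptive items using the Random Forest algorithm and evaluated their performance. The accuracy of the final models for the two items, computed for each evaluation element, ranged from 0.95 to 1.00 and from 0.73 to 0.89, respectively, which is relatively high compared with other subjects. The study confirmed the importance of setting evaluation categories with the amount of data in mind, and found that text preprocessing by mathematics education experts and the choice of a vectorization method suited to the data contributed to improving the models' performance and interpretability. It also confirmed that oversampling is a useful way to supplement performance when practical limitations make balanced data collection difficult. To increase educational utility, further research is needed on using the feature importance derived from Random Forest-based models to generate information useful for teaching and learning, such as feedback. This study is meaningful as foundational research on the automated scoring of mathematics descriptive items, and a variety of follow-up studies should be carried out through close collaboration between AI experts and mathematics education experts.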
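The oversampling idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain random oversampling (duplicating minority-class responses until every score category is as large as the biggest one); the abstract does not say which oversampling variant the authors used, so this should not be read as their procedure. The function name `random_oversample` and the toy token lists and scores are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group the samples by their label (score category).
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_label.values())

    out_samples, out_labels = [], []
    for label, group in by_label.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: tokenized responses and their scores (1 = correct).
responses = [
    ["x", "=", "3"],
    ["solve", "x", "=", "3"],
    ["2x", "=", "6"],
    ["x", "is", "3"],
    ["x", "=", "2"],  # the only incorrect response
]
scores = [1, 1, 1, 1, 0]

balanced_x, balanced_y = random_oversample(responses, scores)
print(Counter(balanced_y))
```

In practice, oversampling of this kind is normally applied only to the training split, so that duplicated responses do not leak into the data used to measure accuracy.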
