Data Analysis of Dropouts of University Students Using Topic Modeling

  • Jeong, Do-Heon (Department of Library and Information Science, Duksung Women's University)
  • Park, Ju-Yeon (Cha Mirisa College of Liberal Arts, Duksung Women's University)
  • Received : 2020.12.18
  • Accepted : 2021.01.07
  • Published : 2021.01.31

Abstract

This study aims to provide implications for establishing student support policies by empirically analyzing data on university student dropout. To this end, data on students who enrolled at D University after 2017 were sampled and collected. The collected data were analyzed using topic modeling with LDA (Latent Dirichlet Allocation), a probabilistic model used in text mining. The analysis identified topics that were characteristic of dropout students, and classification between the groups based on these topics also performed excellently. Based on these results, a specific educational support system was proposed to prevent university student dropout. This study is meaningful in that it demonstrates the use of text mining techniques in the education field and suggests an education policy based on data analysis.

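The purpose of this study is to empirically analyze data on university student dropout in order to provide implications for establishing university student support policies. To this end, data on students who entered D University between 2017 and 2019 were divided into enrolled students and dropout students (those removed from the register) and analyzed using topic modeling with LDA (Latent Dirichlet Allocation). The topics that appeared characteristically among dropout students were 'registered for one semester only' under 'academic status', 'language and literature departments' under 'major', and 'academic warning' under 'grades', while no topic on 'extracurricular programs' under 'university life' appeared. Next, the performance of distinguishing 'enrolled-student topics' from 'dropout-student topics' was measured, and SVM (Support Vector Machines) showed the best discrimination performance. These experiments confirmed the feasibility of research on artificial-intelligence-based student data classification techniques using machine learning.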
