Predicting Learning Achievement Using Big Data Cluster Analysis - Focusing on Longitudinal Study

빅데이터 군집 분석을 이용한 학습성취도 예측 - 종단 연구를 중심으로

  • Ko, Sujeong (Department of Computer Software, Induk University)
  • 고수정 (인덕대학교 컴퓨터소프트웨어학과)
  • Received : 2018.09.01
  • Accepted : 2018.09.27
  • Published : 2018.09.30

Abstract

As the value of big data grows, research applying big data analysis techniques is under way in education as well as in industry. In this paper, we propose a method for longitudinally predicting learning achievement using big data cluster analysis. The method first uses the K-means algorithm to group students in the Korea Children and Youth Panel Survey (KCYPS) by the learning habits they reported in the first year of middle school, and extracts the features of each group. Next, each first-year middle school student in the test set is assigned, by cosine similarity to the extracted group features, to the group with the most similar learning habits; neighbors are then selected and the student's learning achievement is predicted. The results demonstrate that learning habits in middle school remain closely related to outcomes as late as university, so the method can predict not only learning achievement in high school but also satisfaction with university and with the chosen major.
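The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the paper's implementation: the three-item habit vectors and the scores are invented toy data rather than KCYPS variables, the function names (`cosine`, `kmeans`, `predict`) are hypothetical, the first `k` points seed the centroids only to keep the run deterministic, and the similarity-weighted average used for the prediction is an assumed formula, since the abstract does not say how the neighbors' values are combined.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def kmeans(points, k, max_iters=100):
    """Plain K-means with squared Euclidean distance.

    Assumption: the first k points seed the centroids so the sketch is deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so the centroids are too
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def predict(student, habits, scores, centroids, labels, n_neighbors=3):
    """Assign a test student to a group by cosine similarity, then predict from neighbors.

    Assumption: the prediction is a similarity-weighted mean of the neighbors' scores.
    """
    group = max(range(len(centroids)), key=lambda c: cosine(student, centroids[c]))
    members = [
        (cosine(student, h), s)
        for h, s, lab in zip(habits, scores, labels)
        if lab == group
    ]
    neighbors = sorted(members, key=lambda m: m[0], reverse=True)[:n_neighbors]
    total = sum(w for w, _ in neighbors)
    if not neighbors or total <= 0:
        return None  # no usable neighbors in the assigned group
    return sum(w * s for w, s in neighbors) / total

# Toy training data (invented): three habit items per student, plus a later achievement score.
train_habits = [[4, 4, 1], [4, 3, 1], [3, 4, 1], [1, 1, 4], [1, 2, 4], [2, 1, 4]]
train_scores = [90, 85, 88, 60, 55, 58]

centroids, labels = kmeans(train_habits, k=2)
print(predict([4, 4, 2], train_habits, train_scores, centroids, labels))
```

Because the group is chosen by cosine similarity, the direction of a student's habit profile matters more than its overall magnitude; whether that suits a given survey scale is a design choice to check against the actual data.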
