http://dx.doi.org/10.3745/KTSDE.2015.4.11.509

Prediction of Correct Answer Rate and Identification of Significant Factors for CSAT English Test Based on Data Mining Techniques  

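Park, Hee Jin (Seoul National University of Science and Technology, Graduate School of IT Policy, Industrial Information Systems Major)
Jang, Kyoung Ye (Seoul National University of Science and Technology, Graduate School of IT Policy, Industrial Information Systems Major)
Lee, Youn Ho (Seoul National University of Science and Technology, Department of Global Convergence Industrial Engineering)
Kim, Woo Je (Seoul National University of Science and Technology, Department of Global Convergence Industrial Engineering)
Kang, Pil Sung (Korea University, School of Industrial Management Engineering)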
Publication Information
KIPS Transactions on Software and Data Engineering / v.4, no.11, 2015, pp. 509-520
Abstract
The College Scholastic Ability Test (CSAT) is the primary test for evaluating the academic achievement of high-school students in South Korea, and most universities use it for admission decisions. Because its level of difficulty is a significant issue for both students and universities, the government makes a considerable effort to keep the difficulty level consistent from year to year. However, the actual level of difficulty has fluctuated significantly, which causes many problems for university admission. In this paper, we build two types of data-driven prediction models that predict the correct answer rate and identify significant factors for the CSAT English test using accumulated CSAT test data, unlike traditional methods that depend on experts' judgments. First, we derive candidate question-specific factors that can influence the correct answer rate, such as position, EBS relation, and readability, from 10 years of annual CSAT practice tests and CSATs. In addition, we derive context-specific factors by employing topic modeling, which identifies the underlying topics in the text. Then, the correct answer rate is predicted by multiple linear regression, and the level of difficulty is predicted by a classification tree. The experimental results show that the level-of-difficulty (difficult/easy) classification model achieves 90% accuracy, whereas the error rate for the correct answer rate is below 16%. Points and problem category are found to be critical for predicting the correct answer rate. The correct answer rate is also influenced by some of the topics discovered by topic modeling. Based on our study, it will be possible to predict the range of the expected correct answer rate at both the question level and the entire-test level, which will help CSAT examiners control the level of difficulty.
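As a rough illustration of the regression step described in the abstract, the sketch below fits a multiple linear regression of correct answer rate on a few question-level factors. It is only a minimal sketch under stated assumptions: the feature set (points, position, EBS-related flag), every data value, the generating rule `true_rate`, and all function names are hypothetical and do not come from the paper. The topic-modeling and classification-tree steps are not shown.

```python
# Toy data only: every number below is invented for illustration,
# not taken from the paper.
# Columns: points (2 or 3), position in the test (1..45), EBS-related flag (0/1).
QUESTIONS = [
    (2, 1, 1), (2, 5, 0), (3, 10, 1), (2, 20, 1),
    (3, 30, 0), (3, 33, 0), (2, 40, 1), (3, 45, 0),
]


def true_rate(points, position, ebs):
    # Hypothetical generating rule used only to fabricate the toy targets.
    return 100.0 - 10.0 * points - 0.5 * position + 5.0 * ebs


RATES = [true_rate(*q) for q in QUESTIONS]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_regression(xs, ys):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    design = [[1.0, *x] for x in xs]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, x):
    """Predicted correct answer rate (%) for one question's feature tuple."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


# Coefficients in the order: intercept, points, position, EBS flag.
BETA = fit_linear_regression(QUESTIONS, RATES)
```

Because the toy targets are exactly linear in the features, the fitted coefficients recover the generating rule up to floating-point error. Real test data would be noisy, and the size and sign of each coefficient would then indicate how strongly a factor is associated with the correct answer rate.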
Keywords
College Scholastic Ability Test (CSAT) Difficulties; English Test; Topic Modeling; Multiple Linear Regression; Decision Tree