http://dx.doi.org/10.3743/KOSIM.2012.29.2.205

A Study on Improving the Performance of Document Classification Using the Context of Terms  

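Song, Sung-Jeon (Graduate School, Department of Library and Information Science, Yonsei University)
Chung, Young-Mee (Department of Library and Information Science, Yonsei University)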
Publication Information
Journal of the Korean Society for Information Management / v.29, no.2, 2012, pp. 205-224
Abstract
One limitation of the bag-of-words (BOW) method is that each term is recognized only by its surface form, so the representation fails to capture the term's meaning or thematic background. To overcome this limitation, a separate context profile was defined for each term in each thematic category, based on the contexts in which the term occurs. In this study, the context of each occurrence of a term in an actual document was compared with these profiles, so that the term could be used as a classification feature according to its meaning or thematic background. The experiment was conducted in three phases: term weighting, ensemble classifier implementation, and feature selection. Classification performance improved in all three phases, with the ensemble classifier achieving the highest score. The results also showed that the proposed method was effective in reducing the performance bias caused by the total number of training documents.
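The abstract does not specify how the context profiles are built or compared, so the following is only a minimal sketch under illustrative assumptions. It assumes that a context is a fixed window of surrounding words (`WINDOW` is a hypothetical parameter), that a term's profile for a category is the aggregated co-occurrence counts of its context words across training documents of that category, and that an occurrence in a new document is assigned to the profile with the highest cosine similarity. The result is a category-qualified feature such as `bank@finance` in place of the plain BOW term `bank`. All function names here are hypothetical and do not reproduce the authors' implementation.

```python
from collections import Counter, defaultdict
import math

WINDOW = 3  # hypothetical: number of tokens taken on each side of a term


def context_window(tokens, i, window=WINDOW):
    """Return the words around position i, excluding the term itself."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i]


def build_profiles(training_docs):
    """Build profiles[term][category] -> Counter of context words.

    training_docs is a list of (tokens, category) pairs (assumed input format).
    """
    profiles = defaultdict(lambda: defaultdict(Counter))
    for tokens, category in training_docs:
        for i, term in enumerate(tokens):
            profiles[term][category].update(context_window(tokens, i))
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def contextual_features(tokens, profiles):
    """Map each occurrence to 'term@category' of its most similar profile."""
    features = Counter()
    for i, term in enumerate(tokens):
        cats = profiles.get(term)  # .get does not trigger the defaultdict factory
        if not cats:
            features[term] += 1  # unseen term: fall back to the plain BOW feature
            continue
        ctx = Counter(context_window(tokens, i))
        best = max(cats, key=lambda c: cosine(ctx, cats[c]))
        features[f"{term}@{best}"] += 1
    return features


if __name__ == "__main__":
    train = [
        ("bank raised interest rate".split(), "finance"),
        ("river bank flooded after rain".split(), "weather"),
    ]
    profiles = build_profiles(train)
    print(contextual_features("the bank cut the interest rate".split(), profiles))
```

In this toy run, the occurrence of "bank" becomes `bank@finance`, because its window shares "interest" with the finance profile and no words with the weather profile. Terms never seen in training are kept as ordinary BOW features, one simple choice among several possible fallbacks.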
Keywords
document classification; context profile; term weighting; ensemble classifier; feature selection
Citations & Related Records
Times cited by KSCI: 4