[KSCI] Korea Science Citation Index Service

A Hangul Document Classification System using Case-based Reasoning

Lee, Jae-Sik (Ajou University, College of Business Administration)
Lee, Jong-Woon (Daewoo Information Systems, Construction Systems Team)

Publication Information

Asia Pacific Journal of Information Systems / v.12, no.2, 2002, pp. 179-195

Abstract

In this research, we developed an efficient Hangul document classification system for text mining. By 'efficient' we mean that the system maintains acceptable classification performance while requiring less computing time. Given a query document, the system first retrieves k documents from the document case base using the k-nearest neighbor technique, the main algorithm of case-based reasoning. The TFIDF method, the traditional vector model in information retrieval, is then applied to the query document and the k retrieved documents to classify the query document. We call this procedure the 'CB_TFIDF' method. Our results showed that the classification accuracy of CB_TFIDF was similar to that of the traditional TFIDF method, while the average time for classifying one document decreased remarkably.
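
The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which similarity measure drives the k-nearest neighbor retrieval or how TFIDF scores are turned into a category, so the sketch assumes Jaccard term overlap for retrieval and a cosine comparison against per-category TFIDF prototype vectors for classification. The function names, the smoothed IDF formula, and the toy English tokens (standing in for morphologically analyzed Hangul terms) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def retrieve_k_nearest(query_tokens, case_base, k):
    """Stage 1: return the k cases whose term sets overlap most with the query.

    Jaccard overlap is an assumed similarity measure, not taken from the paper.
    """
    q = set(query_tokens)

    def jaccard(case):
        terms = set(case[0])
        union = q | terms
        return len(q & terms) / len(union) if union else 0.0

    return sorted(case_base, key=jaccard, reverse=True)[:k]


def tfidf_vector(tokens, idf):
    """Weight each known term by raw term frequency times IDF."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify_cb_tfidf(query_tokens, case_base, k=3):
    """Stage 2: apply TFIDF only to the k retrieved cases and pick a category."""
    retrieved = retrieve_k_nearest(query_tokens, case_base, k)

    # Document frequencies are computed over the k retrieved cases only,
    # which is where the time saving over whole-collection TFIDF would come from.
    n = len(retrieved)
    df = Counter(t for tokens, _ in retrieved for t in set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}  # assumed smoothing

    # One prototype per category: the sum of its documents' TFIDF vectors (assumption).
    prototypes = defaultdict(lambda: defaultdict(float))
    for tokens, label in retrieved:
        for term, weight in tfidf_vector(tokens, idf).items():
            prototypes[label][term] += weight

    q_vec = tfidf_vector(query_tokens, idf)
    return max(prototypes, key=lambda label: cosine(q_vec, prototypes[label]))


# Toy case base of (tokens, category) pairs; contents are made up for illustration.
case_base = [
    (["stock", "market", "price", "rise"], "economy"),
    (["bank", "interest", "rate", "market"], "economy"),
    (["soccer", "match", "goal", "team"], "sports"),
    (["baseball", "team", "pitcher", "win"], "sports"),
    (["election", "vote", "party"], "politics"),
]

print(classify_cb_tfidf(["market", "interest", "price"], case_base, k=3))  # economy
```

In this sketch the vector-model computation touches only k documents per query instead of the whole case base, which is one plausible reading of why the reported classification time dropped while accuracy stayed close to plain TFIDF.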

