
A Hangul Document Classification System using Case-based Reasoning  

Lee, Jae-Sik (College of Business Administration, Ajou University)
Lee, Jong-Woon (Construction Systems Team, Daewoo Information Systems)
Publication Information
Asia Pacific Journal of Information Systems, v.12, no.2, 2002, pp. 179-195
Abstract
In this research, we developed an efficient Hangul document classification system for text mining. By 'efficient' we mean that the system maintains acceptable classification performance while requiring less computing time. Given a query document, the system first retrieves k documents from the document case base using the k-nearest neighbor technique, which is the main algorithm of case-based reasoning. It then applies the TFIDF method, the traditional vector model in information retrieval, to the query document and the k retrieved documents to classify the query document. We call this procedure the 'CB_TFIDF' method. Our results showed that the classification accuracy of CB_TFIDF was similar to that of the traditional TFIDF method, while the average time for classifying one document decreased remarkably.
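The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the TFIDF weighting variant, or the final decision rule, so the cosine similarity on raw term counts, the `1 + log(n / df)` weighting, and the per-class sum of similarities used here are all assumptions. The function names (`retrieve_k`, `tfidf_vectors`, `classify`) and the toy case base are hypothetical, and documents are assumed to be tokenized already (for Hangul text, a morphological analyzer would normally do this).

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_k(query_tokens, case_base, k):
    """Stage 1 (assumed form): k-nearest-neighbor retrieval on raw term counts."""
    q = Counter(query_tokens)
    ranked = sorted(case_base,
                    key=lambda case: cosine(q, Counter(case[0])),
                    reverse=True)
    return ranked[:k]


def tfidf_vectors(token_lists):
    """TFIDF weights computed only over the given small document set."""
    n = len(token_lists)
    df = Counter(t for toks in token_lists for t in set(toks))
    return [{t: tf * (1.0 + math.log(n / df[t]))
             for t, tf in Counter(toks).items()}
            for toks in token_lists]


def classify(query_tokens, case_base, k=3):
    """Stage 2 (assumed form): TFIDF over the query and its k neighbors only."""
    neighbors = retrieve_k(query_tokens, case_base, k)
    vecs = tfidf_vectors([query_tokens] + [toks for toks, _ in neighbors])
    scores = defaultdict(float)
    for vec, (_, label) in zip(vecs[1:], neighbors):
        scores[label] += cosine(vecs[0], vec)  # sum similarity per class
    return max(scores, key=scores.get)


# Hypothetical toy case base: (tokens, class label) pairs.
case_base = [
    (["stock", "market", "price"], "economy"),
    (["market", "bank", "rate"], "economy"),
    (["soccer", "goal", "match"], "sports"),
    (["match", "team", "score"], "sports"),
]

label = classify(["bank", "market", "price"], case_base, k=2)
```

Because the TFIDF weights are computed over only k + 1 documents instead of the whole collection, the second stage stays cheap as the case base grows, which is consistent with the shorter classification time the abstract reports.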