Text Document Categorization using FP-Tree  

Park, Yong-Ki (Department of Computer Science, Kyungpook National University)
Kim, Hwang-Soo (Department of Computer Science, Kyungpook National University)
Abstract
As the number of electronic documents grows explosively, automatic text categorization methods are needed to identify the documents of interest. Most existing methods apply machine learning techniques to a word-set representation of each document. This paper introduces a new method called FPTC (FP-Tree based Text Classifier). The FP-Tree is a data structure used in data mining for frequent pattern mining. We present a method that stores the word patterns of text sentences in an FP-Tree and classifies text using those patterns. In our experiments, we combine the algorithm with a mutual information and entropy approach to improve performance. We also analyze the algorithm by comparing it with other, conventional categorization methods.
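The abstract does not spell out how FPTC stores sentence patterns or scores a document, so the Python block below is only a minimal sketch under stated assumptions. Each sentence is treated as a transaction of words, one FP-tree is built per category using the usual FP-tree construction of Han et al. (the header table and node links needed for mining are omitted), and a document is assigned to the category whose tree best matches its sentences. The scoring rule, the per-category normalization, and all names (`FPNode`, `build_fp_tree`, `path_score`, `classify`) are hypothetical illustrations, not the authors' method; the mutual information and entropy step is not modeled.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: a word, how many sentences pass through it, its parent and children."""

    def __init__(self, word, parent=None):
        self.word = word
        self.count = 0
        self.parent = parent
        self.children = {}


def _ordered_items(sentence, order):
    """Keep only frequent words, deduplicate, and sort by descending global frequency
    (ties broken alphabetically) so that shared prefixes line up across sentences."""
    return sorted({w for w in sentence if w in order}, key=lambda w: (-order[w], w))


def build_fp_tree(sentences, min_support=1):
    """Build an FP-tree from sentences given as lists of words.

    Counts in how many sentences each word occurs, drops words below min_support,
    then inserts each sentence's ordered words as a root-to-leaf path,
    incrementing counts on shared prefixes.
    """
    freq = Counter(w for s in sentences for w in set(s))
    order = {w: c for w, c in freq.items() if c >= min_support}
    root = FPNode(None)
    for sentence in sentences:
        node = root
        for w in _ordered_items(sentence, order):
            child = node.children.get(w)
            if child is None:
                child = FPNode(w, node)
                node.children[w] = child
            child.count += 1
            node = child
    return root, order


def path_score(root, order, sentence):
    """Hypothetical scoring rule: walk down the tree along the sentence's ordered
    words, skipping words with no matching child, and sum the matched node counts."""
    node, score = root, 0
    for w in _ordered_items(sentence, order):
        child = node.children.get(w)
        if child is not None:
            score += child.count
            node = child
    return score


def classify(trees, document):
    """Return the category whose tree scores the document's sentences highest.

    trees maps label -> (root, order, n_sentences); dividing by n_sentences is a
    hypothetical normalization so large categories are not favored by size alone.
    """
    best_label, best_score = None, float("-inf")
    for label, (root, order, n) in trees.items():
        score = sum(path_score(root, order, s) for s in document) / n
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    train = {
        "sports": [["team", "won", "game"], ["team", "lost", "game"], ["player", "scored", "goal"]],
        "finance": [["stock", "price", "rose"], ["stock", "market", "fell"], ["bank", "raised", "rate"]],
    }
    trees = {label: (*build_fp_tree(sents), len(sents)) for label, sents in train.items()}
    print(classify(trees, [["team", "game", "today"]]))  # → sports
```

Sorting each sentence's words by descending frequency before insertion is what lets sentences with common words share a prefix path, which keeps the tree compact. Skipping unmatched words during the walk is just one plausible matching rule; stopping at the first mismatch would be an equally valid, stricter choice.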
Keywords
Text Document Categorization; FP-Tree; Mutual Information; Entropy