http://dx.doi.org/10.3745/JIPS.2008.4.2.077

Neural Text Categorizer for Exclusive Text Categorization  

Jo, Tae-Ho (School of Computer and Information Engineering, Inha University)
Publication Information
Journal of Information Processing Systems, Vol. 4, No. 2, 2008, pp. 77-86
Abstract
This research proposes a new neural network for text categorization that represents documents in an alternative form rather than as numerical vectors. Since the proposed neural network was originally intended only for text categorization, it is called NTC (Neural Text Categorizer) in this research. Numerical vectors representing documents in text mining tasks have two inherent problems: huge dimensionality and sparse distribution. Although various feature selection methods have been developed to address the first problem, the reduced dimensionality still remains large. If feature selection reduces the dimensionality excessively, the robustness of text categorization is degraded. Even though SVM (Support Vector Machine) tolerates huge dimensionality, it does not tolerate the second problem, sparse distribution. The goal of this research is to address both problems at the same time by proposing a new representation of documents and a new neural network that uses this representation as its input vector.
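To make the two problems named in the abstract concrete, the following is a minimal sketch, assuming plain term-frequency (bag-of-words) vectors as the conventional numerical-vector encoding. It does not implement the paper's alternative representation or the NTC itself. It builds vectors for a tiny hypothetical four-document corpus, measures dimensionality and sparsity, and then applies a simple document-frequency cutoff as a stand-in for feature selection (not necessarily a method the paper uses). The corpus and all function names (`build_vocabulary`, `to_vector`, `sparsity`, `select_by_document_frequency`) are illustrative assumptions.

```python
from collections import Counter


def build_vocabulary(docs):
    """Collect every distinct lowercase token in the corpus, sorted."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})


def to_vector(doc, vocab):
    """Term-frequency vector of one document over the given vocabulary."""
    counts = Counter(doc.lower().split())
    return [counts.get(term, 0) for term in vocab]


def sparsity(vectors):
    """Fraction of zero entries across all document vectors."""
    total = sum(len(v) for v in vectors)
    zeros = sum(1 for v in vectors for x in v if x == 0)
    return zeros / total


def select_by_document_frequency(docs, vocab, k):
    """Keep the k terms that occur in the most documents (ties broken alphabetically).

    A deliberately simple, hypothetical stand-in for feature selection,
    not a method taken from the paper.
    """
    df = Counter(tok for doc in docs for tok in set(doc.lower().split()))
    return sorted(vocab, key=lambda t: (-df[t], t))[:k]


# Hypothetical four-document corpus, for illustration only.
docs = [
    "stock prices rise markets rally",
    "team wins championship game",
    "new vaccine trial shows promise",
    "stock prices drop markets fall",
]

vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
print(len(vocab), round(sparsity(vectors), 3))  # 16 0.703

kept = select_by_document_frequency(docs, vocab, k=3)
reduced = [to_vector(d, kept) for d in docs]
print(kept)  # ['markets', 'prices', 'stock']
print(sum(1 for v in reduced if not any(v)))  # 2 (documents left with no features)
```

Even on this toy corpus, the 16-dimensional vectors are about 70% zeros. Cutting down to the three most widespread terms leaves two of the four documents with all-zero vectors, which is one concrete way excessive feature selection can hurt the robustness of categorization.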
Keywords
Disk Neural Text Categorizer; Text Categorization; NewsPage.com;