http://dx.doi.org/10.5391/IJFIS.2008.8.1.052

Modified Version of SVM for Text Categorization  

Jo, Tae-Ho (School of Computer and Information Engineering, Inha University)
Publication Information
International Journal of Fuzzy Logic and Intelligent Systems, Vol. 8, No. 1, 2008, pp. 52-60
Abstract
This research proposes a new strategy for text categorization in which documents are encoded into string vectors, together with a modified version of SVM that is adapted to operate on string vectors. When the traditional version of SVM is used for pattern classification, raw data must first be encoded into numerical vectors, and depending on the application area, this encoding may be difficult. In text categorization, for example, encoding full texts into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and apply the modified version of SVM, adapted to string vectors, to text categorization.
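The contrast the abstract draws between numerical vectors and string vectors can be made concrete with a small sketch. The code below is illustrative only and is not the author's method: it assumes that a string vector of size `d` holds a document's `d` most frequent words (ties broken alphabetically, padded with empty strings), since the abstract does not spell out the exact construction. The tokenizer, the tiny stop-word list, and the function names `numerical_vector` and `string_vector` are all hypothetical. The modified SVM itself is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "on", "as", "of"}


def tokenize(text: str) -> list[str]:
    # Lowercase, split on non-letters, and drop stop words.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def numerical_vector(text: str, vocabulary: list[str]) -> list[int]:
    # Bag-of-words encoding: one dimension per vocabulary word, so the length
    # grows with the corpus vocabulary and most entries are zero (sparse).
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]


def string_vector(text: str, d: int) -> list[str]:
    # Assumed string-vector encoding: the d most frequent words of the document,
    # ordered by descending frequency, ties broken alphabetically.
    counts = Counter(tokenize(text))
    ranked = sorted(counts, key=lambda w: (-counts[w], w))[:d]
    # Pad with empty strings when the document has fewer than d distinct words.
    return ranked + [""] * (d - len(ranked))


docs = [
    "the cat sat on the mat",
    "stock prices fell as the market closed",
    "the dog chased the cat",
]
vocabulary = sorted({w for doc in docs for w in tokenize(doc)})

print(len(vocabulary))                        # 10
print(numerical_vector(docs[0], vocabulary))  # [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]
print(string_vector(docs[0], 3))              # ['cat', 'mat', 'sat']
print(string_vector(docs[2], 5))              # ['cat', 'chased', 'dog', '', '']
```

Even in this three-document toy corpus, the numerical vector for the first document is already mostly zeros, and its length would keep growing with every new word in the corpus. The string vector, by contrast, has a fixed size `d` that is chosen independently of the vocabulary. That fixed-size representation is what a classifier adapted to string vectors would operate on, in place of a numerical feature space.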
Keywords
String Vector; Text Categorization; Support Vector Machine