
Text Classification based on a Feature Projection Technique with Robustness from Noisy Data  

Youngjoong Ko (Industrial Technology Research Institute, Sogang University)
Jungyun Seo (Sogang University)
Abstract
This paper presents a new text classifier based on a feature projection technique. In feature projections, training documents are represented as their projections onto each feature. Classification is performed on the individual feature projections, and the final class is determined by summing the classification results of the individual features. In our experiments, the proposed classifier achieved high performance. In particular, it runs fast and is robust to noisy data compared with k-NN and SVM, which are among the state-of-the-art text classifiers. Because the algorithm is very simple, it is easy to implement and train. It can therefore be a useful classifier for text classification tasks that require fast execution, robustness, and high performance.
Keywords
Text Classification; Feature Projections; Text Classifier;
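The feature-projection idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the feature weights or the per-feature voting rule, so the normalized term frequency weights, the product-of-weights vote, and the names `train`, `classify`, and `TRAINING_DOCS` are all assumptions made for illustration. The sketch only shows the general shape: each training document is stored as a projection on every feature it contains, each feature of a test document votes independently, and the votes are summed per class.

```python
from collections import Counter, defaultdict


def train(docs):
    """Build feature projections: feature -> list of (weight, label)."""
    projections = defaultdict(list)
    for tokens, label in docs:
        counts = Counter(tokens)
        total = sum(counts.values())
        for term, count in counts.items():
            # Assumption: the weight is the normalized term frequency.
            projections[term].append((count / total, label))
    return projections


def classify(projections, tokens):
    """Sum the per-feature votes; return the top class and all scores."""
    counts = Counter(tokens)
    total = sum(counts.values())
    scores = defaultdict(float)
    for term, count in counts.items():
        test_weight = count / total
        # Each feature votes on its own, using only its own projection.
        # Assumption: a vote is the product of test and training weights.
        for train_weight, label in projections.get(term, []):
            scores[label] += test_weight * train_weight
    if not scores:
        return None, {}
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Hypothetical toy corpus, for illustration only.
TRAINING_DOCS = [
    ("goal match team goal".split(), "sports"),
    ("team coach match".split(), "sports"),
    ("stock market price".split(), "finance"),
    ("market price bank stock".split(), "finance"),
]

model = train(TRAINING_DOCS)
label, scores = classify(model, "team goal price".split())
print(label, scores)
```

Because each feature is handled independently, classifying a document only touches the projections of the features it actually contains, which is one plausible reading of why such a classifier can run fast.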