http://dx.doi.org/10.3745/JIPS.2009.5.1.005

SVD-LDA: A Combined Model for Text Classification  

Hai, Nguyen Cao Truong (School of Electronics and Computer Engineering, Chonnam National University)
Kim, Kyung-Im (School of Electronics and Computer Engineering, Chonnam National University)
Park, Hyuk-Ro (School of Electronics and Computer Engineering, Chonnam National University)
Publication Information
Journal of Information Processing Systems / v.5, no.1, 2009, pp. 5-10
Abstract
Text data has always accounted for a major portion of the world's information, and as the volume of information grows exponentially, the amount of text data grows significantly with it. Text classification therefore remains an important area of research. Latent Dirichlet Allocation (LDA) is a recent probabilistic model that has been used in many applications across many fields. For text data, LDA also has many applications, and various enhancements to it have been proposed. However, it seems that none of these applications addresses the input to LDA. In this paper, we suggest a way to map the input space to a reduced space, which may avoid the unreliability, ambiguity, and redundancy of individual terms as descriptors. The purpose of this paper is to show that LDA can be carried out effectively in a "clean and clear" space. Experiments are conducted on the 20 Newsgroups data set. The results show that the proposed method can improve classification results when the rank of the reduced space is chosen appropriately.
Keywords
Latent Dirichlet Allocation; Singular Value Decomposition; Input Filtering; Text Classification; Data Preprocessing;
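The mapping to a reduced space mentioned in the abstract can be illustrated with a truncated singular value decomposition. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the toy term-document matrix `X`, the helper name `rank_k_approximation`, and the choice `k = 2` are all hypothetical, and the abstract does not specify how the reduced representation is passed on to LDA.

```python
import numpy as np


def rank_k_approximation(term_doc, k):
    """Return the rank-k truncated-SVD approximation of a term-document matrix.

    Hypothetical helper for illustration; the source does not name one.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    k = min(k, len(s))
    # Keep only the k largest singular values and their singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


# Toy term-document matrix (rows: terms, columns: documents); assumed data.
X = np.array(
    [
        [2.0, 0.0, 1.0, 0.0],
        [1.0, 0.0, 2.0, 0.0],
        [0.0, 3.0, 0.0, 1.0],
        [0.0, 1.0, 0.0, 2.0],
    ]
)

X_reduced = rank_k_approximation(X, k=2)
singular_values = np.linalg.svd(X, compute_uv=False)
```

The rank `k` is the quantity the abstract says must be chosen appropriately: a small `k` discards more of the term-level variation, while `k` equal to the full rank reproduces the original matrix.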