Representation of Texts into String Vectors for Text Categorization

Jo, Tae-Ho

doi:10.5626/JCSE.2010.4.2.110

Journal of Computing Science and Engineering

Volume 4 Issue 2 / Pages 110-127 / 2010 / 1976-4677 (pISSN) / 2093-8020 (eISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)


Jo, Tae-Ho (School of Computer and Information Engineering, Inha University)

Received : 2009.08.07
Accepted : 2010.02.16
Published : 2010.06.30

https://doi.org/10.5626/JCSE.2010.4.2.110

Abstract

In this study, we propose a method for encoding documents into string vectors instead of numerical vectors. Traditional approaches to text categorization usually require encoding documents into numerical vectors, and this encoding causes two main problems: huge dimensionality and sparse distribution. We modify existing machine learning-based approaches to text categorization, or create new ones, so that they receive string vectors as input instead of numerical vectors. As a result, we can improve text categorization performance by avoiding these two problems.
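The contrast drawn in the abstract can be illustrated with a small sketch. The abstract does not define how a string vector is built, so the code below assumes, purely for illustration, that a string vector is a fixed-length list of a document's most frequent words, ordered by frequency. The function names (`tokenize`, `numerical_vector`, `string_vector`), the toy corpus, and the padding convention are all hypothetical and are not taken from the paper.

```python
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase, split on whitespace, and trim surrounding punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def numerical_vector(text, vocabulary):
    """Bag-of-words encoding: one count per vocabulary word (mostly zeros)."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]

def string_vector(text, d, pad=""):
    """Assumed string-vector encoding: the d most frequent words as strings.

    Ties are broken alphabetically; short documents are padded with `pad`.
    """
    counts = Counter(tokenize(text))
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    top = ranked[:d]
    return top + [pad] * (d - len(top))

# Toy corpus (invented for this sketch).
corpus = [
    "Stock markets fell as investors sold bank stock.",
    "The team won the match after a late goal.",
]

# The numerical encoding needs one dimension per distinct word in the corpus.
vocabulary = sorted({w for doc in corpus for w in tokenize(doc)})
num_vecs = [numerical_vector(doc, vocabulary) for doc in corpus]

# The string encoding keeps a small, fixed number of elements per document.
str_vecs = [string_vector(doc, d=3) for doc in corpus]

# Fraction of zero entries across all numerical vectors.
sparsity = sum(v == 0 for vec in num_vecs for v in vec) / (
    len(vocabulary) * len(corpus)
)

print(len(vocabulary), sparsity)
print(str_vecs)
```

In this sketch the numerical vectors grow with the corpus vocabulary and fill up with zeros, while each string vector stays at a chosen length `d`. A classifier that consumes string vectors would also need a similarity measure defined on words rather than on numbers, which this sketch does not attempt.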


