Graph based KNN for Optimizing Index of News Articles

  • Received : 2016.08.12
  • Accepted : 2016.10.10
  • Published : 2016.09.30

Abstract

This research proposes treating index optimization as a classification task and applying a graph based KNN to it. Index optimization is an important task for maximizing information retrieval performance. Encoding words into numerical vectors causes problems such as huge dimensionality and sparse distribution, so we encode words into graphs as an alternative representation. Specifically, we define a similarity measure between graphs, modify the KNN into a graph based version using that measure, and apply it to the index optimization task. From this modification we expect three benefits: improved classification performance, a more graphical representation of words, which is inherent in graphs, and the ability to trace the results of classifying words more easily. We empirically validate the proposed version in optimizing the index on two text collections: NewsPage.com and 20NewsGroups.
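
The abstract does not spell out the graph similarity measure or the class labels, so the following is only a minimal sketch of how a graph based KNN could look. It assumes each word is encoded as a set of undirected edges between related words, that similarity is the share of edges two graphs have in common (a Jaccard ratio swapped in for illustration), and that the labels "keep" and "drop" stand in for the index optimization categories. The names `edge`, `graph_similarity`, and `knn_classify` are hypothetical.

```python
from collections import Counter


def edge(a, b):
    """Build an undirected edge between two words (hypothetical helper)."""
    return frozenset((a, b))


def graph_similarity(g1, g2):
    """Assumed measure: shared edges divided by all distinct edges (Jaccard)."""
    if not g1 and not g2:
        return 0.0
    return len(g1 & g2) / len(g1 | g2)


def knn_classify(query, training, k=3):
    """Label a query graph by majority vote among its k most similar graphs."""
    ranked = sorted(
        training,
        key=lambda item: graph_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set: (graph, label) pairs; the labels are illustrative only.
training = [
    ({edge("market", "stock"), edge("stock", "price")}, "keep"),
    ({edge("market", "stock"), edge("market", "trade")}, "keep"),
    ({edge("the", "of"), edge("of", "and")}, "drop"),
    ({edge("the", "of"), edge("the", "a")}, "drop"),
]

query = {edge("market", "stock"), edge("stock", "price"), edge("price", "trade")}
print(knn_classify(query, training, k=3))
```

In this toy run the query shares two of three distinct edges with the first training graph and one of four with the second, so both nearest neighbors with nonzero similarity vote "keep". Because the vote is driven by shared edges, listing those edges shows why a word was classified as it was, which is the traceability benefit the abstract mentions.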
