Fig. 1. Associations of digital documents keywords
Fig. 2. Matrix representations for keyword data
Fig. 3. Overview of constructing an inverted index using q-grams
Fig. 4. Keyword samples
Fig. 4. Number of clusters with member counts
Fig. 5. Samples of top-20 cluster labels
Fig. 6. Sample cluster detail for “Finite Element Method”
Table 1. Data statistics