http://dx.doi.org/10.9716/KITS.2019.18.4.135

Keyword Reorganization Techniques for Improving the Identifiability of Topics  

Yun, Yeoil (Graduate School of Business IT, Kookmin University)
Kim, Namgyu (School of Management Information Systems, Kookmin University)
Publication Information
Journal of Information Technology Services / v.18, no.4, 2019, pp. 135-149
Abstract
Recently, much research has focused on extracting meaningful information from large amounts of text data. Among the various approaches to extracting information from text, topic modeling, which expresses each latent topic as a group of keywords, is widely used. Topic modeling presents several keywords per topic, ranked by term/topic weight, and the quality of those keywords is usually evaluated through coherence, which measures how similar the keywords are to one another. However, evaluating topic quality based only on keyword similarity has its limitations, because a set of similar words is often not enough to describe the content of a topic accurately. In this research, therefore, we propose a method for reorganizing topic keywords to improve the identifiability of topics. To reorganize topic keywords, each document is first labeled with one representative topic, which can be obtained from traditional topic modeling. Next, classification rules that assign each document to its corresponding label are generated, and new topic keywords are extracted from those rules. To evaluate the performance of our method, we performed an experiment on 1,000 news articles. The experiment confirmed that the keywords extracted by the proposed method have better identifiability than traditional topic keywords.
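The pipeline described in the abstract (label each document with a representative topic, learn rules that separate the labels, read keywords off the rules) can be illustrated with a minimal Python sketch. The abstract does not say which rule learner the authors used, so this sketch substitutes a deliberately simple stand-in: for each topic, every word is treated as a one-condition rule "document contains the word, therefore it belongs to this topic", and words are ranked by the F1 score of that rule. The function name `reorganize_keywords`, its parameters, and the F1 ranking are all assumptions made for illustration, and the topic labels are assumed to have been produced beforehand by a topic model such as LDA.

```python
from collections import defaultdict


def reorganize_keywords(docs, labels, top_n=3):
    """Rank words per topic by how well a one-word rule identifies the topic.

    docs   : list of token lists, one per document (hypothetical input format)
    labels : representative topic of each document, assumed to come from a
             previously fitted topic model
    top_n  : number of keywords to keep per topic
    """
    df_total = defaultdict(int)                        # documents containing each word
    df_topic = defaultdict(lambda: defaultdict(int))   # same count, split by topic
    topic_size = defaultdict(int)                      # documents per topic

    for tokens, label in zip(docs, labels):
        topic_size[label] += 1
        for word in set(tokens):                       # count each word once per document
            df_total[word] += 1
            df_topic[label][word] += 1

    keywords = {}
    for topic, size in topic_size.items():
        scored = []
        for word, hits in df_topic[topic].items():
            precision = hits / df_total[word]          # how often the rule fires correctly
            recall = hits / size                       # share of the topic the rule covers
            f1 = 2 * precision * recall / (precision + recall)
            scored.append((f1, word))
        # Highest F1 first; ties are broken alphabetically to keep output stable.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        keywords[topic] = [word for _, word in scored[:top_n]]
    return keywords
```

Under this stand-in scoring, a word that appears in documents of several topics is penalized through its lower precision, so it tends to rank below words that occur only within one topic. That is the intuition behind identifiability as opposed to similarity alone; an actual rule learner such as a decision tree would capture multi-word conditions that this sketch cannot.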
Keywords
Classification Rules; Keywords Reorganization; Text Mining; Topic Evaluation; Topic Modeling;