A Method of Calculating Topic Keywords for Topic Labeling

  • 김은회 (Department of Software Engineering, Seoil University)
  • 서유화 (Baird College of Liberal Arts, Soongsil University)
  • Received: 2020.08.26
  • Accepted: 2020.09.15
  • Published: 2020.09.30

Abstract

Topics produced by LDA topic modeling are not labeled automatically and must be labeled separately. A topic is labeled by inspecting the words that represent it, so the first step is to build a good set of representative words. This paper proposes a method for calculating such a word set using TextRank, an algorithm that extracts the keywords of a document. The proposed method first uses Relevance to select words that are both related to the topic and discriminative of it. It then extracts topic keywords with the TextRank algorithm and connects keywords that frequently co-occur, so that the resulting word set covers the topic more broadly.
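
The abstract describes three stages: filter a topic's words by relevance, rank the surviving words with TextRank, then join keywords that frequently co-occur. The pure-Python sketch below shows one way to wire these stages together; it is not the authors' implementation. Assumptions: relevance follows the LDAvis definition, λ·log φ(k,w) + (1 - λ)·log(φ(k,w)/p(w)), with an assumed λ = 0.6; co-occurrence is counted per sentence; TextRank uses the weighted PageRank-style update from Mihalcea and Tarau with damping d = 0.85; and "connecting" keywords is read as reporting keyword pairs that share at least `min_link` sentences. All function names, thresholds, and the toy φ and p values are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def relevance(phi_kw, p_w, lam=0.6):
    """LDAvis-style relevance: lam*log(phi_kw) + (1 - lam)*log(phi_kw / p_w).

    phi_kw = P(word | topic), p_w = marginal P(word); both must be > 0.
    lam = 0.6 is an assumed default, not a value taken from the paper.
    """
    return lam * math.log(phi_kw) + (1 - lam) * math.log(phi_kw / p_w)


def select_candidates(phi_k, p, n, lam=0.6):
    """Keep the n words of one topic with the highest relevance."""
    scores = {w: relevance(phi_k[w], p[w], lam) for w in phi_k}
    return sorted(scores, key=scores.get, reverse=True)[:n]


def cooccurrence(sentences, vocab):
    """Count the sentences containing each pair of vocab words (assumed unit)."""
    counts = Counter()
    for tokens in sentences:
        present = sorted(set(tokens) & vocab)  # sorted so pair keys are canonical
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts


def textrank(edges, d=0.85, iters=100, tol=1e-6):
    """Weighted TextRank over an undirected graph given as {(a, b): weight}."""
    neighbors = defaultdict(dict)
    for (a, b), w in edges.items():
        neighbors[a][b] = w
        neighbors[b][a] = w
    if not neighbors:
        return {}
    out_weight = {v: sum(nb.values()) for v, nb in neighbors.items()}
    scores = {v: 1.0 for v in neighbors}
    for _ in range(iters):
        # S(v) = (1 - d) + d * sum_u [w(u, v) / out_weight(u)] * S(u)
        new = {
            v: (1 - d) + d * sum(w / out_weight[u] * scores[u] for u, w in nb.items())
            for v, nb in neighbors.items()
        }
        delta = max(abs(new[v] - scores[v]) for v in new)
        scores = new
        if delta < tol:
            break
    return scores


def topic_keywords(phi_k, p, sentences, n_candidates=10, n_keywords=5, min_link=2, lam=0.6):
    """Hypothetical pipeline: relevance filter -> TextRank -> link co-occurring keywords."""
    candidates = set(select_candidates(phi_k, p, n_candidates, lam))
    edges = cooccurrence(sentences, candidates)
    ranks = textrank(edges)
    keywords = sorted(ranks, key=ranks.get, reverse=True)[:n_keywords]
    kw = set(keywords)
    links = sorted(
        ((a, b, c) for (a, b), c in edges.items() if a in kw and b in kw and c >= min_link),
        key=lambda t: t[2],
        reverse=True,
    )
    return keywords, links


if __name__ == "__main__":
    # Toy, unnormalized values for one topic k (hypothetical data).
    phi_k = {"topic": 0.20, "model": 0.15, "label": 0.10, "word": 0.08, "data": 0.05, "the": 0.30}
    p = {"topic": 0.05, "model": 0.04, "label": 0.01, "word": 0.03, "data": 0.06, "the": 0.40}
    sentences = [
        ["topic", "model", "label"],
        ["topic", "label"],
        ["topic", "model", "the"],
        ["label", "topic", "word"],
    ]
    keywords, links = topic_keywords(phi_k, p, sentences, n_candidates=3, n_keywords=2)
    print(keywords, links)  # ['topic', 'label'] [('label', 'topic', 3)]
```

In the toy run, "the" has the largest φ but drops out at the relevance stage because it is also frequent across the whole corpus, which is the kind of discrimination the abstract attributes to Relevance. In this sketch a candidate that never co-occurs with another candidate receives no TextRank score; a fuller version might keep such words as isolated nodes.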
