A Method of Calculating Topic Keywords for Topic Labeling

Kim, Eunhoe; Suh, Yuhwa

doi:10.17662/ksdim.2020.16.3.025

Journal of Korea Society of Digital Industry and Information Management (디지털산업정보학회논문지)

Volume 16, Issue 3 / Pages 25-36 / 2020 / 1738-6667 (pISSN) / 2713-9018 (eISSN)

Korea Society of Digital Industry and Information Management (디지털산업정보학회)


Kim, Eunhoe (Department of Software Engineering, Seoil University);
Suh, Yuhwa (Baird College of General Education, Soongsil University)

Received : 2020.08.26
Accepted : 2020.09.15
Published : 2020.09.30

Abstract

Topics produced by LDA topic modeling must be labeled separately. A topic is labeled by examining the words that represent it, so it is important first to build a good set of representative words. This paper proposes a method for computing such a word set using TextRank, an algorithm that extracts keywords from a document. The proposed method uses Relevance to select words that are related to the topic and that discriminate it from other topics. It then extracts topic keywords with the TextRank algorithm and connects keywords that frequently co-occur, so that the topic is expressed with higher coverage.
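
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives no parameters, so the relevance weight `lam`, the damping factor, the document-level co-occurrence window, the toy probabilities, and all function names are assumptions made for illustration. Relevance follows the definition of Sievert and Shirley, and the ranking step is a plain weighted TextRank iteration.

```python
import math
from collections import defaultdict
from itertools import combinations

def relevance(p_w_given_t, p_w, lam=0.6):
    """Sievert-Shirley relevance: lam*log p(w|t) + (1-lam)*log(p(w|t)/p(w))."""
    return lam * math.log(p_w_given_t) + (1 - lam) * math.log(p_w_given_t / p_w)

def top_relevant_words(topic_probs, corpus_probs, n, lam=0.6):
    """Return the n words with the highest relevance to the topic."""
    scored = {w: relevance(p, corpus_probs[w], lam) for w, p in topic_probs.items()}
    return sorted(scored, key=scored.get, reverse=True)[:n]

def cooccurrence_graph(docs, candidates):
    """Count, for each pair of candidate words, the documents containing both.
    (Assumption: co-occurrence is measured at document level.)"""
    weights = defaultdict(float)
    candidates = set(candidates)
    for doc in docs:
        present = sorted(candidates.intersection(doc))
        for a, b in combinations(present, 2):
            weights[(a, b)] += 1.0
            weights[(b, a)] += 1.0
    return weights

def textrank(weights, nodes, damping=0.85, iterations=50):
    """Weighted TextRank scores computed by fixed-count power iteration."""
    out_sum = defaultdict(float)
    for (a, _), w in weights.items():
        out_sum[a] += w
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            incoming = sum(
                weights[(m, n)] / out_sum[m] * score[m]
                for m in nodes
                if (m, n) in weights
            )
            new_score[n] = (1 - damping) + damping * incoming
        score = new_score
    return score

def frequent_pairs(weights, min_count):
    """Keyword pairs whose co-occurrence count reaches min_count."""
    return sorted((a, b) for (a, b), w in weights.items() if a < b and w >= min_count)

# Toy, hand-made values (hypothetical; not taken from the paper).
topic_probs = {"model": 0.20, "topic": 0.15, "data": 0.12, "the": 0.10}
corpus_probs = {"model": 0.02, "topic": 0.01, "data": 0.03, "the": 0.30}
docs = [
    ["topic", "model", "data"],
    ["topic", "model"],
    ["model", "data"],
    ["the", "topic", "model"],
]

candidates = top_relevant_words(topic_probs, corpus_probs, n=3)
graph = cooccurrence_graph(docs, candidates)
scores = textrank(graph, candidates)
ranked = sorted(scores, key=scores.get, reverse=True)
pairs = frequent_pairs(graph, min_count=3)

print("candidates:", candidates)
print("ranked keywords:", ranked)
print("connected keyword pairs:", pairs)
```

In this sketch the relevance step filters out a word such as "the", which is probable in the topic but even more probable in the corpus as a whole, before any graph is built. TextRank then favors words that co-occur strongly with other candidates, and the final step joins pairs that co-occur often into multi-word expressions.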

References

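박종순, 김창식, "Analysis of Big Data Research Trends: Focusing on Topic Modeling," Journal of Korea Society of Digital Industry and Information Management, Vol. 15, No. 1, 2019, pp.1-7. https://doi.org/10.17662/KSDIM.2019.15.1.001
김창식, 김남규, 곽기영, "Analysis of Machine Learning and Deep Learning Research Trends: Focusing on Topic Modeling," Journal of Korea Society of Digital Industry and Information Management, Vol. 15, No. 2, 2019, pp.19-28. https://doi.org/10.17662/ksdim.2019.15.2.019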
David M. Blei, Andrew Y. Ng, and Michael I. Jordan, "Latent Dirichlet Allocation," Journal of Machine Learning Research, Vol. 3, Mar. 2003, pp.993-1022.
Q. Mei, X. Shen, and C.X. Zhai, "Automatic labeling of multinomial topic models," In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2007, pp.490-499.
Jey Han Lau, David Newman, Sarvnaz Karimi, and Timothy Baldwin, "Best topic word selection for topic labelling," In Proceedings of the 23rd International Conference on Computational Linguistics: Posters (COLING'10), Association for Computational Linguistics, 2010, pp.605-613.
Jey Han Lau, Karl Grieser, David Newman, and Timothy Baldwin, "Automatic labelling of topic models," In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, 2011, pp.1536-1545.
Ioana Hulpus, Conor Hayes, Marcel Karnstedt, and Derek Greene, "Unsupervised graph-based topic labelling using dbpedia," In Proceedings of the sixth ACM international conference on Web search and data mining (WSDM '13), Association for Computing Machinery, 2013, pp.465-474.
S. Bhatia, J. H. Lau, and T. Baldwin, "Automatic labelling of topics with neural embeddings," In 26th COLING International Conference on Computational Linguistics, 2016, pp.953-963.
Mihalcea, Rada and Tarau, Paul, "TextRank: Bringing Order into Text," Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Jul. 2004, pp.404-411.
Carson Sievert and Kenneth E. Shirley, "LDAvis: A method for visualizing and interpreting topics," Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, 2014, pp.63-70.
Mallet, http://mallet.cs.umass.edu/
Gensim, https://radimrehurek.com/gensim/
Komoran, https://www.shineware.co.kr/products/komoran/
