Recent Research Trend Analysis for the Journal of Society of Korea Industrial and Systems Engineering Using Topic Modeling

  • Dong Joon Park (Major of Statistics and Data Science, Pukyong National University)
  • Pyung Hoi Koo (Division of Systems Management and Safety Engineering, Pukyong National University)
  • Hyung Sool Oh (Department of AI & Software, Kangwon National University)
  • Min Yoon (Department of Applied Mathematics, Pukyong National University)
  • Received: 2023.08.30
  • Reviewed: 2023.09.15
  • Published: 2023.09.30

Abstract

The advent of big data has created a need for analytics, and natural language processing (NLP), one field of big data analytics, has received considerable attention. Topic modeling, an NLP technique, is widely applied to identify key topics in academic journals. The Korean Society of Industrial and Systems Engineering (KSIE) has published academic journals since 1978. To enhance its status, it is imperative to recognize the diversity of its research domains. In an earlier study, we identified eight major research topics in papers published by KSIE from 1978 to 1999. As a follow-up, this study aims to identify the major topics of research papers published by KSIE from 2000 to 2022. We performed topic modeling on the 1,742 research papers from this period using LDA and BERTopic, the latter of which has recently attracted attention. BERTopic outperformed LDA by providing coherent sets of topic keywords that effectively distinguish the 36 topics found in this study. In terms of visualization, pyLDAvis presented better two-dimensional scatter plots for the intertopic distance map than BERTopic. However, BERTopic provided far more diverse visualization methods for exploring the relevance of the 36 topics. BERTopic also made it possible to classify hot and cold topics through 'topic over time' graphs, which show how each topic trends over time.
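The abstract does not say how hot and cold topics were separated beyond reading the 'topic over time' graphs. As a rough illustration only, the sketch below assumes each paper already has a publication year and a topic label (however obtained), computes each topic's yearly share of papers, and labels a topic by the sign of a least-squares trend line. The function names, the slope threshold, and the toy topic labels are hypothetical; they are not taken from the paper or from BERTopic's API.

```python
from collections import Counter


def yearly_topic_shares(assignments):
    """Return {topic: [(year, share), ...]}, where share is the fraction
    of that year's papers assigned to the topic."""
    papers_per_year = Counter(year for year, _ in assignments)
    papers_per_year_topic = Counter(assignments)
    topics = sorted({topic for _, topic in assignments})
    years = sorted(papers_per_year)
    return {
        topic: [
            (year, papers_per_year_topic[(year, topic)] / papers_per_year[year])
            for year in years
        ]
        for topic in topics
    }


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # only one distinct year: no trend can be estimated
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def classify_trends(assignments, min_slope=1e-3):
    """Label each topic 'hot', 'cold', or 'flat' from the slope of its
    yearly share. min_slope is an assumed, illustrative threshold."""
    labels = {}
    for topic, series in yearly_topic_shares(assignments).items():
        years = [year for year, _ in series]
        shares = [share for _, share in series]
        slope = ols_slope(years, shares)
        if slope > min_slope:
            labels[topic] = "hot"
        elif slope < -min_slope:
            labels[topic] = "cold"
        else:
            labels[topic] = "flat"
    return labels


# Toy (year, topic) pairs; the topic names are made up for illustration.
toy_assignments = [
    (2000, "scheduling"), (2000, "scheduling"), (2000, "quality"),
    (2011, "scheduling"), (2011, "quality"), (2011, "machine learning"),
    (2022, "machine learning"), (2022, "machine learning"), (2022, "quality"),
]

labels = classify_trends(toy_assignments)
print(labels)
```

Using shares rather than raw counts keeps a topic from looking "hot" merely because more papers were published overall in later years. A real analysis would likely need a significance test or a smoothing step before trusting the slope.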

Keywords

Acknowledgement

This work was supported by a Research Grant of Pukyong National University (2023). We thank the anonymous referees for their comments, which improved the quality of our paper.
