http://dx.doi.org/10.5351/KJAS.2021.34.5.795

Topic change monitoring study based on Blue House national petition using a control chart  

Lee, Heeyeon (Department of Biostatistics of the Catholic Research Coordinating Center, Catholic University)
Choi, Jieun (Department of Statistics, Dankook University)
Lee, Sungim (Department of Statistics, Dankook University)
Son, Won (Department of Statistics, Dankook University)
Publication Information
The Korean Journal of Applied Statistics / v.34, no.5, 2021, pp. 795-806
Abstract
As the volume of text data generated through online channels has grown, so has interest in methods that summarize and analyze it. A fundamental task in text analysis is extracting latent topics. A researcher could read every document and summarize its contents by hand, but this does not scale to large collections. Blei et al. (2003) and Blei and Lafferty (2007) proposed topic models, which extract topics using a statistical model. Because text data are generally collected over time, it is also worthwhile to monitor how topics change. In this study, we propose a topic index based on the output of a topic model. We then apply a control chart, a representative tool of statistical process control, to monitor the topic index over time. As a practical example, we use text data collected from the Blue House National Petition boards between March 5, 2018, and March 5, 2020.
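The monitoring step can be illustrated with a minimal sketch of an EWMA chart (Roberts, 1959; Lucas and Saccucci, 1990) applied to a per-period topic index. The abstract does not define the topic index, so the sketch assumes it is simply one number per period, for example the average proportion of a topic among that period's petitions. The function name `ewma_chart`, the smoothing constant `lam = 0.2`, the limit width `L = 3`, and all data values are illustrative assumptions, not the settings or results of the study.

```python
import math
import statistics
from typing import List, Tuple


def ewma_chart(
    x: List[float], mu0: float, sigma: float, lam: float = 0.2, L: float = 3.0
) -> List[Tuple[float, float, float, bool]]:
    """Return (z_t, LCL_t, UCL_t, signal_t) for each observation in x.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = mu0.
    Limits are mu0 +/- L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam)^(2t))).
    """
    rows = []
    z_prev = mu0
    for t, xt in enumerate(x, start=1):
        z = lam * xt + (1 - lam) * z_prev
        half_width = L * sigma * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        )
        # A signal is raised when the EWMA statistic leaves the control limits.
        rows.append((z, mu0 - half_width, mu0 + half_width, abs(z - mu0) > half_width))
        z_prev = z
    return rows


# Hypothetical topic index values (made up for illustration only).
baseline = [0.10, 0.12, 0.09, 0.11, 0.10, 0.08, 0.11, 0.09]  # assumed in-control periods
monitored = [0.10, 0.11, 0.09, 0.20, 0.22, 0.25]             # periods to monitor

# The in-control mean and standard deviation are estimated from the baseline.
mu0 = statistics.mean(baseline)
sigma = statistics.stdev(baseline)

for t, (z, lcl, ucl, signal) in enumerate(ewma_chart(monitored, mu0, sigma), start=1):
    print(f"t={t}  z={z:.4f}  LCL={lcl:.4f}  UCL={ucl:.4f}  signal={signal}")
```

A smaller `lam` gives more weight to past periods and makes the chart more sensitive to small, persistent shifts in the index; a larger `lam` reacts faster to abrupt changes.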
Keywords
text data; LDA model; topic monitoring; EWMA chart
References
1 Bang H and Moon H (2019). A study on the methodology to express the main topics of text in time series using text mining, Journal of the Korean Data and Information Science Society, 30, 1259-1276.
2 Blei DM, Ng AY, and Jordan MI (2003). Latent Dirichlet allocation, Journal of Machine Learning Research, 3, 993-1022.
3 Griffiths TL and Steyvers M (2004). Finding scientific topics, Proceedings of the National Academy of Sciences of the United States of America, 101, 5228-5235.
4 Montgomery DC (2000). Introduction to Statistical Quality Control, John Wiley & Sons, New York.
5 Arun R, Suresh V, Madhavan CEV, and Murthy MN (2010). On finding the natural number of topics with latent Dirichlet allocation: Some observations, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Part I, LNAI (6118), 391-402.
6 Blei DM and Jordan MI (2003). Modeling annotated data. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 127-134.
7 Cao J, Xia T, Li J, Zhang Y, and Tang S (2009). A density-based method for adaptive LDA model selection, Neurocomputing, 72, 1775-1781.
8 Deveaud R, SanJuan E, and Bellot P (2014). Accurate and effective latent concept modeling for ad hoc information retrieval, Document Numerique, 17, 61-84.
9 Lucas JM and Saccucci MS (1990). Exponentially weighted moving average control schemes: properties and enhancements, Technometrics, 32, 1-12.
10 Roberts SW (1959). Control chart tests based on geometric moving averages, Technometrics, 41, 97-101.
11 Blei DM and Lafferty JD (2007). A correlated topic model of science, The Annals of Applied Statistics, 1, 17-45.
12 Knoth S (2007). Accurate ARL calculation for EWMA control charts monitoring simultaneously normal mean and variance, Sequential Analysis, 26, 151-264.