A Topic Analysis of Fine Particle Matter by Using Newspaper Articles (신문기사를 이용한 미세먼지 이슈의 토픽 분석)
The Journal of the Korea Contents Association, v.22 no.6, pp.1-14, 2022
This study aims to identify topics in newspaper articles related to fine particulate matter and to investigate the characteristics and time-series trend of each topic. Related articles from national newspapers published between 1990 and 2021 were collected from Bigkinds. A total of 18 topics were identified using LDA, and 11 clusters were derived through clustering. Hot topics include related products and residences, overseas causes (China), power plants as a domestic cause, nationwide emergency reduction measures, international cooperation, political issues, current situations and countermeasures in other countries, and consumption patterns. Cold topics include the concentration standard and indoor air quality improvement. These findings could be useful in inferring policy directions and strategies. In particular, consumer protection policy should be expanded as the related market grows. Policies that promote public safety and health and that strengthen public consensus and international cooperation will also be necessary.
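The abstract labels topics as hot or cold from each topic's time-series trend but does not state the criterion. A common approach in topic-modeling studies, assumed here rather than taken from the paper, is to average each topic's LDA proportion over the articles of each year and fit a least-squares line through those yearly means: a positive slope marks a hot topic and a negative slope a cold one. The sketch below uses only the Python standard library. The function names and the `threshold` parameter are hypothetical, the input data are made up, and in a real study the document-topic proportions would come from a fitted LDA model.

```python
from collections import defaultdict
from statistics import mean


def yearly_topic_means(doc_topics, doc_years):
    """Average each topic's proportion over all documents from the same year.

    doc_topics: per-document topic distributions (e.g. rows of an LDA
                document-topic matrix); doc_years: matching publication years.
    Returns (years, means) where means[k][i] is topic k's mean share in years[i].
    """
    by_year = defaultdict(list)
    for dist, year in zip(doc_topics, doc_years):
        by_year[year].append(dist)
    years = sorted(by_year)
    n_topics = len(doc_topics[0])
    means = [[mean(d[k] for d in by_year[y]) for y in years] for k in range(n_topics)]
    return years, means


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs (needs 2+ distinct xs)."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def classify_topics(doc_topics, doc_years, threshold=0.0):
    """Label each topic 'hot' (rising share), 'cold' (falling share) or 'flat'.

    threshold is a hypothetical tolerance on the slope; it stands in for
    whatever criterion (e.g. a significance test) a real study would use.
    """
    years, means = yearly_topic_means(doc_topics, doc_years)
    labels = []
    for series in means:
        slope = ols_slope(years, series)
        if slope > threshold:
            labels.append("hot")
        elif slope < -threshold:
            labels.append("cold")
        else:
            labels.append("flat")
    return labels


# Toy, made-up input: two topics, five articles from three years.
docs = [[0.8, 0.2], [0.7, 0.3], [0.5, 0.5], [0.3, 0.7], [0.2, 0.8]]
years = [2000, 2000, 2010, 2020, 2020]
print(classify_topics(docs, years))  # → ['cold', 'hot']
```

Averaging by year before fitting keeps years with many articles from dominating the trend line. Published analyses often also test whether the slope differs significantly from zero before labeling a topic; the simple threshold here is only a placeholder for that step.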
From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the respiratory disease caused by SARS-CoV-2, severe acute respiratory syndrome coronavirus 2) were published. This rapid growth in the number of COVID-19 papers places time and technical constraints on healthcare professionals and policymakers who need to find important research quickly. In this study, we therefore propose a method for extracting useful information from the text of this extensive literature using the LDA and Word2vec algorithms. Papers matching the search keywords were extracted from the COVID-19 literature, and their detailed topics were identified. The data came from the CORD-19 dataset on Kaggle, a free, weekly updated academic resource prepared by major research groups and the White House in response to the COVID-19 pandemic. The research proceeded in two main stages. First, 41,062 articles were obtained by filtering and preprocessing the abstracts of 47,110 academic papers with full text. Exploratory data analysis in Python was used to count COVID-19-related publications by year and to identify the 10 most active journals. LDA and Word2vec were then used to derive COVID-19 research topics, and similarity was measured after related words were analyzed. Second, papers containing 'vaccine' and 'treatment' were extracted based on the topics derived from all papers, yielding 4,555 papers related to 'vaccine' and 5,971 related to 'treatment'. For each set of collected papers, detailed topics were analyzed using LDA and Word2vec, and clustering after PCA dimensionality reduction was applied, with the t-SNE algorithm used to visualize groups of papers with similar themes. A noteworthy result of this study is that topics that did not emerge from the topics derived for all COVID-19 papers (