A Convergence Study on the Topic and Sentiment of COVID19 Research in Korea Using Text Analysis

  • Heo, Seong-Min (Dept. of Applied Mathematics, Kumoh National Institute of Technology)
  • Yang, Ji-Yeon (Dept. of Applied Mathematics, Kumoh National Institute of Technology)
  • Received : 2021.02.24
  • Accepted : 2021.04.20
  • Published : 2021.04.28

Abstract

This study explores the research topics of COVID-19-related papers published in Korea and examines their trends. Using latent Dirichlet allocation (LDA), we identified eight topics and confirmed that the results are reasonably stable by comparing them with those of a structural topic model (STM). Within each topic, we extracted subtopics with k-means clustering and visualized them in principal component analysis (PCA) space. We then applied sentiment analysis: for each topic, we examined the positive and negative words and computed sentiment scores to determine the dominant tone of the papers. Research on three topics, Biomedical Related, International Dynamics, and Psychological Impact, showed a strongly negative tone, which flags these areas as warranting particular attention. The findings can serve as baseline data for researchers exploring new research directions and for policymakers deciding which research projects to support.
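
The topic-extraction step can be illustrated with a minimal pure-Python sketch of LDA fitted by collapsed Gibbs sampling. The abstract does not state which estimation method, software, or hyperparameters the authors used, so the function names (`lda_gibbs`, `top_words`), the priors `alpha` and `beta`, the iteration count, and the four-document toy corpus are all illustrative assumptions. The study fit eight topics; the toy example uses two.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (theta, phi, vocab), where
    theta[d][k] approximates P(topic k | document d) and
    phi[k][v] approximates P(word vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics
    ndk = [[0] * K for _ in docs]      # tokens of each topic in each document
    nkw = [[0] * V for _ in range(K)]  # occurrences of each word in each topic
    nk = [0] * K                       # total tokens assigned to each topic
    z = []                             # current topic of every token

    # Random initial assignment of tokens to topics.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab


def top_words(phi, vocab, n=3):
    """The n most probable words of each topic, used to label topics by hand."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


# Hypothetical toy corpus with two obvious themes; the study itself used 8 topics.
docs = [
    ["virus", "vaccine", "antibody", "virus"],
    ["vaccine", "antibody", "virus"],
    ["stock", "market", "economy", "market"],
    ["economy", "stock", "market"],
]
theta, phi, vocab = lda_gibbs(docs, n_topics=2, n_iter=100)
print(top_words(phi, vocab))
```

Each `theta` row holds a document's topic proportions and each `phi` row a topic's word distribution; topics are typically named by inspecting the top words of each `phi` row. Gibbs sampling is used here only because it fits in a few lines of standard-library code; variational inference is an equally common way to fit LDA.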

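
The per-topic sentiment scoring can be sketched as a lexicon count. The abstract does not give the scoring formula, so this sketch assumes a common normalized form, (pos - neg) / (pos + neg) per document, averaged over the documents assigned to each topic. The tiny `POSITIVE` and `NEGATIVE` word sets, the topic labels "A" and "B", and the toy documents are hypothetical stand-ins for a published sentiment lexicon and the study's real data.

```python
from collections import defaultdict

# Hypothetical mini-lexicons; a real analysis would use a published sentiment word list.
POSITIVE = {"effective", "improve", "recovery", "safe", "support"}
NEGATIVE = {"anxiety", "conflict", "crisis", "death", "infection", "risk"}


def sentiment_score(tokens, positive=POSITIVE, negative=NEGATIVE):
    """Assumed score (pos - neg) / (pos + neg) in [-1, 1]; 0.0 if no lexicon word occurs."""
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def topic_sentiment(docs, doc_topics):
    """Average the document scores within each (hypothetical) topic label."""
    scores = defaultdict(list)
    for tokens, topic in zip(docs, doc_topics):
        scores[topic].append(sentiment_score(tokens))
    return {topic: sum(vals) / len(vals) for topic, vals in scores.items()}


docs = [
    ["vaccine", "effective", "safe"],      # 2 positive, 0 negative ->  1.0
    ["infection", "risk", "recovery"],     # 1 positive, 2 negative -> -1/3
    ["anxiety", "quarantine", "support"],  # 1 positive, 1 negative ->  0.0
    ["conflict", "crisis", "trade"],       # 0 positive, 2 negative -> -1.0
]
# Topic A averages to about 0.33, topic B to -0.5.
print(topic_sentiment(docs, ["A", "A", "B", "B"]))
```

Under this assumed formula, a negative topic average means the topic's papers use more negative than positive lexicon words, which is how a strongly negative tone would surface.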
