Misinformation Detection and Rectification Based on QA System and Text Similarity with COVID-19

  • Insup Lim (School of Business, Hanyang University)
  • Namjae Cho (School of Business, Hanyang University)
  • Received : 2021.02.18
  • Accepted : 2021.10.24
  • Published : 2021.10.31

Abstract

As COVID-19 has spread widely and rapidly, the amount of misinformation has also increased, a phenomenon that the WHO has referred to as an "Infodemic". The purpose of this research is to develop a method for detecting and rectifying COVID-19 misinformation based on an open-domain QA system and text similarity. The model was tested under nine conditions. For the open-domain QA system, six conditions were formed by combining three dataset types (scientific; social media and news; and both combined) with two methods of choosing the answer: taking the top answer generated by the QA system, or voting among the top three answers generated by the QA system. The other three conditions used a closed-domain QA system with the different dataset types. The best result was 76%, obtained using all datasets with voting among the top three answers, which outperformed the closed-domain model by 16%.
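
The two answer-selection methods named in the abstract (top answer versus voting among the top three) can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words cosine similarity, the 0.5 threshold, the majority rule, and the names `cosine_similarity` and `judge_claim` are all assumptions made for illustration, and the ranked answers are assumed to come from some QA system that is not shown here.

```python
import math
import re
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts.

    Assumption: a simple stand-in for whatever text-similarity measure is used.
    """
    va = Counter(re.findall(r"[a-z0-9]+", a.lower()))
    vb = Counter(re.findall(r"[a-z0-9]+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def judge_claim(claim, ranked_answers, threshold=0.5, vote=True):
    """Label a claim using the top answer, or a majority vote over the top three.

    ranked_answers: answers from a QA system, best first (hypothetical input).
    Returns (is_misinformation, suggested_correction).
    """
    candidates = ranked_answers[:3] if vote else ranked_answers[:1]
    if not candidates:
        raise ValueError("need at least one answer")
    # Each candidate answer "supports" the claim if it is similar enough to it.
    supports = [cosine_similarity(claim, ans) >= threshold for ans in candidates]
    # Accept the claim when more than half of the candidates support it.
    accepted = sum(supports) * 2 > len(supports)
    # When the claim is rejected, offer the top-ranked answer as the correction.
    return (not accepted, None if accepted else candidates[0])


if __name__ == "__main__":
    answers = [
        "masks reduce the spread of the virus",
        "vaccines are effective",
        "masks reduce virus spread",
    ]
    print(judge_claim("masks reduce the spread of the virus", answers, vote=True))
```

Word-overlap similarity cannot tell a statement from its negation, so a practical system would likely need a stronger measure, such as sentence embeddings; the voting logic itself would stay the same.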
