COVID-19 recommender system based on an annotated multilingual corpus

  • Barros, Marcia (Large-Scale Informatics Systems Laboratory, Faculdade de Ciências, Universidade de Lisboa);
  • Ruas, Pedro (Large-Scale Informatics Systems Laboratory, Faculdade de Ciências, Universidade de Lisboa);
  • Sousa, Diana (Large-Scale Informatics Systems Laboratory, Faculdade de Ciências, Universidade de Lisboa);
  • Bangash, Ali Haider (Shifa College of Medicine, Shifa Tameer-e-Millat University);
  • Couto, Francisco M. (Large-Scale Informatics Systems Laboratory, Faculdade de Ciências, Universidade de Lisboa)
  • Received: 2021.02.26
  • Accepted: 2021.08.12
  • Published: 2021.09.30

Abstract

Tracking the most recent advances in coronavirus disease 2019 (COVID-19)-related research is essential, given the disease's novelty and its impact on society. However, as the pace of publication accelerates, researchers and clinicians need automatic approaches to keep up with the incoming information on this disease. Addressing this problem requires text mining pipelines, whose effectiveness strongly depends on the availability of curated corpora. However, COVID-19-related corpora are scarce, and even more so in languages other than English. The main contribution of this project was the annotation of a multilingual parallel corpus (EN-PT and EN-ES) with relevant entities and their relations, together with the generation of a recommendation dataset. We provide this resource to the community to improve text mining research on the COVID-19-related literature. This work was developed during the 7th Biomedical Linked Annotation Hackathon (BLAH7).

Acknowledgement

This work was supported by FCT through funding of the Deep Semantic Tagger (DeST) project (ref. PTDC/CCI-BIO/28685/2017) and the LASIGE Research Unit (refs. UIDB/00408/2020 and UIDP/00408/2020), and by FCT and FSE through funding of PhD scholarships (refs. 2020.05393.BD, SFRH/BD/128840/2017, and SFRH/BD/145221/2019).
