Improving accessibility and distinction between negative results in biomedical relation extraction

  • Sousa, Diana (LASIGE, Departamento de Informatica, Faculdade de Ciencias, Universidade de Lisboa)
  • Lamurias, Andre (LASIGE, Departamento de Informatica, Faculdade de Ciencias, Universidade de Lisboa)
  • Couto, Francisco M. (LASIGE, Departamento de Informatica, Faculdade de Ciencias, Universidade de Lisboa)
  • Received : 2020.03.17
  • Accepted : 2020.05.26
  • Published : 2020.05.28

Abstract

Accessible negative results are relevant to researchers and clinicians, not only to limit their search space but also to prevent the costly re-exploration of research hypotheses. However, most biomedical relation extraction datasets do not distinguish between a false and a negative relation between two biomedical entities. Furthermore, datasets created using distant supervision techniques also contain some false negative relations that are in fact undocumented/unknown relations (missing from a knowledge base). We propose to improve the distinction between these concepts by revising a subset of the relations marked as false in the phenotype-gene relations corpus, and we take the first steps toward automatically distinguishing between false (F), negative (N), and unknown (U) results. Our work resulted in a sample of 127 manually annotated FNU relations and a weighted F1 of 0.5609 for their automatic distinction. This work was developed during the 6th Biomedical Linked Annotation Hackathon (BLAH6).
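
For readers unfamiliar with the metric, the weighted F1 reported above is conventionally the per-class F1 averaged with weights proportional to each class's number of gold instances (the definition used, for example, by scikit-learn's `average="weighted"` option). The sketch below illustrates that standard definition on three-way F/N/U labels. It is not the authors' evaluation code: the function name `weighted_f1` and the toy labels are assumptions made up for illustration, and the abstract does not state how the reported score was computed.

```python
from collections import Counter


def weighted_f1(gold, pred):
    """Support-weighted average of per-class F1 (hypothetical helper)."""
    support = Counter(gold)  # number of gold instances per class
    total = len(gold)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Each class contributes in proportion to its gold support.
        score += f1 * n / total
    return score


# Toy labels invented for illustration; not taken from the corpus.
gold = ["F", "F", "N", "U", "U", "F"]
pred = ["F", "N", "N", "U", "F", "F"]
print(round(weighted_f1(gold, pred), 4))
```

Because the weights follow class frequency, a majority class such as F dominates the score when the F, N, and U classes are imbalanced.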
