PharmacoNER Tagger: a deep learning-based tool for automatically finding chemicals and drugs in Spanish medical texts

  • Received : 2019.03.15
  • Accepted : 2019.05.27
  • Published : 2019.06.30

Abstract

Automatically detecting mentions of pharmaceutical drugs and chemical substances is key to the subsequent extraction of relations between chemicals and other biomedical entities such as genes, proteins, diseases, adverse reactions, or symptoms. Identifying drug mentions is also a prerequisite for handling complex event types such as drug dosage recognition, duration of medical treatments, or drug repurposing. Formally, this task is known as named entity recognition (NER): automatically identifying mentions of predefined entities of interest in running text. In the domain of medical texts, techniques based on hand-crafted rules and graph-based models can provide adequate performance for chemical entity recognition (CER). In recent years, the field of natural language processing has largely pivoted to deep learning, and state-of-the-art results for most natural language tasks are usually obtained with artificial neural networks. Competitive resources for drug name recognition in English medical texts are already available and heavily used, whereas for other languages such as Spanish these tools, although clearly needed, were missing. In this work, we adapt an existing neural NER system, NeuroNER, to the particular domain of Spanish clinical case texts, and extend the neural network so that it can take into account additional features beyond the plain text. NeuroNER can be considered a competitive baseline system for Spanish drug and chemical entity recognition, promoted by the Spanish national plan for the advancement of language technologies (Plan TL).
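To make the NER formulation concrete, the following minimal sketch frames drug mention detection as token-level tagging with B/I/O labels, using the kind of hand-crafted dictionary lookup the abstract mentions as a pre-neural approach. It is not the PharmacoNER Tagger or NeuroNER: the lexicon, the `DRUG` label, and the function names `tokenize` and `bio_tag` are hypothetical choices made only for illustration.

```python
import re

# Hypothetical toy lexicon (assumption); a real system would rely on a
# curated drug dictionary or, as in the paper, a trained neural model.
DRUG_LEXICON = {"paracetamol", "ibuprofeno", "ácido acetilsalicílico"}


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def bio_tag(tokens, lexicon):
    """Assign B-DRUG / I-DRUG / O tags using longest-match lexicon lookup."""
    entries = {tuple(entry.split()) for entry in lexicon}
    max_len = max(len(entry) for entry in entries)
    lowered = [token.lower() for token in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so multi-word names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                matched = n
                break
        if matched:
            tags[i] = "B-DRUG"
            for j in range(i + 1, i + matched):
                tags[j] = "I-DRUG"
            i += matched
        else:
            i += 1
    return tags


if __name__ == "__main__":
    sentence = "Se pautó ácido acetilsalicílico y paracetamol."
    tokens = tokenize(sentence)
    for token, tag in zip(tokens, bio_tag(tokens, DRUG_LEXICON)):
        print(f"{token}\t{tag}")
```

A neural tagger produces the same kind of per-token label sequence, but predicts it from learned representations rather than exact dictionary matches, which is what allows it to recognize mentions absent from any lexicon.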
