An empirical evaluation of electronic annotation tools for Twitter data

  • Weissenbacher, Davy (Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania) ;
  • O'Connor, Karen (Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania) ;
  • Hiraki, Aiko T. (Database Center for Life Science, Research Organization of Information and Systems) ;
  • Kim, Jin-Dong (Database Center for Life Science, Research Organization of Information and Systems) ;
  • Gonzalez-Hernandez, Graciela (Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania)
  • Received: 2020.03.29
  • Accepted: 2020.06.16
  • Published: 2020.05.28

Abstract

Despite a growing number of natural language processing shared tasks dedicated to Twitter data, there is currently no annotation tool designed specifically for this purpose. During the 6th edition of the Biomedical Linked Annotation Hackathon (BLAH), after a short review of 19 generic annotation tools, we adapted GATE and TextAE for annotating Twitter timelines. Although none of the tools reviewed allows the annotation of all the information inherent in Twitter timelines, a few may be suitable provided that annotators are willing to compromise on some functionality.
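TextAE renders annotations in the PubAnnotation JSON format, in which each entity ("denotation") marks a character-offset span of the text. The sketch below illustrates what a single annotated tweet could look like in that format; the tweet text, entity types, and offsets are invented for illustration and are not data from the hackathon.

```python
import json

# Hypothetical tweet text; not taken from the study's corpus.
tweet = "Started taking prenatal vitamins today, 12 weeks along!"

# Minimal PubAnnotation-style document: "span" uses 0-based character
# offsets with an exclusive end, and "obj" names the entity type.
annotation = {
    "text": tweet,
    "denotations": [
        {"id": "T1", "span": {"begin": 15, "end": 32}, "obj": "Medication"},
        {"id": "T2", "span": {"begin": 40, "end": 48}, "obj": "GestationalAge"},
    ],
}

# Sanity-check that each span actually covers the intended surface text.
for d in annotation["denotations"]:
    begin, end = d["span"]["begin"], d["span"]["end"]
    print(d["obj"], "->", repr(tweet[begin:end]))

# The serialized document is what a TextAE editor instance would load.
print(json.dumps(annotation, indent=2))
```

Because the format is plain JSON keyed on character offsets, a timeline of tweets can be stored as a list of such documents, one per tweet, with timeline-level metadata kept alongside.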

Keywords

References

  1. Shaban H. Twitter reveals its daily active user number for the first time. The Washington Post, 2019. Accessed 2020 Apr 30. Available from: https://www.washingtonpost.com/technology/2019/02/07/twitter-reveals-its-daily-active-user-numbers-first-time/.
  2. Rizzo G, Pereira B, Varga A, van Erp M, Basave AE. Lessons learnt from the Named Entity rEcognition and Linking (NEEL) challenge series. Semantic Web J 2017;8:667-700. https://doi.org/10.3233/SW-170276
  3. Workshop on Noisy User-generated Text (WNUT). Stroudsburg: Association for Computational Linguistics, 2020. Accessed 2020 Apr 30. Available from: https://www.aclweb.org/anthology/venues/wnut/.
  4. SemEval Portal. Stroudsburg: Association for Computational Linguistics, 2020. Accessed 2020 Apr 30. Available from: https://aclweb.org/aclwiki/SemEval_Portal.
  5. Social Media Mining for Health Applications (#SMM4H). Wordpress.com, 2020. Available from: https://healthlanguageprocessing.org/smm4h/.
  6. Lopez C, Partalas I, Balikas G, Derbas N, Martin A, Reutenauer C, et al. CAp 2017 challenge: Twitter Named Entity Recognition. Preprint at https://arxiv.org/abs/1707.07568 (2017).
  7. FIRE 2016 Microblog track. Information extraction from Microblogs posted during disasters. Forum for Information Retrieval Evaluation, 2016. Accessed 2020 Apr 30. Available from: https://sites.google.com/site/fire2016microblogtrack/information-extraction-from-microblogs-posted-during-disasters.
  8. Cresci S, La Polla MN, Tardelli S, Tesconi M. #tweeTag: a web-based annotation tool for Twitter data. Pisa: Istituto di Informatica e Telematica, 2016.
  9. GATE: General Architecture for Text Engineering. Sheffield: University of Sheffield, 2020. Accessed 2020 Apr 30. Available from: https://gate.ac.uk/.
  10. Brat rapid annotation tool. Brat, 2020. Accessed 2020 Apr 30. Available from: https://brat.nlplab.org/index.html.
  11. TextAE. TextAE, 2020. Accessed 2020 Apr 30. Available from: https://textae.pubannotation.org/.
  12. Neves M, Seva J. An extensive review of tools for manual annotation of documents. Brief Bioinform 2019 Dec 15 [Epub]. Available from: https://doi.org/10.1093/bib/bbz130.
  13. LightTag. LightTag, 2020. Accessed 2020 Apr 30. Available from: https://www.lighttag.io/.
  14. WebAnno. WebAnno, 2020. Accessed 2020 Apr 30. Available from: https://webanno.github.io/webanno/.
  15. YEDDA: a lightweight collaborative text span annotation tool. San Francisco: GitHub, 2020. Accessed 2020 Apr 30. Available from: https://github.com/jiesutd/YEDDA.
  16. Slate: a super-lightweight annotation tool for experts. San Francisco: GitHub, 2020. Accessed 2020 Apr 30. Available from: https://github.com/jkkummerfeld/slate.
  17. eHOST: the extensible Human Oracle Suite of Tools. San Francisco: GitHub, 2020. Accessed 2020 Apr 30. Available from: https://github.com/chrisleng/ehost.
  18. Golder S, Chiuve S, Weissenbacher D, Klein A, O'Connor K, Bland M, et al. Pharmacoepidemiologic evaluation of birth defects from health-related postings in social media during pregnancy. Drug Saf 2019;42:389-400. https://doi.org/10.1007/s40264-018-0731-6
  19. Teamware. GATE Teamware: collaborative annotation factories. Sheffield: University of Sheffield, 2020. Accessed 2020 Apr 30. Available from: https://gate.ac.uk/teamware/.
  20. Kholghi M, Sitbon L, Zuccon G, Nguyen A. Active learning reduces annotation time for clinical concept extraction. Int J Med Inform 2017;106:25-31. https://doi.org/10.1016/j.ijmedinf.2017.08.001
  21. Cunningham H, Maynard D, Bontcheva K, Tablan V, Dimitrov M, Dowman M, et al. Performance evaluation of language analysers. Sheffield: University of Sheffield, 2020. Accessed 2020 Apr 30. Available from: https://gate.ac.uk/releases/gate-5.1-beta1-build3397-ALL/doc/tao/splitch10.html#sec:eval:annotationdiff.
  22. PubAnnotation. Kashiwa: Database Center for Life Science, 2020. Accessed 2020 Apr 30. Available from: https://pubannotation.org/.