http://dx.doi.org/10.5808/GI.2019.17.2.e19

Towards cross-platform interoperability for machine-assisted text annotation

Eckart de Castilho, Richard (UKP Lab, Technical University Darmstadt)
Ide, Nancy (Vassar College)
Kim, Jin-Dong (Database Center for Life Science, Research Organization of Information and Systems)
Klie, Jan-Christoph (UKP Lab, Technical University Darmstadt)
Suderman, Keith (Vassar College)

Abstract

In this paper, we investigate cross-platform interoperability for natural language processing (NLP) and, in particular, annotation of textual resources, with an eye toward identifying the design elements of annotation models and processes that are particularly problematic for, or amenable to, enabling seamless communication across different platforms. The study is conducted in the context of a specific annotation methodology, namely machine-assisted interactive annotation (also known as human-in-the-loop annotation). This methodology requires the ability to freely combine resources from different document repositories, access a wide array of NLP tools that automatically annotate corpora for various linguistic phenomena, and use a sophisticated annotation editor that enables interactive manual annotation coupled with on-the-fly machine learning. We consider three independently developed platforms, each of which utilizes a different model for representing annotations over text, and each of which performs a different role in the process.

Keywords

annotation software; biomedical text mining; interoperability
