http://dx.doi.org/10.5808/GI.2018.16.4.e40

Opinion: Strategy of Semi-Automatically Annotating a Full-Text Corpus of Genomics & Informatics  

Park, Hyun-Seok (Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University)
Abstract
The biomedical text-mining community needs an annotated corpus consisting of the full texts of biomedical journal articles. In response, a prototype version of the full-text corpus of Genomics & Informatics, called GNI version 1.0, has recently been published, with 499 annotated full-text articles available as a corpus resource. However, GNI needs to be updated, as the texts were shallow-parsed and annotated with several existing parsers. I list issues associated with upgrading the annotations and give an opinion on the methodology for developing the next version of the GNI corpus, based on a semi-automatic strategy for more linguistically rich corpus annotation.
Keywords
biomedical text mining; corpus; text analytics