GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics & Informatics to Support Biomedical Information Extraction |
Oh, So-Yeon (Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University)
Kim, Ji-Hyeon (Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University)
Kim, Seo-Jin (Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University)
Nam, Hee-Jo (Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University)
Park, Hyun-Seok (Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University) |
1 | Breckbaldwin. Coding chunkers as taggers: IO, BIO, BMEWO, and BMEWO+. Accessed 2018 Jul 27. Available from: https://lingpipe-blog.com/2009/10/14/coding-chunkers-as-taggers-io-bio-bmewo-and-bmewo/. |
2 | Chinchor NA. Overview of MUC-7. In: Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference, 1998 Apr 29-May 1, Fairfax, VA. |
3 | Nadeau D, Sekine S. A survey of named entity recognition and classification. Lingvist Invest 2007;30:3-26. |
4 | Kim JD, Ohta T, Tateisi Y, Tsujii J. GENIA corpus manual. Technical report TR-NLP-UT-2006-1. Tokyo: Tsujii Laboratory, University of Tokyo, 2006. |
5 | Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinform 2005;6:57-71. |
6 | Zweigenbaum P, Demner-Fushman D, Yu H, Cohen KB. Frontiers of biomedical text mining: current progress. Brief Bioinform 2007;8:358-375. |
7 | Biber D, Conrad S, Reppen R. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press, 1998. |
8 | Genomics and Informatics archives. Seoul: Korea Genome Organization, 2018. Accessed 2018 Jul 29. Available from: https://genominfo.org/articles/archive.php. |
9 | Creative Commons, Attribution-NonCommercial 4.0 International. Mountain View: Creative Commons, 2018. Accessed 2018 Jul 18. Available from: https://creativecommons.org/licenses/by-nc/4.0/. |
10 | Hagedorn G, Mietchen D, Morris RA, Agosti D, Penev L, Berendsohn WG, et al. Creative Commons licenses and the non-commercial condition: implications for the re-use of biodiversity information. Zookeys 2011;(150):127-149. |
11 | Shinyama Y. PDFMiner.six: Python PDF parser and analyzer. San Francisco: GitHub Inc., 2018. Accessed 2018 Jul 17. Available from: https://github.com/pdfminer/pdfminer.six. |
12 | Bernardi L, Ratsch E, Kania R, Saric J, Rojas JH, Schatz BR, et al. Mining information for functional genomics. IEEE Intell Syst 2002;17:66-79. |
13 | Bird S, Klein E, Loper E. Natural Language Processing with Python. Sebastopol: O'Reilly Media Inc., 2009. |
14 | Perkins J. Python Text Processing with NLTK 2.0 Cookbook. Birmingham: Packt Publishing, 2010. |
15 | Collier N, Mima H, Lee SZ, Ohta T, Tateisi Y, Yakushiji A, et al. The GENIA project: knowledge acquisition from biology texts. Genome Inform 2000;11:448-449. |
16 | Tsuruoka Y. GENIA tagger: part-of-speech tagging, shallow parsing, and named entity recognition for biomedical text. Tokyo: University of Tokyo, 2006. Accessed 2018 Jul 27. Available from: http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger. |
17 | Marcus MP, Marcinkiewicz MA, Santorini B. Building a large annotated corpus of English: The Penn Treebank. Comput Linguist 1993;19:313-330. |
18 | Abney S. Parsing by chunks. In: Principle-Based Parsing (Berwick R, Abney S, Tenny C, eds.). Dordrecht: Springer, 1991. pp. 257-278. |