Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0
Authors: Kim, Sunho; Kim, Royoung; Nam, Hee-Jo; Kim, Ryeo-Gyeong; Ko, Enjin; Kim, Han-Su; Shin, Jihye; Cho, Daeun; Jin, Yurhee; Bae, Soyeon; Jo, Ye Won; Jeong, San Ah; Kim, Yena; Ahn, Seoyeon; Jang, Bomi; Seong, Jiheyon; Lee, Yujin; Seo, Si Eun; Kim, Yujin; Kim, Ha-Jeong; Kim, Hyeji; Sung, Hye-Lynn; Lho, Hyoyoung; Koo, Jaywon; Chu, Jion; Lim, Juwon; Kim, Youngju; Lee, Kyungyeon; Lim, Yuri; Kim, Meongeun; Hwang, Seonjeong; Han, Shinhye; Bae, Sohyeun; Kim, Sua; Yoo, Suhyeon; Seo, Yeonjeong; Shin, Yerim; Kim, Yonsoo; Ko, You-Jung; Baek, Jihee; Hyun, Hyejin; Choi, Hyemin; Oh, Ji-Hye; Kim, Da-Young; Park, Hyun-Seok
Affiliation (all authors): Bioinformatics & Natural Language Processing Laboratory, ELTEC College of Engineering, Ewha Womans University